Science.gov

Sample records for deep short-read sequencing

  1. Unlocking Short Read Sequencing for Metagenomics

    DOE PAGES Beta

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.

  2. Unlocking Short Read Sequencing for Metagenomics.

    SciTech Connect

    Rodrigue, S.; Materna, A. C.; Timberlake, S. C.; Blackburn, M. C.; Malmstrom, R. R.; Alm, E. J.; Chisholm, S. W.

    2010-01-01

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.

  3. Unlocking Short Read Sequencing for Metagenomics

    PubMed Central

    Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.

    2010-01-01

    Background: Different high-throughput nucleic acid sequencing platforms are currently available, but a trade-off exists between the cost and number of reads that can be generated versus the read length that can be achieved. Methodology/Principal Findings: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. Conclusions/Significance: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing. PMID:20676378
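
    The read-joining idea in this record can be illustrated with a minimal sketch. It is not the SHERA algorithm: the abstract does not describe SHERA's scoring, so the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the simple mismatch-fraction rule below are all assumptions chosen for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join read1 with the reverse complement of read2.

    Tries the longest candidate overlap first and accepts the first one whose
    mismatch fraction is within max_mismatch_frac (a hypothetical rule).
    Returns the composite read, or None if no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail = read1[-olen:]   # 3' end of the forward read
        head = mate[:olen]     # 5' end of the reoriented mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + mate[olen:]
    return None
```

    A real implementation would typically also use base quality scores to choose between disagreeing bases in the overlap; this sketch simply keeps the forward read's bases.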

  4. Fast Search of Thousands of Short-Read Sequencing Experiments

    PubMed Central

    Solomon, Brad; Kingsford, Carl

    2015-01-01

    We introduce Sequence Bloom Trees, a method for querying thousands of short-read sequencing experiments by sequence 485 times faster than existing approaches. The approach searches large data archives for all experiments that involve a given sequence. We use Sequence Bloom Trees to search 2652 human blood, breast, and brain RNA-seq experiments for all 214,293 known transcripts in under 4 days using less than 239 MB of RAM and a single CPU. PMID:26854477
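
    A minimal sketch of the general idea, assuming a binary tree whose leaves hold one Bloom filter of k-mers per experiment and whose internal nodes hold the union of their children. The class names, filter size, hash scheme, and the `theta` threshold (fraction of query k-mers that must be present) are illustrative assumptions, not the authors' implementation.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter backed by a Python int used as a bit vector."""

    def __init__(self, size=4096, n_hashes=3):
        self.size, self.n_hashes, self.bits = size, n_hashes, 0

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        merged = Bloom(self.size, self.n_hashes)
        merged.bits = self.bits | other.bits
        return merged


class Node:
    def __init__(self, bloom, name=None, children=()):
        self.bloom, self.name, self.children = bloom, name, list(children)


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_leaf(name, reads, k):
    bloom = Bloom()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return Node(bloom, name=name)


def build_tree(leaves):
    """Pair up nodes level by level until a single root remains."""
    level = list(leaves)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            group = level[i:i + 2]
            if len(group) == 1:
                merged.append(group[0])
            else:
                union = group[0].bloom.union(group[1].bloom)
                merged.append(Node(union, children=group))
        level = merged
    return level[0]


def query(node, query_kmers, theta=0.8):
    """Return names of leaves where at least theta of the k-mers are present."""
    hits = sum(kmer in node.bloom for kmer in query_kmers)
    if hits < theta * len(query_kmers):
        return []  # prune: no experiment below this node can pass
    if not node.children:
        return [node.name]
    return [name for child in node.children
            for name in query(child, query_kmers, theta)]
```

    Because Bloom filters have no false negatives, pruning at an internal node never discards an experiment that truly contains the query; false positives can only add extra candidates.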

  5. Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels.

    PubMed

    Sudbery, Ian; Stalker, Jim; Simpson, Jared T; Keane, Thomas; Rust, Alistair G; Hurles, Matthew E; Walter, Klaudia; Lynch, Dee; Teboul, Lydia; Brown, Steve D; Li, Heng; Ning, Zemin; Nadeau, Joseph H; Croniger, Colleen M; Durbin, Richard; Adams, David J

    2009-01-01

    Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci. PMID:19825173

  6. Development and transferability of black and red raspberry microsatellite markers from short-read sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...

  7. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781
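
    One simple way to explore how read length affects what can be recovered is to measure the fraction of read-length windows that occur only once in a sequence. The sketch below is an illustration under simplifying assumptions (single strand only, error-free reads, one window per start position); it is not the analysis method of the paper, and `unique_fraction` is a hypothetical helper name.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose read_len-mer is unique in genome."""
    windows = [genome[i:i + read_len]
               for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    unique = sum(1 for window in windows if counts[window] == 1)
    return unique / len(windows)


# Usage: uniqueness rises as reads get longer on a toy repetitive sequence.
toy = "ACGTACGTAA"
for length in (2, 5, 6):
    print(length, unique_fraction(toy, length))
```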

  8. Short read sequencing for Genomic Analysis of the brown rot fungus Fibroporia radiculosa

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The practical capability of short read sequencing for whole genome gene prediction was investigated for Fibroporia radiculosa, a copper-tolerant basidiomycete fungus that causes brown rot decay of wood. Illumina GAIIX reads from a single run of a paired-end library (75 nt read length, 300 bp insert...

  9. Whole-genome sequencing and assembly with high-throughput, short-read technologies.

    PubMed

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  10. Identifying wrong assemblies in de novo short read primary sequence assembly contigs.

    PubMed

    Chawla, Vandna; Kumar, Rajnish; Shankar, Ravi

    2016-09-01

    With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are done using de novo short-read assemblers and other related approaches. However, the contigs produced this way are prone to mis-assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembling have been assessed for their role in causing mis-assembly by using different genome sequencing data. Finally, some mis-assembly detection tools have been evaluated for their ability to detect wrongly assembled primary contigs, suggesting a lot of scope for improvement in this area. The present work also proposes a simple, novel unsupervised learning-based approach to identify mis-assemblies in contigs, which was found to perform reasonably well when compared to existing tools for reporting mis-assembled contigs. It was observed that the proposed methodology may work as a complementary system to the existing tools to enhance their accuracy. PMID:27581937

  11. Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....

  12. Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing.

    PubMed

    Stapleton, James A; Kim, Jeongwoon; Hamilton, John P; Wu, Ming; Irber, Luiz C; Maddamsetti, Rohan; Briney, Bryan; Newton, Linsey; Burton, Dennis R; Brown, C Titus; Chan, Christina; Buell, C Robin; Whitehead, Timothy A

    2016-01-01

    Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise. PMID:26789840

  13. Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing

    PubMed Central

    Stapleton, James A.; Kim, Jeongwoon; Hamilton, John P.; Wu, Ming; Irber, Luiz C.; Maddamsetti, Rohan; Briney, Bryan; Newton, Linsey; Burton, Dennis R.; Brown, C. Titus; Chan, Christina; Buell, C. Robin; Whitehead, Timothy A.

    2016-01-01

    Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise. PMID:26789840

  14. Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences

    PubMed Central

    Catchen, Julian M.; Amores, Angel; Hohenlohe, Paul; Cresko, William; Postlethwait, John H.

    2011-01-01

    Advances in sequencing technology provide special opportunities for genotyping individuals with speed and thrift, but the lack of software to automate the calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that uses short-read sequence data to identify and genotype loci in a set of individuals either de novo or by comparison to a reference genome. From reduced representation Illumina sequence data, such as RAD-tags, Stacks can recover thousands of single nucleotide polymorphism (SNP) markers useful for the genetic analysis of crosses or populations. Stacks can generate markers for ultra-dense genetic linkage maps, facilitate the examination of population phylogeography, and help in reference genome assembly. We report here the algorithms implemented in Stacks and demonstrate their efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, Danio rerio. PMID:22384329

  15. The effect of strand bias in Illumina short-read sequencing data

    PubMed Central

    2012-01-01

    Background: When using Illumina high-throughput short-read data, the genotypes inferred from the positive strand and the negative strand are sometimes significantly different, with one homozygous and the other heterozygous. This phenomenon is known as strand bias. In this study, we used Illumina short-read sequencing data to evaluate the effect of strand bias on genotyping quality, and to explore the possible causes of strand bias. Results: We collected 22 breast cancer samples from 22 patients and sequenced their exomes using the Illumina GAIIx machine. By comparing the consistency between the genotypes inferred from this sequencing data with the genotypes inferred from SNP chip data, we found that, when using sequencing data, SNPs with extreme strand bias did not have significantly lower consistency rates compared to SNPs with low or no strand bias. However, this result may be limited by the small subset of SNPs present in both the exome sequencing and the SNP chip data. We further compared the transition/transversion ratio and the number of novel non-synonymous SNPs between the SNPs with low or no strand bias and those with extreme strand bias, and found that SNPs with low or no strand bias have better overall quality. We also discovered that strand bias occurs randomly at genomic positions across these samples, and observed no consistent pattern of strand bias location across samples. By comparing results from two different aligners, BWA and Bowtie, we found very consistent strand bias patterns. Thus strand bias is unlikely to be caused by alignment artifacts. We successfully replicated our results using two additional independent datasets with different capturing methods and Illumina sequencers. Conclusion: Extreme strand bias indicates a potential high false-positive rate for SNPs. PMID:23176052
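
    Strand bias at a site is often quantified by testing whether the allele counts differ between strands. The sketch below uses a two-sided Fisher's exact test on a 2x2 table of (strand x allele) read counts; this is a commonly used approach and an assumption here, since the abstract does not state which strand bias score the authors used. The function name and table layout are illustrative.

```python
from math import comb


def fisher_two_sided(fwd_ref, fwd_alt, rev_ref, rev_alt):
    """Two-sided Fisher's exact test p-value for a 2x2 strand/allele table.

    Rows are strands (forward, reverse); columns are alleles (ref, alt).
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1 = fwd_ref + fwd_alt
    row2 = rev_ref + rev_alt
    col1 = fwd_ref + rev_ref
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x ref-allele reads on the forward strand.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(fwd_ref)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))
```

    A very small p-value (for example, all reference reads on one strand and all alternate reads on the other) flags a site as having extreme strand bias; a balanced table gives a p-value of 1.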

  16. Investigating bisulfite short-read mapping failure with hairpin bisulfite sequencing data

    PubMed Central

    2015-01-01

    Background: DNA methylation is an important epigenetic mark relevant to normal development and disease genesis. A common approach to characterizing genome-wide DNA methylation is using Next Generation Sequencing technology to sequence bisulfite-treated DNA. The short sequence reads are mapped to the reference genome to determine the methylation statuses of Cs. However, despite intense effort, a much smaller proportion of the reads derived from bisulfite-treated DNA (usually about 40-80%) can be mapped than in regular short-read mapping (>90%), and it is unclear what factors lead to this low mapping efficiency. Results: To address this issue, we used the hairpin bisulfite sequencing technology to determine sequences of both DNA double strands simultaneously. This enabled the recovery of the original non-bisulfite-converted sequences. We used Bismark for bisulfite read mapping and Bowtie2 for recovered read mapping. We found that recovering the reads improved unique mapping efficiency by 9-10% compared to the bisulfite reads. Such improvement in mapping efficiency is related to sequence entropy. Conclusions: The hairpin recovery technique improves mapping efficiency, and sequence entropy relates to mapping efficiency. PMID:26576456
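
    The link between bisulfite conversion and sequence entropy can be illustrated with a small sketch. It assumes "sequence entropy" means the Shannon entropy of the base composition, which the abstract does not specify; the function names and the in silico conversion model (every unmethylated C read as T) are likewise assumptions for illustration.

```python
from collections import Counter
from math import log2


def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Simulate bisulfite treatment: unmethylated C becomes T.

    `methylated` holds 0-based positions of methylated cytosines,
    which are protected from conversion.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the base composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Usage: conversion collapses C into T, lowering compositional entropy.
original = "ACGTACGTACGT"
converted = bisulfite_convert(original)
print(shannon_entropy(original), shannon_entropy(converted))
```

    Reads with lower entropy carry less information per base, which is one intuitive reason they may be harder to place uniquely on a reference.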

  17. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
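
    The expectation-maximization idea for reads that map to several locations can be sketched as follows. This is a deliberately simplified illustration, not the AREM model: it ignores alignment scores, the null genomic background, and peak positions, and only re-estimates how much each candidate region is enriched. The function name and data layout are assumptions.

```python
def em_multireads(candidates, n_regions, n_iter=50):
    """Fractionally assign multi-mapping reads to regions with EM.

    candidates: one list per read, holding the region indices the read
                could align to (a hypothetical input format).
    Returns (pi, resp), where pi[j] is the estimated share of reads from
    region j and resp[i][j] is the probability that read i came from j.
    """
    pi = [1.0 / n_regions] * n_regions
    resp = []
    for _ in range(n_iter):
        # E-step: split each read among its candidates in proportion to pi.
        resp = []
        for cand in candidates:
            total = sum(pi[j] for j in cand)
            resp.append({j: pi[j] / total for j in cand})
        # M-step: new region weights are the averaged responsibilities.
        sums = [0.0] * n_regions
        for read_resp in resp:
            for j, weight in read_resp.items():
                sums[j] += weight
        pi = [s / len(candidates) for s in sums]
    return pi, resp


# Usage: three unique reads support region 0, so the ambiguous fourth read
# is pulled toward region 0 as the iterations proceed.
pi, resp = em_multireads([[0], [0], [0], [0, 1]], n_regions=2)
print(pi, resp[3])
```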

  18. MOST: a modified MLST typing tool based on short read sequencing.

    PubMed

    Tewolde, Rediat; Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteritidis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping- and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS-based methods, the mapping-based approach was the most sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly-based approaches. PMID:27602279

  19. MOST: a modified MLST typing tool based on short read sequencing

    PubMed Central

    Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L.; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteritidis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping- and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS-based methods, the mapping-based approach was the most sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly-based approaches. PMID:27602279

  20. Reference-based compression of short-read sequences using path encoding

    PubMed Central

    Kingsford, Carl; Patro, Rob

    2015-01-01

    Motivation: Storing, transmitting and archiving data produced by next-generation sequencing is a significant computational burden. New compression techniques tailored to short-read sequence data are needed. Results: We present here an approach to compression that reduces the difficulty of managing large-scale sequencing data. Our novel approach sits between pure reference-based compression and reference-free compression and combines much of the benefit of reference-based approaches with the flexibility of de novo encoding. Our method, called path encoding, draws a connection between storing paths in de Bruijn graphs and context-dependent arithmetic coding. Supporting this method is a system to compactly store sets of kmers that is of independent interest. We are able to encode RNA-seq reads using 3–11% of the space of the sequence in raw FASTA files, which is on average more than 34% smaller than competing approaches. We also show that even if the reference is very poorly matched to the reads that are being encoded, good compression can still be achieved. Availability and implementation: Source code and binaries freely available for download at http://www.cs.cmu.edu/~ckingsf/software/pathenc/, implemented in Go and supported on Linux and Mac OS X. Contact: carlk@cs.cmu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25649622

  1. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  2. Short-read sequencing for genomic analysis of the brown rot fungus Fibroporia radiculosa.

    PubMed

    Tang, Juliet D; Perkins, Andy D; Sonstegard, Tad S; Schroeder, Steven G; Burgess, Shane C; Diehl, Susan V

    2012-04-01

    The feasibility of short-read sequencing for genomic analysis was demonstrated for Fibroporia radiculosa, a copper-tolerant fungus that causes brown rot decay of wood. The effect of read quality on genomic assembly was assessed by filtering Illumina GAIIx reads from a single run of a paired-end library (75-nucleotide read length and 300-bp fragment size) at three different stringency levels and then assembling each data set with Velvet. A simple approach was devised to determine which filter stringency was "best." Venn diagrams identified the regions containing reads that were used in an assembly but were of a low-enough quality to be removed by a filter. By plotting base quality histograms of reads in this region, we judged whether a filter was too stringent or not stringent enough. Our best assembly had a genome size of 33.6 Mb, an N50 of 65.8 kb for a k-mer of 51, and a maximum contig length of 347 kb. Using GeneMark, 9,262 genes were predicted. TargetP and SignalP analyses showed that among the 1,213 genes with secreted products, 986 had motifs for signal peptides and 227 had motifs for signal anchors. Blast2GO analysis provided functional annotation for 5,407 genes. We identified 29 genes with putative roles in copper tolerance and 73 genes for lignocellulose degradation. A search for homologs of these 102 genes showed that F. radiculosa exhibited more similarity to Postia placenta than Serpula lacrymans. Notable differences were found, however, and their involvements in copper tolerance and wood decay are discussed. PMID:22247176
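
    The N50 statistic quoted in this record can be computed from a list of contig lengths. The sketch below uses the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length); the function name is illustrative and the input is a plain list of lengths rather than any particular assembler's output.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length reaches
        # half of the assembly.
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Usage on a toy assembly of eight contigs.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```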

  3. Characterization of a biogas-producing microbial community by short-read next generation DNA sequencing

    PubMed Central

    2012-01-01

    Background: Renewable energy production is currently a major issue worldwide. Biogas is a promising renewable energy carrier as the technology of its production combines the elimination of organic waste with the formation of a versatile energy carrier, methane. In consequence of the complexity of the microbial communities and metabolic pathways involved, the biotechnology of the microbiological process leading to biogas production is poorly understood. Metagenomic approaches are suitable means of addressing related questions. In the present work a novel high-throughput technique was tested for its benefits in resolving the functional and taxonomical complexity of such microbial consortia. Results: It was demonstrated that the extremely parallel SOLiD™ short-read DNA sequencing platform is capable of providing sufficient useful information to decipher the systematic and functional contexts within a biogas-producing community. Although this technology has not been employed to address such problems previously, the data obtained compare well with those from similar high-throughput approaches such as 454-pyrosequencing GS FLX or Titanium. The predominant microbes contributing to the decomposition of organic matter include members of the Eubacteria, class Clostridia, order Clostridiales, family Clostridiaceae. Bacteria belonging to other systematic groups contribute to the diversity of the microbial consortium. Archaea comprise a remarkably small minority in this community, given their crucial role in biogas production. Among the Archaea, the predominant order is the Methanomicrobiales and the most abundant species is Methanoculleus marisnigri. The Methanomicrobiales are hydrogenotrophic methanogens. Besides corroborating earlier findings on the significance of the contribution of the Clostridia to organic substrate decomposition, the results demonstrate the importance of the metabolism of hydrogen within the biogas-producing microbial community. Conclusions: Both

  4. PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data

    PubMed Central

    2011-01-01

    Crosslinking and immunoprecipitation (CLIP) protocols have made it possible to identify transcriptome-wide RNA-protein interaction sites. In particular, PAR-CLIP utilizes a photoactivatable nucleoside for more efficient crosslinking. We present an approach, centered on the novel PARalyzer tool, for mapping high-confidence sites from PAR-CLIP deep-sequencing data. We show that PARalyzer delineates sites with a high signal-to-noise ratio. Motif finding identifies the sequence preferences of RNA-binding proteins, as well as seed-matches for highly expressed microRNAs when profiling Argonaute proteins. Our study describes tailored analytical methods and provides guidelines for future efforts to utilize high-throughput sequencing in RNA biology. PARalyzer is available at http://www.genome.duke.edu/labs/ohler/research/PARalyzer/. PMID:21851591

  5. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

    SciTech Connect

    Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong; Brewer, Tony; Soper, David; D'Jamoos, Mike; Collins, Kirby; Vacek, George

    2011-03-22

    Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
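
    The remark that the four nucleotides fit in two bits can be made concrete with a small sketch that packs a k-mer into an integer. The specific code assignment (A=0, C=1, G=2, T=3) and the function names are assumptions for illustration; the record does not describe the encoding used by the hardware.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base
BASES = "ACGT"


def pack(kmer: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed form."""
    return "".join(
        BASES[(value >> (2 * (length - 1 - i))) & 3] for i in range(length)
    )


# Usage: a 4-mer occupies exactly one byte.
packed = pack("ACGT")
print(packed, unpack(packed, 4))
```

    The length must be stored alongside the packed value, because leading A bases encode as zero bits and would otherwise be lost.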

  6. Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species

    PubMed Central

    Judy, Caroline Duffie; Seeholzer, Glenn F.; Maley, James M.; Graves, Gary R.; Brumfield, Robb T.

    2015-01-01

    Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the resulting datasets are sensitive to both the similarity threshold used and to the variation naturally present in the organism under study. Thresholds that require high sequence similarity among reads for assembly (stringent thresholds) as well as highly variable species may result in datasets in which divergent alleles are lost or divided into separate loci (‘over-splitting’), whereas liberal thresholds increase the risk of paralogous loci being combined into a single locus (‘under-splitting’). Comparisons among datasets or species are therefore potentially biased if different similarity thresholds are applied or if the species differ in levels of within-lineage genetic variation. We examine the impact of a range of similarity thresholds on assembly of empirical short read datasets from populations of four different non-model bird lineages (species or species pairs) with different levels of genetic divergence. We find that, in all species, stringent similarity thresholds result in fewer alleles per locus than more liberal thresholds, which appears to be the result of high levels of over-splitting. The frequency of putative under-splitting, conversely, is low at all thresholds. Inferred genetic distances between individuals, gene tree depths, and estimates of the ancestral mutation-scaled effective population size (θ) differ depending upon the similarity threshold applied. Relative differences in inferences across species differ even when the same threshold is applied, but may be dramatically different when datasets assembled under different thresholds are compared. These
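
The effect of a similarity threshold on locus assembly can be sketched with a toy greedy clustering in Python. This is only an illustration under stated assumptions, not the pipeline used in the study: reads are assumed to be of equal length, similarity is plain per-position identity, and `identity` and `cluster_reads` are hypothetical helper names.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length reads agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def cluster_reads(reads, threshold):
    """Greedy clustering: a read joins the first cluster whose founding
    read it matches at or above the threshold, else it founds a new one."""
    founders = []
    clusters = []
    for read in reads:
        for idx, founder in enumerate(founders):
            if identity(read, founder) >= threshold:
                clusters[idx].append(read)
                break
        else:
            founders.append(read)
            clusters.append([read])
    return clusters
```

With a stringent threshold, alleles of one locus that differ at a few sites end up in separate clusters (over-splitting); lowering the threshold merges them but raises the chance of combining paralogs.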

  7. TGS-TB: Total Genotyping Solution for Mycobacterium tuberculosis Using Short-Read Whole-Genome Sequencing

    PubMed Central

    Sekizuka, Tsuyoshi; Yamashita, Akifumi; Murase, Yoshiro; Iwamoto, Tomotada; Mitarai, Satoshi; Kato, Seiya; Kuroda, Makoto

    2015-01-01

    Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, leading to more effective epidemiological studies involving single nucleotide variations (SNVs) in core genomic sequences based on molecular evolution. We developed an all-in-one web-based tool for genotyping Mtb, referred to as the Total Genotyping Solution for TB (TGS-TB), to facilitate multiple genotyping platforms using NGS for spoligotyping and the detection of phylogenies with core genomic SNVs, IS6110 insertion sites, and 43 customized loci for variable number tandem repeat (VNTR) through a user-friendly, simple click interface. This methodology is implemented with a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Seven Mtb isolates (JP01 to JP07) in this study showing the same VNTR profile were accurately discriminated through median-joining network analysis using SNVs unique to those isolates. An additional IS6110 insertion was detected in one of those isolates as supportive genetic information in addition to core genomic SNVs. The results of in silico analyses using TGS-TB are consistent with those obtained using conventional molecular genotyping methods, suggesting that NGS short reads could provide multiple genotypes to discriminate multiple strains of Mtb, although longer NGS reads (≥300-mer) will be required for full genotyping on the TGS-TB web site. Most available short reads (~100-mer) can be utilized to discriminate the isolates based on the core genome phylogeny. TGS-TB provides a more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS strain typing offers a total genotyping solution for Mtb outbreak and surveillance. TGS-TB web site: https://gph.niid.go.jp/tgs-tb/. PMID:26565975

  8. Rapid Short-Read Sequencing and Aneuploidy Detection Using MinION Nanopore Technology

    PubMed Central

    Wei, Shan; Williams, Zev

    2016-01-01

    MinION is a memory stick–sized nanopore-based sequencer designed primarily for single-molecule sequencing of long DNA fragments (>6 kb). We developed a library preparation and data-analysis method to enable rapid real-time sequencing of short DNA fragments (<1 kb) that resulted in the sequencing of 500 reads in 3 min and 40,000–80,000 reads in 2–4 hr at a rate of 30 nt/sec. We then demonstrated the clinical applicability of this approach by performing successful aneuploidy detection in prenatal and miscarriage samples with sequencing in <4 hr. This method broadens the application of nanopore-based single-molecule sequencing and makes it a promising and versatile tool for rapid clinical and research applications. PMID:26500254

  9. De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer

    PubMed Central

    Hernandez, David; François, Patrice; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques

    2008-01-01

    Novel high-throughput DNA sequencing technologies allow researchers to characterize a bacterial genome during a single experiment and at a moderate cost. However, the increase in sequencing throughput that is allowed by using such platforms is obtained at the expense of individual sequence read length, which must be assembled into longer contigs to be exploitable. This study focuses on the Illumina sequencing platform that produces millions of very short sequences that are 35 bases in length. We propose a de novo assembler software that is dedicated to process such data. Based on a classical overlap graph representation and on the detection of potentially spurious reads, our software generates a set of accurate contigs of several kilobases that cover most of the bacterial genome. The assembly results were validated by comparing data sets that were obtained experimentally for Staphylococcus aureus strain MW2 and Helicobacter acinonychis strain Sheeba with that of their published genomes acquired by conventional sequencing of 1.5- to 3.0-kb fragments. We also provide indications that the broad coverage achieved by high-throughput sequencing might allow for the detection of clonal polymorphisms in the set of DNA molecules being sequenced. PMID:18332092

  10. Optimal pooling for genome re-sequencing with ultra-high-throughput short-read technologies

    PubMed Central

    Hajirasouliha, Iman; Hormozdiari, Fereydoun; Sahinalp, S. Cenk; Birol, Inanc

    2008-01-01

    New generation sequencing technologies offer unique opportunities and challenges for re-sequencing studies. In this article, we focus on re-sequencing experiments using the Solexa technology, based on bacterial artificial chromosome (BAC) clones, and address an experimental design problem. In these specific experiments, approximate coordinates of the BACs on a reference genome are known, and fine-scale differences between the BAC sequences and the reference are of interest. The high-throughput characteristics of the sequencing technology make it possible to multiplex BAC sequencing experiments by pooling BACs for a cost-effective operation. However, the way BACs are pooled in such re-sequencing experiments has an effect on the downstream analysis of the generated data, mostly due to subsequences common to multiple BACs. The experimental design strategy we develop in this article offers combinatorial solutions based on approximation algorithms for the well-known max n-cut problem and the related max n-section problem on hypergraphs. Our algorithms, when applied to a number of sample cases, give more than a 2-fold performance improvement over random partitioning. Contact: cenk@cs.sfu.ca PMID:18586730

  11. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  12. RNA-Seq Analysis and Gene Discovery of Andrias davidianus Using Illumina Short Read Sequencing

    PubMed Central

    Li, Fenggang; Wang, Lixin; Lan, Qingjing; Yang, Hui; Li, Yang; Liu, Xiaolin; Yang, Zhaoxia

    2015-01-01

    The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence-similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were expressed at significantly higher levels in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians. PMID:25874626

  13. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

    PubMed

    Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T

    2014-01-01

    MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me). PMID:24599324
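
The general idea of combining hashed seeds with Smith-Waterman alignment can be sketched in Python as follows. This is a simplified illustration, not MOSAIK's code or its hash clustering strategy: the names `index_reference`, `map_read` and `smith_waterman_score`, the scoring values, and the `pad` parameter are all assumptions made for the example, and only the forward strand is considered.

```python
from collections import Counter, defaultdict


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def index_reference(ref, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read, ref, index, k, pad=2):
    """Vote for the read's start position using seed hits, then score the
    read against a padded reference window with Smith-Waterman."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1  # implied start of the read on the reference
    if not votes:
        return None
    start, _ = votes.most_common(1)[0]
    lo = max(0, start - pad)
    hi = min(len(ref), start + len(read) + pad)
    return start, smith_waterman_score(read, ref[lo:hi])
```

The seed lookup narrows the search to a small window, so the quadratic-time dynamic programming step runs only where a placement is plausible; the local alignment then tolerates the mismatches and short indels that exact seeds alone would miss.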

  14. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  15. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    PubMed

    Pightling, Arthur W; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  16. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review current available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants. PMID:26610555

  17. Short-read reading-frame predictors are not created equal: sequence error causes loss of signal

    PubMed Central

    2012-01-01

    Background Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms—FragGeneScan, MetaGeneAnnotator, MetaGeneMark, Orphelia, and Prodigal—for accuracy on short (75–1000 bp) fragments containing sequence error from previously published artificial data and “real” metagenomic datasets. Results While gene prediction tools have similar accuracies predicting genes on error-free fragments, in the presence of sequencing errors considerable differences between tools become evident. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than does MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions. Conclusions Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs. PMID:22839106

  18. Analysis of gene expression for microminipig liver transcriptomes using parallel long-read technology and short-read sequencing.

    PubMed

    Sakai, Chizuka; Iwano, Shunsuke; Shimizu, Makiko; Onodera, Jun; Uchida, Masashi; Sakurada, Eri; Yamazaki, Yuri; Asaoka, Yoshiji; Imura, Naoko; Uno, Yasuhiro; Murayama, Norie; Hayashi, Ryoji; Yamazaki, Hiroshi; Miyamoto, Yohei

    2016-05-01

    The microminipig is one of the smallest minipigs that has emerged as a possible experimental animal model, because it shares many anatomical and/or physiological similarities with humans, including the coronary artery distribution in the heart, the digestive physiology, the kidney size and its structure, and so on. However, information on gene expression profiles, including those on drug-metabolizing phase I and II enzymes, in the microminipig is limited. Therefore, the aim of the present study was to identify transcripts in microminipig livers and to determine gene expression profiles. De novo assembly and expression analyses of microminipig transcripts were conducted with liver samples from three male and three female microminipigs using parallel long-read and short-read sequencing technologies. After unique sequences had been automatically aligned by assembling software, the mean contig length of 50843 transcripts was 707 bp. The expression profiles of cytochrome P450 (P450) 1A2, 2C, 2E1 and 3A genes in livers in microminipigs were similar to those in humans. Liver carboxylesterase (CES) precursor, liver CES-like, UDP-glucuronosyltransferase (UGT) 2C1-like, amine sulfotransferase (SULT)-like, N-acetyltransferases (NAT8) and glutathione S-transferase (GST) A2 genes, which are relatively unknown genes in pigs and/or humans, were expressed strongly. Furthermore, no significant gender differences were observed in the gene expression profiles of phase I enzymes, whereas UGT2B17, SULT1E1, SULT2A1, amine SULT-like, NAT8 and GSTT4 genes were different between males and females among phase II enzyme genes under the present sample conditions. These results provide a foundation for mechanistic studies and the use of microminipigs as model animals for drug development in the future. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27214158

  19. Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome.

    PubMed

    Everett, M V; Grau, E D; Seeb, J E

    2011-03-01

    How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in the sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals were from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of contigs assembled varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references. PMID:21429166

  20. Fine De Novo Sequencing of a Fungal Genome Using only SOLiD Short Read Data: Verification on Aspergillus oryzae RIB40

    PubMed Central

    Takeda, Itaru; Hagiwara, Hiroko; Ikegami, Tsutomu; Koike, Hideaki; Machida, Masayuki

    2013-01-01

    The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated from NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, the SOLiD read data with lengths of <60 bp have been considered to be too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only SOLiD read data of 50 bp generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, 22-fold higher than those obtained using only SOLiD short reads in other published reports. In addition, almost 99% of the reference genome was accurately aligned by the assembled scaffold fragments in long lengths. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also highly reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low quality data, library insert size, and k-mer size on the assembly performance, and recommend assembling mildly filtered read data (for which the N50 was not greatly degraded) from a library with an insert size of ∼2.0 kb, using a k-mer size of 33. PMID:23667655
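
For readers unfamiliar with the N50 statistic quoted in this abstract, the following Python sketch shows one common way to compute it from a list of scaffold or contig lengths. The function name `n50` is chosen for this example; the definition used is the standard one (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), which is assumed rather than taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

Because N50 is weighted by length, a few long scaffolds dominate it, which is why mate-paired libraries that join contigs into scaffolds can raise it so sharply.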

  1. Methods for accurate quantification of LTR-retrotransposon copy number using short-read sequence data: a case study in Sorghum.

    PubMed

    Ramachandran, Dhanushya; Hawkins, Jennifer S

    2016-10-01

    Transposable elements (TEs) are ubiquitous in eukaryotic genomes and their mobility impacts genome structure and function in myriad ways. Because of their abundance, activity, and repetitive nature, the characterization and analysis of TEs remain challenging, particularly from short-read sequencing projects. To overcome this difficulty, we have developed a method that estimates TE copy number from short-read sequences. To test the accuracy of our method, we first performed an in silico analysis of the reference Sorghum bicolor genome, using both reference-based and de novo approaches. The resulting TE copy number estimates were strikingly similar to the annotated numbers. We then tested our method on real short-read data by estimating TE copy numbers in several accessions of S. bicolor and its close relative S. propinquum. Both methods effectively identify and rank similar TE families from highest to lowest abundance. We found that de novo characterization was effective at capturing qualitative variation, but underestimated the abundance of some TE families, specifically families of more ancient origin. Also, interspecific reference-based mapping of S. propinquum reads to the S. bicolor database failed to fully describe TE content in S. propinquum, indicative of recent TE activity leading to changes in the respective repetitive landscapes over very short evolutionary timescales. We conclude that reference-based analyses are best suited for within-species comparisons, while de novo approaches are more reliable for evolutionarily distant comparisons. PMID:27295958

  2. The Long March: A Sample Preparation Technique that Enhances Contig Length and Coverage by High-Throughput Short-Read Sequencing

    PubMed Central

    Webster, Dale; Dimon, Michelle; Ruby, J. Graham; Hekele, Armin; DeRisi, Joseph L.

    2008-01-01

    High-throughput short-read technologies have revolutionized DNA sequencing by drastically reducing the cost per base of sequencing information. Despite producing gigabases of sequence per run, these technologies still present obstacles in resequencing and de novo assembly applications due to biased or insufficient target sequence coverage. We present here a simple sample preparation method termed the “long march” that increases both contig lengths and target sequence coverage using high-throughput short-read technologies. By incorporating a Type IIS restriction enzyme recognition motif into the sequencing primer adapter, successive rounds of restriction enzyme cleavage and adapter ligation produce a set of nested sub-libraries from the initial amplicon library. Sequence reads from these sub-libraries are offset from each other with enough overlap to aid assembly and contig extension. We demonstrate the utility of the long march in resequencing of the Plasmodium falciparum transcriptome, where the number of genomic bases covered was increased by 39%, as well as in metagenomic analysis of a serum sample from a patient with hepatitis B virus (HBV)-related acute liver failure, where the number of HBV bases covered was increased by 42%. We also offer a theoretical optimization of the long march for de novo sequence assembly. PMID:18941527

  3. Crystallizing short-read assemblies around seeds

    PubMed Central

    Hossain, Mohammad Sajjad; Azimi, Navid; Skiena, Steven

    2009-01-01

    Background New short-read sequencing technologies produce enormous volumes of 25–30 base paired-end reads. The resulting reads have vastly different characteristics than those produced by Sanger sequencing, and require different approaches than the previous generation of sequence assemblers. In this paper, we present a short-read de novo assembler particularly targeted at the new ABI SOLiD sequencing technology. Results This paper presents what we believe to be the first de novo sequence assembly results on real data from the emerging SOLiD platform, introduced by Applied Biosystems. Our assembler SHORTY augments short-paired reads using a trivially small number (5 – 10) of seeds of length 300 – 500 bp. These seeds enable us to produce significant assemblies using short-read coverage no more than 100×, which can be obtained in a single run of these high-capacity sequencers. SHORTY exploits two ideas which we believe to be of interest to the short-read assembly community: (1) using single seed reads to crystallize assemblies, and (2) estimating intercontig distances accurately from multiple spanning paired-end reads. Conclusion We demonstrate effective assemblies (N50 contig sizes ~40 kb) of three different bacterial species using simulated SOLiD data. Sequencing artifacts limit our performance on real data; however, our results on this data are substantially better than those achieved by competing assemblers. PMID:19208115
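
The second idea above, estimating the distance between two contigs from several spanning read pairs, can be illustrated with a small Python sketch. This is not SHORTY's estimator; it assumes a known library insert size, a pair layout in which one read maps to contig A and its mate to contig B, and a simple median over per-pair estimates. The name `estimate_gap` and its argument layout are hypothetical.

```python
from statistics import median


def estimate_gap(pairs, contig_a_len, insert_size):
    """Estimate the gap between contig A and the following contig B.

    pairs: list of (start_on_a, end_on_b) tuples, where start_on_a is the
    0-based leftmost base of the read on contig A and end_on_b is one past
    the rightmost base of its mate on contig B.
    """
    estimates = []
    for start_on_a, end_on_b in pairs:
        covered_on_a = contig_a_len - start_on_a  # fragment bases inside A
        covered_on_b = end_on_b                   # fragment bases inside B
        estimates.append(insert_size - covered_on_a - covered_on_b)
    # The median damps the effect of outlier pairs and insert-size variation.
    return median(estimates)
```

Each pair gives a noisy estimate because real insert sizes vary around the nominal value, so pooling many spanning pairs is what makes the distance estimate usable.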

  4. Whole genome sequencing of environmental Vibrio cholerae O1 from 10 nanograms of DNA using short reads.

    PubMed

    Pérez Chaparro, Paula Juliana; McCulloch, John Anthony; Cerdeira, Louise Teixeira; Al-Dilaimi, Arwa; Canto de Sá, Lena Lillian; de Oliveira, Rodrigo; Tauch, Andreas; de Carvalho Azevedo, Vasco Ariston; Cruz Schneider, Maria Paula; da Silva, Artur Luiz da Costa

    2011-11-01

    Multiple Displacement Amplification (MDA) of DNA using φ29 (phi29) DNA polymerase amplifies DNA several billion-fold, which has proved to be potentially very useful for evaluating genome information in a culture-independent manner. Whole genome sequencing using DNA from a single prokaryotic genome copy amplified by MDA has not yet been achieved due to the formation of chimeras and skewed amplification of genomic regions during the MDA step, which then precludes genome assembly. We have hereby addressed the issue by using 10 ng of genomic Vibrio cholerae DNA extracted within an agarose plug to ensure circularity as a starting point for MDA and then sequencing the amplified yield using the SOLiD platform. We successfully managed to assemble the entire genome of V. cholerae strain LMA3984-4 (environmental O1 strain isolated in urban Amazonia) using a hybrid de novo assembly strategy. Using our method, only 178 out of 16,713 (1%) of contigs were not able to be inserted into either chromosome scaffold, and out of these 178, only 3 appeared to be chimeras. The other contigs seem to be the result of template-independent non-specific amplification during MDA, yielding spurious reads. Extraction of genomic DNA within an agarose plug in order to ensure circularity of the extracted genome might be key to minimizing amplification bias by MDA for WGS. PMID:21871929

  5. COPS: a sensitive and accurate tool for detecting somatic Copy Number Alterations using short-read sequence data from paired samples.

    PubMed

    Krishnan, Neeraja M; Gaur, Prakhar; Chaudhary, Rakshit; Rao, Arjun A; Panda, Binay

    2012-01-01

    Copy Number Alterations (CNAs), such as deletions and duplications, compose a larger percentage of genetic variations than single nucleotide polymorphisms or other structural variations in cancer genomes that undergo major chromosomal re-arrangements. It is, therefore, imperative to identify cancer-specific somatic copy number alterations (SCNAs), with respect to matched normal tissue, in order to understand their association with the disease. We have devised an accurate, sensitive, and easy-to-use tool, COPS, COpy number using Paired Samples, for detecting SCNAs. We rigorously tested the performance of COPS using short sequence simulated reads at various sizes and coverage of SCNAs, read depths, read lengths and also with real tumor:normal paired samples. We found COPS to perform better in comparison to other known SCNA detection tools for all evaluated parameters, namely, sensitivity (detection of true positives), specificity (detection of false positives) and size accuracy. COPS performed well for sequencing reads of all lengths when used with most upstream read alignment tools. Additionally, by incorporating a downstream boundary segmentation detection tool, the accuracy of SCNA boundaries was further improved. Here, we report an accurate, sensitive and easy-to-use tool for detecting cancer-specific SCNAs using short-read sequence data. In addition to cancer, COPS can be used for any disease as long as sequence reads from both disease and normal samples from the same individual are available. An added boundary segmentation detection module makes COPS-detected SCNA boundaries more specific for the samples studied. COPS is available at ftp://115.119.160.213 with username "cops" and password "cops". PMID:23110103
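
    For readers comparing detection tools on simulated data, sensitivity and specificity can be computed from confusion-matrix counts as sketched below. These are the standard textbook definitions, which may differ in detail from how the COPS authors scored calls; the counts are invented.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real events that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-events correctly left uncalled: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical benchmark: 50 simulated SCNAs and 100 unaltered regions.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=90, false_pos=10)
print(sens, spec)
```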

  6. Short Read Alignment Using SOAP2.

    PubMed

    Hurgobin, Bhavna

    2016-01-01

    Next-generation sequencing (NGS) technologies have rapidly evolved in the last 5 years, leading to the generation of millions of short reads in a single run. Consequently, various sequence alignment algorithms have been developed to compare these reads to an appropriate reference in order to perform important downstream analysis. SOAP2 from the SOAP series is one of the most commonly used alignment programs to handle NGS data, and it efficiently does so using low computer memory usage and fast alignment speed. This chapter describes the protocol used to align short reads to a reference genome using SOAP2, and highlights the significance of using the in-built command-line options to tune the behavior of the algorithm according to the inputs and the desired results. PMID:26519410

  7. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  8. Making sense of deep sequencing.

    PubMed

    Goldman, D; Domschke, K

    2014-10-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of 'big data', to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  9. Comparison of de novo short read assemblers on metagenomic data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing technologies have the potential to revolutionize genomics and biological research. A flurry of short-read assemblers has been developed recently to facilitate the analysis of the short sequences generated using these technologies. However, none of these assemblers has spec...

  10. Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools

    PubMed Central

    Zhang, Liqing

    2014-01-01

    Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of increasing false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data. PMID:24839440

  11. Objective and comprehensive evaluation of bisulfite short read mapping tools.

    PubMed

    Tran, Hong; Porter, Jacob; Sun, Ming-An; Xie, Hehuang; Zhang, Liqing

    2014-01-01

    Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of increasing false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data. PMID:24839440

  12. Droplet barcoding for massively parallel single-molecule deep sequencing

    PubMed Central

    Lan, Freeman; Haliburton, John R.; Yuan, Aaron; Abate, Adam R.

    2016-01-01

    The ability to accurately sequence long DNA molecules is important across biology, but existing sequencers are limited in read length and accuracy. Here, we demonstrate a method to leverage short-read sequencing to obtain long and accurate reads. Using droplet microfluidics, we isolate, amplify, fragment and barcode single DNA molecules in aqueous picolitre droplets, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read sequencing. We show that this approach can provide accurate sequences of up to 10 kb, allowing us to identify rare mutations below the detection limit of conventional sequencing and directly link them into haplotypes. This barcoding methodology can be a powerful tool in sequencing heterogeneous populations such as viruses. PMID:27353563

  13. Droplet barcoding for massively parallel single-molecule deep sequencing.

    PubMed

    Lan, Freeman; Haliburton, John R; Yuan, Aaron; Abate, Adam R

    2016-01-01

    The ability to accurately sequence long DNA molecules is important across biology, but existing sequencers are limited in read length and accuracy. Here, we demonstrate a method to leverage short-read sequencing to obtain long and accurate reads. Using droplet microfluidics, we isolate, amplify, fragment and barcode single DNA molecules in aqueous picolitre droplets, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read sequencing. We show that this approach can provide accurate sequences of up to 10 kb, allowing us to identify rare mutations below the detection limit of conventional sequencing and directly link them into haplotypes. This barcoding methodology can be a powerful tool in sequencing heterogeneous populations such as viruses. PMID:27353563

  14. Qualitative De Novo Analysis of Full Length cDNA and Quantitative Analysis of Gene Expression for Common Marmoset (Callithrix jacchus) Transcriptomes Using Parallel Long-Read Technology and Short-Read Sequencing

    PubMed Central

    Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as human pharmacokinetic and biomedical research models. The cytochromes P450 (P450s) are a superfamily of enzymes that have critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembling software, the mean contig length was 718 bp (with a standard deviation of 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts were matched to known marmoset sequences. Gene expression in 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, gene expressions of these monooxygenases were lower than those in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massive parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here were able to identify novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  15. A hybrid short read mapping accelerator

    PubMed Central

    2013-01-01

    Background The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are “inherently parallel”. Results We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for pair-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed. Conclusions This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance

  16. RAPSearch: a fast protein similarity search tool for short reads

    PubMed Central

    2011-01-01

    Background Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. Results We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated. PMID:21575167

  17. EC: an efficient error correction algorithm for short reads

    PubMed Central

    2015-01-01

    Background In highly parallel next-generation sequencing (NGS) techniques millions to billions of short reads are produced from a genomic sequence in a single run. Due to the limitation of the NGS technologies, there could be errors in the reads. The error rate of the reads can be reduced with trimming and by correcting the erroneous bases of the reads. It helps to achieve high quality data and the computational complexity of many biological applications will be greatly reduced if the reads are first corrected. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads. Results We have done extensive and rigorous experiments that reveal that EC is indeed an effective, scalable, and efficient error correction tool. Real reads that we have employed in our performance evaluation are Illumina-generated short reads of various lengths. The six experimental datasets we have utilized are taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads are obtained by picking substrings from random positions of reference genomes. To introduce errors, some of the bases of the simulated reads are changed to other bases with some probabilities. Conclusions Error correction is a vital problem in biology especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate the possibility of employing the techniques introduced in this research paper to handle insertion and deletion errors also. Software availability The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip. PMID:26678663
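
    The read simulation described in the abstract (substrings drawn from random reference positions, with bases substituted at some probability) can be sketched as follows. This is an illustrative reconstruction, not the authors' simulator; the function name, the uniform per-base error rate, and the toy reference are assumptions.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, read_length, n_reads, error_rate, seed=0):
    """Return (start, read) pairs sampled uniformly from the reference.
    Each base is independently replaced by a different base with
    probability error_rate (substitution errors only)."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # seeded so runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


reference = "ACGTTGCAAGCTTAGGCTAACCGGTTAACG"  # made-up toy genome
for start, read in simulate_reads(reference, read_length=10, n_reads=3, error_rate=0.05):
    print(start, read)
```

    Keeping the true start position alongside each read makes it straightforward to score an error corrector afterwards by comparing its output with the original substring.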

  18. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale. PMID:19622793

  19. SAMMate: a GUI tool for processing short read alignments in SAM/BAM format

    PubMed Central

    2011-01-01

    Background Next Generation Sequencing (NGS) technology generates tens of millions of short reads for each DNA/RNA sample. A key step in NGS data analysis is the short read alignment of the generated sequences to a reference genome. Although storing alignment information in the Sequence Alignment/Map (SAM) or Binary SAM (BAM) format is now standard, biomedical researchers still have difficulty accessing this information. Results We have developed a Graphical User Interface (GUI) software tool named SAMMate. SAMMate allows biomedical researchers to quickly process SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies. SAMMate also automates some standard procedures in DNA-seq and RNA-seq data analysis. Using either standard or customized annotation files, SAMMate allows users to accurately calculate the short read coverage of genomic intervals. In particular, for RNA-seq data SAMMate can accurately calculate the gene expression abundance scores for customized genomic intervals using short reads originating from both exons and exon-exon junctions. Furthermore, SAMMate can quickly calculate a whole-genome signal map at base-wise resolution allowing researchers to solve an array of bioinformatics problems. Finally, SAMMate can export both a wiggle file for alignment visualization in the UCSC genome browser and an alignment statistics report. The biological impact of these features is demonstrated via several case studies that predict miRNA targets using short read alignment information files. Conclusions With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files. Our software is constantly updated and will greatly facilitate the downstream analysis of NGS data. Both the source code and the GUI executable are freely available under the GNU General Public License at http://sammate.sourceforge.net. PMID:21232146

  20. CRISPR Detection From Short Reads Using Partial Overlap Graphs.

    PubMed

    Ben-Bassat, Ilan; Chor, Benny

    2016-06-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so. PMID:27058690

  1. Non-referenced genome assembly from epigenomic short-read data.

    PubMed

    Kaspi, Antony; Ziemann, Mark; Keating, Samuel T; Khurana, Ishant; Connor, Timothy; Spolding, Briana; Cooper, Adrian; Lazarus, Ross; Walder, Ken; Zimmet, Paul; El-Osta, Assam

    2014-10-01

    Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data. PMID:25437048

  2. SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome projects routinely produce draft sequences for species from diverse evolutionary clades, but generally do not create single nucleotide polymorphism (SNP) resources. We present an approach for de novo SNP discovery based on short-read sequencing of reduced representation libraries (RRL) to ge...

  3. A Bayesian Assignment Method for Ambiguous Bisulfite Short Reads

    PubMed Central

    Tran, Hong; Wu, Xiaowei; Tithi, Saima; Sun, Ming-an; Xie, Hehuang; Zhang, Liqing

    2016-01-01

    DNA methylation is an epigenetic modification critical for normal development and diseases. The determination of genome-wide DNA methylation at single-nucleotide resolution is made possible by sequencing bisulfite treated DNA with next generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging as only a limited proportion of them (around 50–70%) can be aligned uniquely; a significant proportion, known as multireads, are mapped to multiple locations and thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we develop a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in the overall mapping efficiency. Moreover, the assignment model shows robust performance with low coverage depth, making it particularly attractive considering the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge on mutation rate and context specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS. PMID:27011215
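
    The general idea of assigning a multiread by posterior probability can be sketched as below. This is a deliberately simplified illustration, not the BAM-ABS model: it assumes the prior for each candidate location is proportional to the number of uniquely aligned reads there (plus a pseudocount) and that a per-location alignment likelihood is already available. All names and numbers are hypothetical.

```python
def assign_multiread(candidates, pseudocount=1.0):
    """candidates maps location -> (unique_read_count, alignment_likelihood).
    Returns (best_location, posterior), where
    posterior is proportional to (unique_read_count + pseudocount) * likelihood."""
    weights = {
        loc: (count + pseudocount) * likelihood
        for loc, (count, likelihood) in candidates.items()
    }
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no candidate location has positive weight")
    posterior = {loc: w / total for loc, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# A read that aligns equally well to two loci; unique reads break the tie.
candidates = {"chr1:1000": (9, 0.5), "chr2:5000": (1, 0.5)}
best, posterior = assign_multiread(candidates)
print(best, posterior)
```

    A practical pipeline might assign a read only when the best posterior exceeds a chosen threshold, and otherwise leave it unassigned.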

  4. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214

  5. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis.

    PubMed

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214

  6. Simultaneous alignment of short reads against multiple genomes

    PubMed Central

    Schneeberger, Korbinian; Hagmann, Jörg; Ossowski, Stephan; Warthmann, Norman; Gesing, Sandra; Kohlbacher, Oliver; Weigel, Detlef

    2009-01-01

    Genome resequencing with short reads generally relies on alignments against a single reference. GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes (e.g., individuals of the same species) into a single graph structure. It constitutes the first approach for handling multiple references and introduces representations for alignments against complex structures. Demonstrated benefits include access to polymorphisms that cannot be identified by alignments against the reference alone. Download GenomeMapper at . PMID:19761611

  7. Short read DNA fragment anchoring algorithm

    PubMed Central

    Wang, Wendi; Zhang, Peiheng; Liu, Xinchun

    2009-01-01

    Background The emerging next-generation sequencing method based on PCR technology boosts genome sequencing speed considerably, and the expense has also decreased. It has been utilized to address a broad range of bioinformatics problems. Limited by the reliable output sequence length of next-generation sequencing technologies, we are generally confined to studying gene fragments of 30~50 bp, which is relatively short compared with traditional gene fragment lengths. Anchoring gene fragments in a long reference sequence is an essential prerequisite step for further assembly and analysis work. Due to the sheer number of fragments produced by next-generation sequencing technologies and the huge size of reference sequences, anchoring is rapidly becoming a computational bottleneck. Results and discussion We compared algorithm efficiency on BLAT, SOAP and EMBF. The efficiency is defined as the count of total output results divided by the time consumed to retrieve them. The data show that our algorithm EMBF has a 3~4 times efficiency advantage over SOAP, and at least 150 times over BLAT. Moreover, when the reference sequence size is increased, the efficiency of SOAP degrades by as much as 30%, while EMBF shows a preferable increasing tendency. Conclusion In conclusion, we deem that EMBF is more suitable for the short fragment anchoring problem where result completeness and accuracy are predominant and the reference sequences are relatively large. PMID:19208116
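
    The efficiency metric defined in this abstract (output results divided by time consumed) is simple enough to write down directly. The sketch below is illustrative only; the function names are hypothetical, and the numbers in the usage note are made up, not taken from the paper.

```python
def anchoring_efficiency(result_count: int, seconds: float) -> float:
    """Efficiency as defined in the abstract: results returned per unit time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return result_count / seconds


def efficiency_ratio(eff_a: float, eff_b: float) -> float:
    """How many times more efficient tool A is than tool B."""
    if eff_b <= 0:
        raise ValueError("baseline efficiency must be positive")
    return eff_a / eff_b
```

    With made-up inputs, a tool returning 1200 anchors in 4 s scores 300 results/s, a tool returning 1000 anchors in 10 s scores 100 results/s, and the ratio between them is 3.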

  8. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. PMID:23869991

  9. DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

    PubMed Central

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ PMID:24009693
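
    One way to picture distributing FASTQ input across workers is to cut it into chunks of whole records. The sketch below assumes the common 4-line FASTQ record layout and a hypothetical helper name, fastq_chunks; it does not describe how DistMap itself partitions data.

```python
from typing import Iterable, Iterator, List


def fastq_chunks(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most records_per_chunk records.

    Assumes 4 lines per record (header, sequence, '+', qualities).
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == 4 * records_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        # A trailing partial record means the input was truncated.
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record")
        yield chunk
```

    Each yielded chunk could then be handed to an independent mapping job, with the per-chunk SAM/BAM outputs merged afterwards.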

  10. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

    PubMed

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ PMID:24009693

  11. Whole Chloroplast Genome Sequencing in Fragaria Using Deep Sequencing: A Comparison of Three Methods

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chloroplast sequences previously investigated in Fragaria revealed low amounts of variation. Deep sequencing technologies enable economical sequencing of complete chloroplast genomes. These sequences can potentially provide robust phylogenetic resolution, even at low taxonomic levels within plant gr...

  12. Short-read DNA sequencing yields microsatellite markers for Rheum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identifying culinary rhubarb (Rheum ×hybridum Murray) cultivars using morphological characteristics is problematic due to variability within individual genotypes, variation caused by environmental factors, plant and leaf age, similarity between genetically diverse genotypes, multiple cultivar names ...

  13. Deep Sequencing: Becoming a Critical Tool in Clinical Virology

    PubMed Central

    QUIÑONES-MATEU, Miguel E.; AVILA, Santiago; REYES-TERAN, Gustavo; MARTINEZ, Miguel A.

    2014-01-01

    Population (Sanger) sequencing has been the standard method in basic and clinical DNA sequencing for almost 40 years; however, next-generation (deep) sequencing methodologies are now revolutionizing the field of genomics, and clinical virology is no exception. Deep sequencing is highly efficient, producing an enormous amount of information at low cost in a relatively short period of time. High-throughput sequencing techniques have enabled significant contributions to multiple areas in virology, including virus discovery and metagenomics (viromes), molecular epidemiology, pathogenesis, and studies of how viruses escape the host immune system and antiviral pressures. In addition, new and more affordable deep sequencing-based assays are now being implemented in clinical laboratories. Here we review the use of the current deep sequencing platforms in virology, focusing on three of the most studied viruses: human immunodeficiency virus (HIV), hepatitis C virus (HCV), and influenza virus. PMID:24998424

  14. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  15. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  16. Fitness Inference from Short-Read Data: Within-Host Evolution of a Reassortant H5N1 Influenza Virus

    PubMed Central

    Illingworth, Christopher J.R.

    2015-01-01

    We present a method to infer the role of selection acting during the within-host evolution of the influenza virus from short-read genome sequence data. Linkage disequilibrium between loci is accounted for by treating short-read sequences as noisy multilocus emissions from an underlying model of haplotype evolution. A hierarchical model-selection procedure is used to infer the underlying fitness landscape of the virus insofar as that landscape is explored by the viral population. In a first application of our method, we analyze data from an evolutionary experiment describing the growth of a reassortant H5N1 virus in ferrets. Across two sets of replica experiments we infer multiple alleles to be under selection, including variants associated with receptor binding specificity, glycosylation, and with the increased transmissibility of the virus. We identify epistasis as an important component of the within-host fitness landscape, and show that adaptation can proceed through multiple genetic pathways. PMID:26243288

  17. Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads

    PubMed Central

    Carr, Rogan; Borenstein, Elhanan

    2014-01-01

    To assess the functional capacities of microbial communities, including those inhabiting the human body, shotgun metagenomic reads are often aligned to a database of known genes. Such homology-based annotation practices critically rely on the assumption that short reads can map to orthologous genes of similar function. This assumption, however, and the various factors that impact short read annotation, have not been systematically evaluated. To address this challenge, we generated an extremely large database of simulated reads (totaling 15.9 Gb), spanning over 500,000 microbial genes and 170 curated genomes and including, for many genomes, every possible read of a given length. We annotated each read using common metagenomic protocols, fully characterizing the effect of read length, sequencing error, phylogeny, database coverage, and mapping parameters. We additionally rigorously quantified gene-, genome-, and protocol-specific annotation biases. Overall, our findings provide a first comprehensive evaluation of the capabilities and limitations of functional metagenomic annotation, providing crucial goal-specific best-practice guidelines to inform future metagenomic research. PMID:25148512
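
    Generating "every possible read of a given length" from a genome amounts to a sliding window over the sequence. The sketch below shows that enumeration for one strand of a linear sequence; the function name all_reads is hypothetical, and the authors' simulation (error models, both strands, and so on) is not reproduced here.

```python
from typing import List


def all_reads(sequence: str, read_length: int) -> List[str]:
    """Return every substring of read_length, in order of start position."""
    if read_length < 1:
        raise ValueError("read_length must be at least 1")
    last_start = len(sequence) - read_length
    # range() is empty when the sequence is shorter than the read length.
    return [sequence[i:i + read_length] for i in range(last_start + 1)]
```

    A sequence of length n yields n - read_length + 1 reads, so all_reads("ACGTAC", 4) gives the three reads ACGT, CGTA, and GTAC.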

  18. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    PubMed

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample specific threshold determined by the 50th percentile of the patristic distance distribution for all variants within each individual was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny ranging from 0.005-0.06, a threshold of 0.03 was found to provide the maximum balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Deep sequencing analysis identified an additional 4 individuals and excluded 8 other individuals relative to Sanger sequencing. The application of this deep sequencing approach could be a more effective tool to understand onward HCV transmission dynamics compared with Sanger sequencing, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters. PMID:27282472
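
    The two thresholding ideas in this abstract can be sketched as follows: a per-individual threshold taken as the 50th percentile (median) of within-host patristic distances, and cluster membership defined by pairwise distances at or below a threshold. The single-linkage grouping and all names below are assumptions made for illustration; the abstract does not specify the clustering rule in this detail.

```python
from statistics import median
from typing import Dict, List, Sequence, Set, Tuple


def sample_threshold(within_host_distances: Sequence[float]) -> float:
    """50th percentile of one individual's within-host patristic distances."""
    return median(within_host_distances)


def threshold_clusters(
    distances: Dict[Tuple[str, str], float], threshold: float
) -> List[Set[str]]:
    """Group samples linked by any pairwise distance <= threshold (single linkage)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        ra, rb = find(a), find(b)
        if d <= threshold and ra != rb:
            parent[ra] = rb

    groups: Dict[str, Set[str]] = {}
    for sample in list(parent):
        groups.setdefault(find(sample), set()).add(sample)
    # Only groups with at least two members count as clusters.
    return [g for g in groups.values() if len(g) > 1]
```

    With distances A-B 0.01, B-C 0.02, A-C 0.05, and C-D 0.2, a threshold of 0.03 places A, B, and C in one cluster and leaves D unclustered.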

  19. Preparing DNA Libraries for Multiplexed Paired-End Deep Sequencing for Illumina GA Sequencers

    PubMed Central

    Son, Mike S.; Taylor, Ronald K.

    2011-01-01

    Whole genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This Unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data. PMID:21400673

  20. Deep sequencing and human antibody repertoire analysis.

    PubMed

    Boyd, Scott D; Crowe, James E

    2016-06-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  1. GAViT: Genome Assembly Visualization Tool for Short Read Data

    SciTech Connect

    Syed, Aijazuddin; Shapiro, Harris; Tu, Hank; Pangilinan, Jasmyn; Trong, Stephan

    2008-03-14

    It is a challenging job for genome analysts to accurately debug, troubleshoot, and validate genome assembly results. Genome analysts rely on visualization tools to help validate and troubleshoot assembly results, including such problems as mis-assemblies, low-quality regions, and repeats. Short read data adds further complexity and makes it extremely challenging for the visualization tools to scale and to view all needed assembly information. As a result, there is a need for a visualization tool that can scale to display assembly data from the new sequencing technologies. We present Genome Assembly Visualization Tool (GAViT), a highly scalable and interactive assembly visualization tool developed at the DOE Joint Genome Institute (JGI).

  2. Complete Genome Sequence of the WHO International Standard for HIV-2 RNA Determined by Deep Sequencing

    PubMed Central

    Ham, Claire; Morris, Clare

    2016-01-01

    The World Health Organization (WHO) International Standard for HIV-2 RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats (LTRs), including minority species, were assigned subtype A. This information will aid design, development, and evaluation of HIV-2 RNA amplification assays. PMID:26847885

  3. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. PMID:24799436
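
    As a rough picture of what a "normalized coverage file" contains, the sketch below counts reads per fixed-size bin and scales the counts to counts per million reads. This is a generic illustration in plain Python, not the deepTools API; the function name binned_cpm, the half-open interval convention, and the choice of per-million scaling are assumptions.

```python
from typing import List, Sequence, Tuple


def binned_cpm(
    reads: Sequence[Tuple[int, int]], genome_length: int, bin_size: int
) -> List[float]:
    """Per-bin read counts scaled to counts per million reads.

    reads are half-open (start, end) intervals on a single sequence.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for start, end in reads:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        # A read is counted once in every bin it overlaps.
        for b in range(first, last + 1):
            counts[b] += 1
    total = len(reads)
    if total == 0:
        return [0.0] * n_bins
    scale = 1e6 / total
    return [c * scale for c in counts]
```

    Scaling by the total read count lets coverage tracks from libraries of different sequencing depth be compared on a common scale.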

  4. deepTools: a flexible platform for exploring deep-sequencing data

    PubMed Central

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A.; Manke, Thomas

    2014-01-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. PMID:24799436

  5. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw. PMID:20478825
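
    Steps (i) and (ii) above, cleanup and clustering of a tab-delimited tag/count table, can be sketched in a few lines. This is not DSAP's code: the adaptor sequence is a made-up placeholder, the homopolymer filter is a simplification of the poly-A/T/C/G/N removal, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical 3' adaptor fragment, for illustration only.
ADAPTOR = "TCGTATGC"


def clean_tag(tag: str, adaptor: str = ADAPTOR) -> str:
    """Trim the adaptor and everything after it, if the adaptor is present."""
    idx = tag.find(adaptor)
    return tag[:idx] if idx != -1 else tag


def is_homopolymer(tag: str) -> bool:
    """True for empty tags and tags made of a single repeated base (e.g. poly-A)."""
    return len(set(tag)) <= 1


def cluster_tags(tsv_text: str) -> Counter:
    """Sum copy numbers of tags that become identical after cleanup."""
    clusters: Counter = Counter()
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        tag, count = line.split("\t")
        cleaned = clean_tag(tag.strip().upper())
        if is_homopolymer(cleaned):
            continue
        clusters[cleaned] += int(count)
    return clusters
```

    The resulting unique sequence clusters, each with a summed copy number, are the kind of input that the later matching steps (iii) and (iv) would operate on.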

  6. MiRGator v3.0: a microRNA portal for deep sequencing, expression profiling and mRNA targeting.

    PubMed

    Cho, Sooyoung; Jang, Insu; Jun, Yukyung; Yoon, Suhyeon; Ko, Minjeong; Kwon, Yeajee; Choi, Ikjung; Chang, Hyeshik; Ryu, Daeun; Lee, Byungwook; Kim, V Narry; Kim, Wankyu; Lee, Sanghyuk

    2013-01-01

    Biogenesis and molecular function are two key subjects in the field of microRNA (miRNA) research. Deep sequencing has become the principal technique in cataloging of miRNA repertoire and generating expression profiles in an unbiased manner. Here, we describe the miRGator v3.0 update (http://mirgator.kobic.re.kr) that compiled the deep sequencing miRNA data available in public and implemented several novel tools to facilitate exploration of massive data. The miR-seq browser supports users to examine short read alignment with the secondary structure and read count information available in concurrent windows. Features such as sequence editing, sorting, ordering, import and export of user data would be of great utility for studying iso-miRs, miRNA editing and modifications. miRNA-target relation is essential for understanding miRNA function. Coexpression analysis of miRNA and target mRNAs, based on miRNA-seq and RNA-seq data from the same sample, is visualized in the heat-map and network views where users can investigate the inverse correlation of gene expression and target relations, compiled from various databases of predicted and validated targets. By keeping datasets and analytic tools up-to-date, miRGator should continue to serve as an integrated resource for biogenesis and functional investigation of miRNAs. PMID:23193297

  7. Inferring short tandem repeat variation from paired-end short reads

    PubMed Central

    Cao, Minh Duc; Tasker, Edward; Willadsen, Kai; Imelfort, Michael; Vishwanathan, Sailaja; Sureshkumar, Sridevi; Balasubramanian, Sureshkumar; Bodén, Mikael

    2014-01-01

    The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome, which are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats. PMID:24353318
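
    The idea of calling repeat-length changes from deviations in fragment size can be shown with a deliberately simplified point estimate. The sketch below is not the Bayesian method described in the abstract; it only assumes that read pairs spanning an expanded repeat map closer together on the reference than the library's mean fragment size, by roughly the length of the expansion. All names are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def repeat_length_change(
    observed_spans: Sequence[float], library_mean: float, unit_length: int
) -> Tuple[float, int]:
    """Estimate the repeat-length change relative to the reference.

    observed_spans: mapped distances between mates of pairs spanning the repeat.
    Returns (change in bp, change in repeat units); positive means expansion.
    """
    if unit_length < 1:
        raise ValueError("unit_length must be at least 1")
    # Expansion in the sample shrinks the apparent span on the reference.
    delta_bp = library_mean - mean(observed_spans)
    return delta_bp, round(delta_bp / unit_length)
```

    For instance, spans averaging 290 bp from a library with a 300 bp mean fragment size suggest about 10 bp of expansion, or 5 units of a dinucleotide repeat. A full treatment would model the fragment-size distribution rather than rely on a difference of means.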

  8. Unbiased Deep Sequencing of RNA Viruses from Clinical Samples.

    PubMed

    Matranga, Christian B; Gladden-Young, Adrianne; Qu, James; Winnicki, Sarah; Nosamiefan, Dolo; Levin, Joshua Z; Sabeti, Pardis C

    2016-01-01

    Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA - including poly(rA) carrier and ribosomal RNA - from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol as it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples-including collections from both blood and tissue isolates-and is broadly applicable to other microbial genomics studies. PMID:27403729

  9. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  10. Deep Sequencing of the Transcriptomes of Soybean Aphid and Associated Endosymbionts

    PubMed Central

    Liu, Sijun; Chougule, Nanasaheb P.; Vijayendran, Diveena; Bonning, Bryony C.

    2012-01-01

    Background The soybean aphid has significantly impacted soybean production in the U.S. Transcriptomic analyses were conducted for further insight into leads for potential novel management strategies. Methodology/Principal Findings Transcriptomic data were generated from whole aphids and from 2,000 aphid guts using an Illumina GAII sequencer. The sequence data were assembled de novo using the Velvet assembler. In addition to providing a general overview, we demonstrate (i) the use of the Multiple-k/Multiple-C method for de novo assembly of short read sequences, followed by BLAST annotation of contigs for increased transcript identification: From 400,000 contigs analyzed, 16,257 non-redundant BLAST hits were identified; (ii) analysis of species distributions of top non-redundant hits: 80% of BLAST hits (minimum e-value of 1.0-E3) were to the pea aphid or other aphid species, representing about half of the pea aphid genes; (iii) comparison of relative depth of sequence coverage to relative transcript abundance for genes with high (membrane alanyl aminopeptidase N) or low transcript abundance; (iv) analysis of the Buchnera transcriptome: Transcripts from 57.6% of the genes from Buchnera aphidicola were identified; (v) identification of Arsenophonus and Wolbachia as potential secondary endosymbionts; (vi) alignment of full length sequences from RNA-seq data for the putative salivary gland protein C002, the silencing of which has potential for aphid management, and the putative Bacillus thuringiensis Cry toxin receptors, aminopeptidase N and alkaline phosphatase. 
    Conclusions/Significance This study provides the most comprehensive data set to date for soybean aphid gene expression: This work also illustrates the utility of short-read transcriptome sequencing and the Multiple-k/Multiple-C method followed by BLAST annotation for rapid identification of target genes for organisms for which reference genome sequences are not available, and extends the utility to include the ...

  11. deepBase: a database for deeply annotating and mining deep sequencing data

    PubMed Central

    Yang, Jian-Hua; Shao, Peng; Zhou, Hui; Chen, Yue-Qin; Qu, Liang-Hu

    2010-01-01

    Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploration of these massive data remains a daunting challenge. In this study, we describe a novel database, deepBase, which we have developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data. The current release of deepBase contains deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. By analyzing ∼14.6 million unique reads that perfectly mapped to more than 284 million genomic loci, we annotated and identified ∼380 000 unique ncRNA-associated small RNAs (nasRNAs), ∼1.5 million unique promoter-associated small RNAs (pasRNAs), ∼4.0 million unique exon-associated small RNAs (easRNAs) and ∼6 million unique repeat-associated small RNAs (rasRNAs). Furthermore, 2038 miRNA and 1889 snoRNA candidates were predicted by miRDeep and snoSeeker. All of the mapped reads can be grouped into about 1.2 million RNA clusters. For the purpose of comparative analysis, deepBase provides an integrative, interactive and versatile display. A convenient search option, related publications and other useful information are also provided for further investigation. deepBase is available at: http://deepbase.sysu.edu.cn/. PMID:19966272

  12. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

    PubMed Central

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-01-01

    Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA

  13. Genetics and Epigenetics of the Skin Meet Deep Sequence

    PubMed Central

    Cheng, Jeffrey B.; Cho, Raymond J.

    2014-01-01

    Rapid advances in next-generation sequencing technology are revolutionizing approaches to genomic and epigenomic studies of skin. Deep sequencing of cutaneous malignancies reveals heavily mutagenized genomes with large numbers of low-prevalence mutations and multiple resistance mechanisms to targeted therapies. Next-generation sequencing approaches have already paid rich dividends in identifying the genetic causes of dermatologic disease, both in heritable mutations and the somatic aberrations that underlie cutaneous mosaicism. Although epigenetic alterations clearly influence tumorigenesis, pluripotent stem cell biology, and epidermal cell lineage decisions, labor and cost-intensive approaches long delayed a genome-scale perspective. New insights into epigenomic mechanisms in skin disease should arise from the accelerating assessment of histone modification, DNA methylation, and related gene expression signatures. PMID:22237701

  14. Deep sequencing approach for investigating infectious agents causing fever.

    PubMed

    Susilawati, T N; Jex, A R; Cantacessi, C; Pearson, M; Navarro, S; Susianto, A; Loukas, A C; McBride, W J H

    2016-07-01

    Acute undifferentiated fever (AUF) poses a diagnostic challenge due to the variety of possible aetiologies. While the majority of AUFs resolve spontaneously, some cases become prolonged and cause significant morbidity and mortality, necessitating improved diagnostic methods. This study evaluated the utility of deep sequencing in fever investigation. DNA and RNA were isolated from plasma/sera of AUF cases being investigated at Cairns Hospital in northern Australia, including eight control samples from patients with a confirmed diagnosis. Following isolation, DNA and RNA were bulk amplified and RNA was reverse transcribed to cDNA. The resulting DNA and cDNA amplicons were subjected to deep sequencing on an Illumina HiSeq 2000 platform. Bioinformatics analysis was performed using the program Kraken and the CLC assembly-alignment pipeline. The results were compared with the outcomes of clinical tests. We generated between 4 and 20 million reads per sample. The results of Kraken and CLC analyses concurred with diagnoses obtained by other means in 87.5 % (7/8) and 25 % (2/8) of control samples, respectively. Some plausible causes of fever were identified in ten patients who remained undiagnosed following routine hospital investigations, including Escherichia coli bacteraemia and scrub typhus that eluded conventional tests. Achromobacter xylosoxidans, Alteromonas macleodii and Enterobacteria phage were prevalent in all samples. A deep sequencing approach of patient plasma/serum samples led to the identification of aetiological agents putatively implicated in AUFs and enabled the study of microbial diversity in human blood. The application of this approach in hospital practice is currently limited by sequencing input requirements and complicated data analysis. PMID:27180244

  15. proovread: large-scale high-accuracy PacBio correction through iterative short read consensus

    PubMed Central

    Hackl, Thomas; Hedrich, Rainer; Schultz, Jörg; Förster, Frank

    2014-01-01

    Motivation: Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects. Results: Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing. Availability and implementation: proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de Contact: frank.foerster@biozentrum.uni-wuerzburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25015988

  16. Deep sequencing of HIV: clinical and research applications.

    PubMed

    Chabria, Shiven B; Gupta, Shaili; Kozal, Michael J

    2014-01-01

    Human immunodeficiency virus (HIV) exhibits remarkable diversity in its genomic makeup and exists in any given individual as a complex distribution of closely related but nonidentical genomes called a viral quasispecies, which is subject to genetic variation, competition, and selection. This viral diversity clinically manifests as a selection of mutant variants based on viral fitness in treatment-naive individuals and based on drug-selective pressure in those on antiretroviral therapy (ART). The current standard-of-care ART consists of a combination of antiretroviral agents, which ensures maximal viral suppression while preventing the emergence of drug-resistant HIV variants. Unfortunately, transmission of drug-resistant HIV does occur, affecting 5% to >20% of newly infected individuals. To optimize therapy, clinicians rely on viral genotypic information obtained from conventional population sequencing-based assays, which cannot reliably detect viral variants that constitute <20% of the circulating viral quasispecies. These low-frequency variants can be detected by highly sensitive genotyping methods collectively grouped under the moniker of deep sequencing. Low-frequency variants have been correlated to treatment failures and HIV transmission, and detection of these variants is helping to inform strategies for vaccine development. Here, we discuss the molecular virology of HIV, viral heterogeneity, drug-resistance mutations, and the application of deep sequencing technologies in research and the clinical care of HIV-infected individuals. PMID:24821496

  17. deepTools2: a next generation web server for deep-sequencing data analysis

    PubMed Central

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-01-01

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. PMID:27079975

  18. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-01

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. PMID:27079975

  19. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

    PubMed Central

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net). PMID:26731399
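
    As a rough worked check of the timings quoted in this abstract, the sketch below computes the implied speedup and per-node efficiency. The function names are illustrative, and treating the single-node, 64-core run as the baseline for the 32-node run is an assumption about how the figures are meant to be compared.

```python
def speedup(baseline_hours, parallel_minutes):
    """Ratio of the baseline runtime to the parallel runtime (both in minutes)."""
    return baseline_hours * 60.0 / parallel_minutes


def efficiency(speedup_value, nodes):
    """Fraction of ideal linear scaling achieved across the given number of nodes."""
    return speedup_value / nodes


# Timings taken from the abstract: one node (hours) versus 32 nodes (minutes).
single_end = speedup(2.77, 11.54)
paired_end = speedup(5.54, 16.64)

print(round(single_end, 1), round(efficiency(single_end, 32), 2))  # 14.4 0.45
print(round(paired_end, 1), round(efficiency(paired_end, 32), 2))  # 20.0 0.62
```

    Under this reading, the 32-node runs are roughly 14 and 20 times faster than the single-node runs, which is well short of linear scaling but still cuts hours down to minutes.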

  20. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net). PMID:26731399

  1. Clinical actionability enhanced through deep targeted sequencing of solid tumors

    PubMed Central

    Chen, Ken; Meric-Bernstam, Funda; Zhao, Hao; Zhang, Qingxiu; Ezzeddine, Nader; Tang, Lin-ya; Qi, Yuan; Mao, Yong; Chen, Tenghui; Chong, Zechen; Zhou, Wanding; Zheng, Xiaofeng; Johnson, Amber; Aldape, Kenneth D.; Routbort, Mark J.; Luthra, Rajyalakshmi; Kopetz, Scott; Davies, Michael A.; de Groot, John; Moulder, Stacy; Vinod, Ravi; Farhangfar, Carol J.; Shaw, Kenna Mills; Mendelsohn, John; Mills, Gordon B.; Eterovic, Agda Karina

    2015-01-01

    Background Further advances of targeted cancer therapy require comprehensive in-depth profiling of somatic mutations that are present in subpopulations of tumor cells in a clinical tumor sample. However, it is unclear to what extent such intra-tumor heterogeneity is present and whether it may affect clinical decision making. To unravel this challenge, we established a deep targeted sequencing platform to identify potentially actionable DNA alterations in tumor samples. Methods We assayed 515 FFPE tumor samples and matched germline (475 patients) from 11 disease sites by capturing and sequencing all the exons in 201 cancer related genes. Mutations, indels and copy number data were reported. Results We obtained a 1000-fold average sequencing depth and identified 4794 non-synonymous mutations in the samples analyzed, of which 15.2% were present at less than 10% allele frequency. Most of these low level mutations occurred at known oncogenic hotspots and are likely functional. Identifying low level mutations improved identification of mutations in actionable genes in 118 (24.84%) patients, among which 47 (9.8%) would otherwise be unactionable. In addition, acquiring ultra-high depth also ensured a low false discovery rate (less than 2.2%) from FFPE samples. Conclusion Our results were as accurate as a commercially available CLIA-compliant hotspot panel, but allowed the detection of a higher number of mutations in actionable genes. Our study revealed the critical importance of acquiring and utilizing high depth in profiling clinical tumor samples and presented a very useful platform for implementing routine sequencing in a cancer care institution. PMID:25626406

  2. Target Enrichment Improves Mapping of Complex Traits by Deep Sequencing

    PubMed Central

    Guo, Jianjun; Fan, Jue; Hauser, Bernard A.; Rhee, Seung Y.

    2015-01-01

    Complex traits such as crop performance and human diseases are controlled by multiple genetic loci, many of which have small effects and often go undetected by traditional quantitative trait locus (QTL) mapping. Recently, bulked segregant analysis with large F2 pools and genome-level markers (named extreme-QTL or X-QTL mapping) has been used to identify many QTL. To estimate parameters impacting QTL detection for X-QTL mapping, we simulated the effects of population size, marker density, and sequencing depth of markers on QTL detectability for traits with differing heritabilities. These simulations indicate that a high (>90%) chance of detecting QTL with at least 5% effect requires 5000× sequencing depth for a trait with heritability of 0.4−0.7. For most eukaryotic organisms, whole-genome sequencing at this depth is not economically feasible. Therefore, we tested and confirmed the feasibility of applying deep sequencing of target-enriched markers for X-QTL mapping. We used two traits in Arabidopsis thaliana with different heritabilities: seed size (H2 = 0.61) and seedling greening in response to salt (H2 = 0.94). We used a modified G test to identify QTL regions and developed a model-based statistical framework to resolve individual peaks by incorporating recombination rates. Multiple QTL were identified for both traits, including previously undiscovered QTL. We call our method target-enriched X-QTL (TEX-QTL) mapping; this mapping approach is not limited by the genome size or the availability of recombinant inbred populations and should be applicable to many organisms and traits. PMID:26530422
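
    To illustrate the kind of test this abstract refers to, the sketch below computes the standard (unmodified) G statistic for allele counts at one marker in two bulked pools. This is an assumption-laden illustration: the abstract does not describe the authors' modification, and the function name and the read counts are hypothetical.

```python
import math


def g_statistic(table):
    """Standard G-test of independence: G = 2 * sum(O * ln(O / E))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical read counts at one marker, each pool sequenced to 5000x depth.
# Rows: high-trait pool, low-trait pool. Columns: allele A reads, allele B reads.
counts = [[2600, 2400],
          [2400, 2600]]
g_value = g_statistic(counts)
print(round(g_value, 1))  # 16.0
```

    With one degree of freedom, a G value this large would be unlikely under equal allele frequencies in the two pools, which is the kind of signal used to flag a candidate QTL region.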

  3. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071

  4. Detecting copy number variation with mated short reads

    PubMed Central

    Medvedev, Paul; Fiume, Marc; Dzamba, Misko; Smith, Tim; Brudno, Michael

    2010-01-01

    The development of high-throughput sequencing (HTS) technologies has opened the door to novel methods for detecting copy number variants (CNVs) in the human genome. While in the past CNVs have been detected based on array CGH data, recent studies have shown that depth-of-coverage information from HTS technologies can also be used for the reliable identification of large copy-variable regions. Such methods, however, are hindered by sequencing biases that lead certain regions of the genome to be over- or undersampled, lowering their resolution and ability to accurately identify the exact breakpoints of the variants. In this work, we develop a method for CNV detection that supplements the depth-of-coverage with paired-end mapping information, where mate pairs mapping discordantly to the reference serve to indicate the presence of variation. Our algorithm, called CNVer, combines this information within a unified computational framework called the donor graph, allowing us to better mitigate the sequencing biases that cause uneven local coverage and accurately predict CNVs. We use CNVer to detect 4879 CNVs in the recently described genome of a Yoruban individual. Most of the calls (77%) coincide with previously known variants within the Database of Genomic Variants, while 81% of deletion copy number variants previously known for this individual coincide with one of our loss calls. Furthermore, we demonstrate that CNVer can reconstruct the absolute copy counts of segments of the donor genome and evaluate the feasibility of using CNVer with low coverage datasets. PMID:20805290
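
    The paired-end signal described in this abstract can be illustrated with a simple insert-size check. The sketch below is not CNVer's donor-graph method; it only shows how a mate pair might be flagged as discordant, with a hypothetical function name and an assumed threshold of three standard deviations around the library's mean insert size.

```python
def classify_pair(mapped_distance, mean_insert, sd_insert, k=3.0):
    """Label a mate pair by comparing its mapped distance on the reference
    with the expected insert size of the library (assumed cutoff: k * sd)."""
    if mapped_distance > mean_insert + k * sd_insert:
        # Mates map farther apart than expected: the donor likely lacks
        # sequence that is present in the reference.
        return "deletion-like"
    if mapped_distance < mean_insert - k * sd_insert:
        # Mates map closer together than expected: the donor likely carries
        # extra sequence between them.
        return "insertion-like"
    return "concordant"


# Hypothetical library with a 200 bp mean insert and 20 bp standard deviation.
for distance in (205, 520, 60):
    print(distance, classify_pair(distance, mean_insert=200, sd_insert=20))
```

    A real caller would combine many such pairs with depth-of-coverage evidence before making a call, since single discordant pairs are often mapping artifacts.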

  5. A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.

    PubMed

    Shi, Haixiang; Schmidt, Bertil; Liu, Weiguo; Müller-Wittig, Wolfgang

    2010-04-01

    Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads can be available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency we are taking advantage of the CUDA texture memory using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. Using a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), this results in speedups of 12-84 times for the parallelized error correction, and speedups of 3-63 times for both sequential preprocessing and parallelized error correction compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net . PMID:20426693
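
    The spectrum membership query mentioned in this abstract can be sketched on the CPU as follows. This is a plain Python illustration rather than the CUDA implementation: the class and function names, the SHA-256-based hashing, and the filter sizes are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Bit-array set with possible false positives but no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads, k, min_count, num_bits=1 << 16, num_hashes=4):
    """Store every k-mer seen at least min_count times (the trusted spectrum)."""
    counts = {}
    for read in reads:
        for kmer in kmers(read, k):
            counts[kmer] = counts.get(kmer, 0) + 1
    spectrum = BloomFilter(num_bits, num_hashes)
    for kmer, count in counts.items():
        if count >= min_count:
            spectrum.add(kmer)
    return spectrum


# Three copies of one read plus a variant with a single substituted base.
reads = ["ACGTACGT"] * 3 + ["ACGTTCGT"]
spectrum = build_spectrum(reads, k=4, min_count=2)
suspect = [kmer for kmer in kmers("ACGTTCGT", 4) if kmer not in spectrum]
print(suspect)  # k-mers of the variant read that are absent from the spectrum
```

    The dictionary used for counting is only for clarity and defeats the space saving; the point of the filter is that the finished spectrum can be queried from a compact bit array. Reads containing k-mers outside the spectrum are the candidates for correction.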

  6. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/. PMID:22144203

  7. A Protein Deep Sequencing Evaluation of Metastatic Melanoma Tissues

    PubMed Central

    Welinder, Charlotte; Pawłowski, Krzysztof; Sugihara, Yutaka; Yakovleva, Maria; Jönsson, Göran; Ingvar, Christian; Lundgren, Lotta; Baldetorp, Bo; Olsson, Håkan; Rezeli, Melinda; Jansson, Bo; Laurell, Thomas; Fehniger, Thomas; Döme, Balazs; Malm, Johan; Wieslander, Elisabet; Nishimura, Toshihide; Marko-Varga, György

    2015-01-01

    Malignant melanoma has the highest increase of incidence of malignancies in the western world. In early stages, front line therapy is surgical excision of the primary tumor. Metastatic disease has very limited possibilities for cure. Recently, several protein kinase inhibitors and immune modifiers have shown promising clinical results but drug resistance in metastasized melanoma remains a major problem. The need for routine clinical biomarkers to follow disease progression and treatment efficacy is high. The aim of the present study was to build a protein sequence database in metastatic melanoma, searching for novel, relevant biomarkers. Ten lymph node metastases (South-Swedish Malignant Melanoma Biobank) were subjected to global protein expression analysis using two proteomics approaches (with/without orthogonal fractionation). Fractionation produced higher numbers of protein identifications (4284). Combining both methods, 5326 unique proteins were identified (2641 proteins overlapping). Deep mining proteomics may contribute to the discovery of novel biomarkers for metastatic melanoma, for example dividing the samples into two metastatic melanoma “genomic subtypes”, (“pigmentation” and “high immune”) revealed several proteins showing differential levels of expression. In conclusion, the present study provides an initial version of a metastatic melanoma protein sequence database producing a total of more than 5000 unique protein identifications. The raw data have been deposited to the ProteomeXchange with identifiers PXD001724 and PXD001725. PMID:25874936

  8. Deep sequencing reveals stepwise mutation acquisition in paroxysmal nocturnal hemoglobinuria

    PubMed Central

    Shen, Wenyi; Clemente, Michael J.; Hosono, Naoko; Yoshida, Kenichi; Przychodzen, Bartlomiej; Yoshizato, Tetsuichi; Shiraishi, Yuichi; Miyano, Satoru; Ogawa, Seishi; Maciejewski, Jaroslaw P.; Makishima, Hideki

    2014-01-01

    Paroxysmal nocturnal hemoglobinuria (PNH) is a nonmalignant clonal disease of hematopoietic stem cells that is associated with hemolysis, marrow failure, and thrombophilia. PNH has been considered a monogenic disease that results from somatic mutations in the gene encoding PIGA, which is required for biosynthesis of glycosylphosphatidylinositol-anchored (GPI-anchored) proteins. The loss of certain GPI-anchored proteins is hypothesized to provide the mutant clone with an extrinsic growth advantage, but some features of PNH argue that there are intrinsic drivers of clonal expansion. Here, we performed whole-exome sequencing of paired PNH+ and PNH– fractions on samples taken from 12 patients as well as targeted deep sequencing of an additional 36 PNH patients. We identified additional somatic mutations that resulted in a complex hierarchical clonal architecture, similar to that observed in myeloid neoplasms. In addition to mutations in PIGA, mutations were found in genes known to be involved in myeloid neoplasm pathogenesis, including TET2, SUZ12, U2AF1, and JAK2. Clonal analysis indicated that these additional mutations arose either as a subclone within the PIGA-mutant population, or prior to PIGA mutation. Together, our data indicate that in addition to PIGA mutations, accessory genetic events are frequent in PNH, suggesting a stepwise clonal evolution derived from a singular stem cell clone. PMID:25244093

  9. Accurate indel prediction using paired-end short reads

    PubMed Central

    2013-01-01

    Background One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives. Results Here, we present a machine learning based method which is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project (http://www.1001genomes.org) in Arabidopsis thaliana. Conclusion In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We demonstrate that missing classification may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/. PMID:23442375

  10. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    PubMed

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads) and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity compared with state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR. PMID:26451814
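
    The suffix-array lookup mentioned in this abstract can be illustrated with a minimal exact-match search. This is a generic textbook sketch, not HPG Aligner SA's implementation: the function names are hypothetical, and the naive construction shown here would be far too slow and memory-hungry for a real genome.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return every start position of pattern in text via two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


reference = "GATTACAGATTACA"
sa = build_suffix_array(reference)
print(find_occurrences(reference, sa, "TTA"))  # [2, 9]
print(find_occurrences(reference, sa, "GGG"))  # []
```

    Because all suffixes sharing a prefix are adjacent in the sorted order, each seed lookup costs a logarithmic number of string comparisons, which is what makes this structure attractive for mapping large numbers of reads.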

  11. Unified View of Backward Backtracking in Short Read Mapping

    NASA Astrophysics Data System (ADS)

    Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku

    Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit the Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen as a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. The same mechanism can also be used for mapping mate-pair reads.
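
    As a rough illustration of backward backtracking on a Burrows-Wheeler index, the sketch below finds all occurrences of a read within a given number of mismatches. It shows the plain, unpruned technique only; the pruning mechanism proposed in the abstract is not reproduced. The names `bwt_index` and `approx_search` are hypothetical, and the index is built naively and stored uncompressed.

```python
from typing import Dict, List, Tuple


def bwt_index(ref: str) -> Tuple[List[int], Dict[str, int], Dict[str, List[int]]]:
    """Build a toy FM-index: suffix array, C table and occurrence table."""
    text = ref + "$"  # "$" is a sentinel smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0

    alphabet = sorted(set(text))
    # C[c] = number of characters in text that are strictly smaller than c
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def approx_search(pattern: str, sa: List[int], C: Dict[str, int],
                  occ: Dict[str, List[int]], max_mismatches: int) -> List[int]:
    """Return positions where `pattern` occurs with at most `max_mismatches`
    substitutions, by backtracking over backward-search steps."""
    hits = set()

    def recurse(i: int, lo: int, hi: int, budget: int) -> None:
        if lo >= hi:          # empty suffix-array interval: dead branch
            return
        if i < 0:             # whole pattern consumed: report the interval
            hits.update(sa[lo:hi])
            return
        for c in C:
            if c == "$":
                continue
            cost = 0 if c == pattern[i] else 1
            if cost > budget:
                continue
            # One backward-search step extending the match with c on the left.
            recurse(i - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], budget - cost)

    recurse(len(pattern) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)


if __name__ == "__main__":
    sa, C, occ = bwt_index("ACGTACGTTACG")  # hypothetical toy reference
    print(approx_search("ACC", sa, C, occ, max_mismatches=1))
```

    With `max_mismatches=0` the recursion reduces to ordinary exact backward search; the number of branches explored grows quickly with the mismatch budget, which is why pruning matters in practice.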

  12. Remote triggering of deep earthquakes in the 2002 Tonga sequences.

    PubMed

    Tibi, Rigobert; Wiens, Douglas A; Inoue, Hiroshi

    2003-08-21

    It is well established that an earthquake in the Earth's crust can trigger subsequent earthquakes, but such triggering has not been documented for deeper earthquakes. Models for shallow fault interactions suggest that static (permanent) stress changes can trigger nearby earthquakes, within a few fault lengths from the causative earthquake, whereas dynamic (transient) stresses carried by seismic waves may trigger earthquakes both nearby and at remote distances. Here we present a detailed analysis of the 19 August 2002 Tonga deep earthquake sequences and show evidence for both static and dynamic triggering. Seven minutes after a magnitude 7.6 earthquake occurred at a depth of 598 km, a magnitude 7.7 earthquake (664 km depth) occurred 300 km away, in a previously aseismic region. We found that nearby aftershocks of the first mainshock are preferentially located in regions where static stresses are predicted to have been enhanced by the mainshock. But the second mainshock and other triggered events are located at larger distances where static stress increases should be negligible, thus suggesting dynamic triggering. The origin times of the triggered events do not correspond to arrival times of the main seismic waves from the mainshocks and the dynamically triggered earthquakes frequently occur in aseismic regions below or adjacent to the seismic zone. We propose that these events are triggered by transient effects in regions near criticality, but where earthquakes have difficulty nucleating without external influences. PMID:12931183

  13. Key roles for freshwater Actinobacteria revealed by deep metagenomic sequencing.

    PubMed

    Ghai, Rohit; Mizuno, Carolina Megumi; Picazo, Antonio; Camacho, Antonio; Rodriguez-Valera, Francisco

    2014-12-01

    Freshwater ecosystems are critical but fragile environments directly affecting society and its welfare. However, our understanding of genuinely freshwater microbial communities, constrained by our capacity to manipulate its prokaryotic participants in axenic cultures, remains very rudimentary. Even the most abundant components, freshwater Actinobacteria, remain largely unknown. Here, applying deep metagenomic sequencing to the microbial community of a freshwater reservoir, we were able to circumvent this traditional bottleneck and reconstruct de novo seven distinct streamlined actinobacterial genomes. These genomes represent three new groups of photoheterotrophic, planktonic Actinobacteria. We describe for the first time genomes of two novel clades, acMicro (Micrococcineae, related to Luna2) and acAMD (Actinomycetales, related to acTH1). Besides, an aggregate of contigs belonged to a new branch of the Acidimicrobiales. All are estimated to have small genomes (approximately 1.2 Mb), and their GC content varied from 40 to 61%. One of the Micrococcineae genomes encodes a proteorhodopsin, a rhodopsin type reported for the first time in Actinobacteria. The remarkable potential capacity of some of these genomes to transform recalcitrant plant detrital material, particularly lignin-derived compounds, suggests close linkages between the terrestrial and aquatic realms. Moreover, abundances of Actinobacteria correlate inversely to those of Cyanobacteria that are responsible for prolonged and frequently irretrievable damage to freshwater ecosystems. This suggests that they might serve as sentinels of impending ecological catastrophes. PMID:25355242

  14. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons

    PubMed Central

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistics and a dearth of taxonomic work-force hinders the knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using an environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and two open slopes (depth range 100–2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was a marked small-scale heterogeneity as shown by differences in replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure as the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and sets the ground for future monitoring and conservation efforts on

  15. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons.

    PubMed

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistics and a dearth of taxonomic work-force hinders the knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using an environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and two open slopes (depth range 100-2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was a marked small-scale heterogeneity as shown by differences in replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure as the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and sets the ground for future monitoring and conservation efforts on

  16. High-resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions.

    PubMed

    Amir, Amnon; Zeisel, Amit; Zuk, Or; Elgart, Michael; Stern, Shay; Shamir, Ohad; Turnbaugh, Peter J; Soen, Yoav; Shental, Noam

    2013-12-01

    The emergence of massively parallel sequencing technology has revolutionized microbial profiling, allowing the unprecedented comparison of microbial diversity across time and space in a wide range of host-associated and environmental ecosystems. Although the high-throughput nature of such methods enables the detection of low-frequency bacteria, these advances come at the cost of sequencing read length, limiting the phylogenetic resolution possible by current methods. Here, we present a generic approach for integrating short reads from large genomic regions, thus enabling phylogenetic resolution far exceeding current methods. The approach is based on a mapping to a statistical model that is later solved as a constrained optimization problem. We demonstrate the utility of this method by analyzing human saliva and Drosophila samples, using Illumina single-end sequencing of a 750 bp amplicon of the 16S rRNA gene. Phylogenetic resolution is significantly extended while reducing the number of falsely detected bacteria, as compared with standard single-region Roche 454 Pyrosequencing. Our approach can be seamlessly applied to simultaneous sequencing of multiple genes providing a higher resolution view of the composition and activity of complex microbial communities. PMID:24214960

  17. High-resolution microbial community reconstruction by integrating short reads from multiple 16S rRNA regions

    PubMed Central

    Amir, Amnon; Zeisel, Amit; Zuk, Or; Elgart, Michael; Stern, Shay; Shamir, Ohad; Turnbaugh, Peter J.; Soen, Yoav; Shental, Noam

    2013-01-01

    The emergence of massively parallel sequencing technology has revolutionized microbial profiling, allowing the unprecedented comparison of microbial diversity across time and space in a wide range of host-associated and environmental ecosystems. Although the high-throughput nature of such methods enables the detection of low-frequency bacteria, these advances come at the cost of sequencing read length, limiting the phylogenetic resolution possible by current methods. Here, we present a generic approach for integrating short reads from large genomic regions, thus enabling phylogenetic resolution far exceeding current methods. The approach is based on a mapping to a statistical model that is later solved as a constrained optimization problem. We demonstrate the utility of this method by analyzing human saliva and Drosophila samples, using Illumina single-end sequencing of a 750 bp amplicon of the 16S rRNA gene. Phylogenetic resolution is significantly extended while reducing the number of falsely detected bacteria, as compared with standard single-region Roche 454 Pyrosequencing. Our approach can be seamlessly applied to simultaneous sequencing of multiple genes providing a higher resolution view of the composition and activity of complex microbial communities. PMID:24214960

  18. Complete Genome Sequence of Bacteriophage Deep-Blue Infecting Emetic Bacillus cereus

    PubMed Central

    Hock, Louise; Gillis, Annika

    2016-01-01

    The Bacillus cereus emetic pathotype is responsible for important food-borne intoxications. Here, we describe the complete genome sequence of bacteriophage Deep-Blue, which is able to infect emetic strains of B. cereus. Deep-Blue is a 159-kb myophage of the Bastille-like group within the Spounavirinae. PMID:27313285

  19. Complete Genome Sequence of Bacteriophage Deep-Blue Infecting Emetic Bacillus cereus.

    PubMed

    Hock, Louise; Gillis, Annika; Mahillon, Jacques

    2016-01-01

    The Bacillus cereus emetic pathotype is responsible for important food-borne intoxications. Here, we describe the complete genome sequence of bacteriophage Deep-Blue, which is able to infect emetic strains of B. cereus. Deep-Blue is a 159-kb myophage of the Bastille-like group within the Spounavirinae. PMID:27313285

  20. Mutascope: sensitive detection of somatic mutations from deep amplicon sequencing

    PubMed Central

    Yost, Shawn E.; Alakus, Hakan; Matsui, Hiroko; Schwab, Richard B.; Jepsen, Kristen; Frazer, Kelly A.; Harismendy, Olivier

    2013-01-01

    Summary: We present Mutascope, a sequencing analysis pipeline specifically developed for the identification of somatic variants present at low-allelic fraction from high-throughput sequencing of amplicons from matched tumor-normal specimen. Using datasets reproducing tumor genetic heterogeneity, we demonstrate that Mutascope has a higher sensitivity and generates fewer false-positive calls than tools designed for shotgun sequencing or diploid genomes. Availability: Freely available on the web at http://sourceforge.net/projects/mutascope/. Contact: oharismendy@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23712659

  1. DNA Methyltransferase Accessibility Protocol for Individual Templates by Deep Sequencing

    PubMed Central

    Darst, Russell P.; Nabilsi, Nancy H.; Pardo, Carolina E.; Riva, Alberto; Kladde, Michael P.

    2013-01-01

    A single-molecule probe of chromatin structure can uncover dynamic chromatin states and rare epigenetic variants of biological importance that bulk measures of chromatin structure miss. In bisulfite genomic sequencing, each sequenced clone records the methylation status of multiple sites on an individual molecule of DNA. An exogenous DNA methyltransferase can thus be used to image nucleosomes and other protein–DNA complexes. In this chapter, we describe the adaptation of this technique, termed Methylation Accessibility Protocol for individual templates, to modern high-throughput sequencing, which both simplifies the workflow and extends its utility. PMID:22929770

  2. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance will all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  3. HIV-1 Quasispecies Delineation by Tag Linkage Deep Sequencing

    PubMed Central

    Wu, Nicholas C.; De La Cruz, Justin; Al-Mawsawi, Laith Q.; Olson, C. Anders; Qi, Hangfei; Luan, Harding H.; Nguyen, Nguyen; Du, Yushen; Le, Shuai; Wu, Ting-Ting; Li, Xinmin; Lewis, Martha J.; Yang, Otto O.; Sun, Ren

    2014-01-01

    Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼0.005% to ∼0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population. PMID:24842159

  4. Molecular Diagnosis of Actinomadura madurae Infection by 16S rRNA Deep Sequencing

    PubMed Central

    SenGupta, Dhruba J.; Hoogestraat, Daniel R.; Cummings, Lisa A.; Bryant, Bronwyn H.; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W.; Chau, Mimosa; Barbee, Lindley A.; Rosenthal, Christopher; Cookson, Brad T.; Hoffman, Noah G.

    2013-01-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms. PMID:24108607

  5. Draft Genome Sequence of Loktanella cinnabarina LL-001T, Isolated from Deep-Sea Floor Sediment

    PubMed Central

    Tsubouchi, Taishi; Takaki, Yoshihiro; Koyanagi, Ryo; Satoh, Nori; Maruyama, Tadashi; Hatada, Yuji

    2013-01-01

    This report describes the draft genome sequence of Loktanella cinnabarina LL-001T, which was the first isolated strain from deep-sea floor sediment of the genus Loktanella. The draft genome sequence contains 3,896,245 bp, with a G+C content of 66.7%. PMID:24233588

  6. Draft Genome Sequence of Loktanella cinnabarina LL-001T, Isolated from Deep-Sea Floor Sediment.

    PubMed

    Nishi, Shinro; Tsubouchi, Taishi; Takaki, Yoshihiro; Koyanagi, Ryo; Satoh, Nori; Maruyama, Tadashi; Hatada, Yuji

    2013-01-01

    This report describes the draft genome sequence of Loktanella cinnabarina LL-001(T), which was the first isolated strain from deep-sea floor sediment of the genus Loktanella. The draft genome sequence contains 3,896,245 bp, with a G+C content of 66.7%. PMID:24233588

  7. Predicting effects of noncoding variants with deep learning-based sequence model.

    PubMed

    Zhou, Jian; Troyanskaya, Olga G

    2015-10-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning-based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  8. Predicting effects of noncoding variants with deep learning–based sequence model

    PubMed Central

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  9. Use of S1 nuclease in deep sequencing for detection of double-stranded RNA viruses.

    PubMed

    Shimada, Saya; Nagai, Makoto; Moriyama, Hiromitsu; Fukuhara, Toshiyuki; Koyama, Satoshi; Omatsu, Tsutomu; Furuya, Tetsuya; Shirai, Junsuke; Mizutani, Tetsuya

    2015-09-01

    A metagenomic approach using next-generation DNA sequencing has facilitated the detection of many pathogenic viruses from fecal samples. However, in many cases, the majority of the detected sequences originate from the host genome and bacterial flora in the gut. Here, to improve the efficiency of detecting double-stranded (ds) RNA viruses in samples, we evaluated the applicability of S1 nuclease to deep sequencing. Treating total RNA with S1 nuclease resulted in 1.5-28.4- and 10.1-208.9-fold increases in sequence reads of group A rotavirus in fecal and viral culture samples, respectively. Moreover, increasing coverage of mapping to reference sequences allowed for sufficient genotyping using analytical software. These results suggest that library construction using S1 nuclease is useful for deep sequencing in the detection of dsRNA viruses. PMID:25843154

  10. Short-Read Assembly of Full-Length 16S Amplicons Reveals Bacterial Diversity in Subsurface Sediments

    PubMed Central

    Miller, Christopher S.; Handley, Kim M.; Wrighton, Kelly C.; Frischkorn, Kyle R.; Thomas, Brian C.; Banfield, Jillian F.

    2013-01-01

    In microbial ecology, a fundamental question relates to how community diversity and composition change in response to perturbation. Most studies have had limited ability to deeply sample community structure (e.g. Sanger-sequenced 16S rRNA libraries), or have had limited taxonomic resolution (e.g. studies based on 16S rRNA hypervariable region sequencing). Here, we combine the higher taxonomic resolution of near-full-length 16S rRNA gene amplicons with the economics and sensitivity of short-read sequencing to assay the abundance and identity of organisms that represent as little as 0.01% of sediment bacterial communities. We used a new version of EMIRGE optimized for large data size to reconstruct near-full-length 16S rRNA genes from amplicons sheared and sequenced with Illumina technology. The approach allowed us to differentiate the community composition among samples acquired before perturbation, after acetate amendment shifted the predominant metabolism to iron reduction, and once sulfate reduction began. Results were highly reproducible across technical replicates, and identified specific taxa that responded to the perturbation. All samples contain very high alpha diversity and abundant organisms from phyla without cultivated representatives. Surprisingly, at the time points measured, there was no strong loss of evenness, despite the selective pressure of acetate amendment and change in the terminal electron accepting process. However, community membership was altered significantly. The method allows for sensitive, accurate profiling of the “long tail” of low abundance organisms that exist in many microbial communities, and can resolve population dynamics in response to environmental change. PMID:23405248

  11. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome

    PubMed Central

    Franta, Zdeněk; Pedra, Joao H. F.; Ribeiro, José M. C.

    2015-01-01

    Background Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. Methodology/principal findings De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly over expressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Vitellogenin and lipid metabolism transcription enrichment suggests fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, including the STAT transcription factor. Conclusions/significance This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so

  12. Lineage analysis by microsatellite loci deep sequencing in mice.

    PubMed

    Luo, Tao; He, Xionglei; Xing, Ke

    2016-05-01

    Lineage analysis is the identification of all the progeny of a single progenitor cell, and has become particularly useful for studying developmental processes and cancer biology. Here, we propose a novel and effective method for lineage analysis that combines sequence capture and next-generation sequencing technology. Genome-wide mononucleotide and dinucleotide microsatellite loci in eight samples from two mice were identified and used to construct phylogenetic trees based on somatic indel mutations at these loci, which were unique enough to distinguish and parse samples from different mice into different groups along the lineage tree. For example, biopsies from the liver and stomach, which originate from the endoderm, were located in the same clade, while samples in kidney, which originate from the mesoderm, were located in another clade. Yet, tissue with a common developmental origin may still contain cells of a mixed ancestry. This genome-wide approach thus provides a non-invasive lineage analysis method based on mutations that accumulate in the genomes of opaque multicellular organism somatic cells. Mol. Reprod. Dev. 83: 387-391, 2016. © 2016 Wiley Periodicals, Inc. PMID:26932355

  13. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  14. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    SciTech Connect

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
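
    One common way to use a control-derived error rate, sketched here under stated assumptions, is to pool mismatch counts from the controls and then ask how surprising a site's variant-read count would be under a binomial error model. The record does not say which statistical model was used, so the binomial test, the function names, and all numbers below are illustrative assumptions rather than the author's method.

```python
import math
from typing import Iterable, Tuple


def estimate_error_rate(controls: Iterable[Tuple[int, int]]) -> float:
    """Pool (mismatched_bases, total_bases) pairs from control sequences
    into a single per-base error rate."""
    mismatches = 0
    bases = 0
    for mm, total in controls:
        mismatches += mm
        bases += total
    return mismatches / bases


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that ultra-deep coverage
    (large n) does not overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    tail = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        tail += math.exp(log_term)
    return min(tail, 1.0)


def is_variant(variant_reads: int, depth: int, error_rate: float,
               alpha: float = 1e-6) -> bool:
    """Call a site variant when its variant-read count is unlikely to
    arise from sequencing error alone (alpha is an assumed cutoff)."""
    return binomial_tail(variant_reads, depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical counts for two control sequences.
    rate = estimate_error_rate([(5, 10_000), (15, 10_000)])
    print(rate, is_variant(50, 10_000, rate), is_variant(10, 10_000, rate))
```

    A single pooled rate ignores position- and base-specific error profiles, so a real analysis would likely need a finer-grained model and a multiple-testing correction across sites.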

  15. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis

    PubMed Central

    Simons, Benjamin D.

    2016-01-01

    Using deep sequencing technology, methods based on the sporadic acquisition of somatic DNA mutations in human tissues have been used to trace the clonal evolution of progenitor cells in diseased states. However, the potential of these approaches to explore cell fate behavior of normal tissues and the initiation of preneoplasia remain underexploited. Focusing on the results of a recent deep sequencing study of eyelid epidermis, we show that the quantitative analysis of mutant clone size provides a general method to resolve the pattern of normal stem cell fate and to detect and characterize the mutational signature of rare field transformations in human tissues, with implications for the early detection of preneoplasia. PMID:26699486

  16. Using Small RNA Deep Sequencing Data to Detect Human Viruses

    PubMed Central

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F.; Fei, ZhangJun; Zhu, Xiao

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans. PMID:27066498

  17. Deep sequencing reveals 50 novel genes for recessive cognitive disorders.

    PubMed

    Najmabadi, Hossein; Hu, Hao; Garshasbi, Masoud; Zemojtel, Tomasz; Abedini, Seyedeh Sedigheh; Chen, Wei; Hosseini, Masoumeh; Behjati, Farkhondeh; Haas, Stefan; Jamali, Payman; Zecha, Agnes; Mohseni, Marzieh; Püttmann, Lucia; Vahid, Leyla Nouri; Jensen, Corinna; Moheb, Lia Abbasi; Bienek, Melanie; Larti, Farzaneh; Mueller, Ines; Weissmann, Robert; Darvish, Hossein; Wrogemann, Klaus; Hadavi, Valeh; Lipkowitz, Bettina; Esmaeeli-Nieh, Sahar; Wieczorek, Dagmar; Kariminejad, Roxana; Firouzabadi, Saghar Ghasemi; Cohen, Monika; Fattahi, Zohreh; Rost, Imma; Mojahedi, Faezeh; Hertzberg, Christoph; Dehghan, Atefeh; Rajab, Anna; Banavandi, Mohammad Javad Soltani; Hoffer, Julia; Falah, Masoumeh; Musante, Luciana; Kalscheuer, Vera; Ullmann, Reinhard; Kuss, Andreas Walter; Tzschach, Andreas; Kahrizi, Kimia; Ropers, H Hilger

    2011-10-01

    Common diseases are often complex because they are genetically heterogeneous, with many different genetic defects giving rise to clinically indistinguishable phenotypes. This has been amply documented for early-onset cognitive impairment, or intellectual disability, one of the most complex disorders known and a very important health care problem worldwide. More than 90 different gene defects have been identified for X-chromosome-linked intellectual disability alone, but research into the more frequent autosomal forms of intellectual disability is still in its infancy. To expedite the molecular elucidation of autosomal-recessive intellectual disability, we have now performed homozygosity mapping, exon enrichment and next-generation sequencing in 136 consanguineous families with autosomal-recessive intellectual disability from Iran and elsewhere. This study, the largest published so far, has revealed additional mutations in 23 genes previously implicated in intellectual disability or related neurological disorders, as well as single, probably disease-causing variants in 50 novel candidate genes. Proteins encoded by several of these genes interact directly with products of known intellectual disability genes, and many are involved in fundamental cellular processes such as transcription and translation, cell-cycle control, energy metabolism and fatty-acid synthesis, which seem to be pivotal for normal brain development and function. PMID:21937992

  18. Using Small RNA Deep Sequencing Data to Detect Human Viruses.

    PubMed

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F; Fei, ZhangJun; Zhu, Xiao; Gao, Shan

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without requiring any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, its use for detecting mammalian or human viruses remains controversial. In this study, we used 931 sRNA-seq runs from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly in some clinical samples. Six viruses (HPV-18, HBV, HCV, HIV-1, SMRV, and EBV) were detected in 36 runs. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest that sRNA-seq can be used to detect viruses in mammals and humans. PMID:27066498

  19. Novel lineages of Southern Ocean deep-sea foraminifera revealed by environmental DNA sequencing

    NASA Astrophysics Data System (ADS)

    Pawlowski, Jan; Fontaine, Delia; da Silva, Ana Aranda; Guiard, Jackie

    2011-10-01

    Diversity of deep-sea foraminifera is commonly studied based on analysis of agglutinated and calcareous tests preserved in dried sediment samples. Soft-walled and agglutinated monothalamous (single-chambered) foraminifera are usually ignored because they are poorly preserved and difficult to identify. Moreover, the assemblage examined is usually limited to the sediment size fraction larger than 63 or 125 μm. To overcome these problems, we analysed the foraminiferal assemblage based on ribosomal DNA sequences amplified specifically from total DNA extracted from unsieved and fine fraction (<32 μm) sediment samples from three sites in the Southern Ocean. We obtained 392 sequences, representing 123 phylotypes of foraminifera. Over 90% of phylotypes (112) could not be assigned to any previously sequenced species or genera. Among these new phylotypes, 20 belong to the clade of multi-chambered calcareous Rotaliida and agglutinated Textulariida, while 94 branch among the radiation of monothalamous species. Many new phylotypes clustered together with other environmental foraminiferal sequences and sequences of unknown origin. Eight new lineages of environmental foraminiferal sequences (ENFOR 1-8) were distinguished. The morphology of species included in these novel lineages is unknown, but we can speculate that they are tiny, amoeboid protists present in the deep-sea sediments. Their diversity may be as high as that of better known large-sized foraminifera. Documenting this hidden component of deep-sea foraminiferal assemblages is a major challenge for the future.

  20. Deep sequencing of the murine olfactory receptor neuron transcriptome.

    PubMed

    Kanageswaran, Ninthujah; Demond, Marilen; Nagel, Maximilian; Schreiner, Benjamin S P; Baumgart, Sabrina; Scholz, Paul; Altmüller, Janine; Becker, Christian; Doerner, Julia F; Conrad, Heike; Oberland, Sonja; Wetzel, Christian H; Neuhaus, Eva M; Hatt, Hanns; Gisselmann, Günter

    2015-01-01

    The ability of animals to sense and differentiate among thousands of odorants relies on a large set of olfactory receptors (OR) and a multitude of accessory proteins within the olfactory epithelium (OE). ORs and related signaling mechanisms have been the subject of intensive studies over the past years, but our knowledge regarding olfactory processing remains limited. The recent development of next generation sequencing (NGS) techniques encouraged us to assess the transcriptome of the murine OE. We analyzed RNA from OEs of female and male adult mice and from fluorescence-activated cell sorting (FACS)-sorted olfactory receptor neurons (ORNs) obtained from transgenic OMP-GFP mice. The Illumina RNA-Seq protocol was utilized to generate up to 86 million reads per transcriptome. In OE samples, nearly all OR and trace amine-associated receptor (TAAR) genes involved in the perception of volatile amines were detectably expressed. Other genes known to participate in olfactory signaling pathways were among the 200 genes with the highest expression levels in the OE. To identify OE-specific genes, we compared olfactory neuron expression profiles with RNA-Seq transcriptome data from different murine tissues. By analyzing different transcript classes, we detected the expression of non-olfactory GPCRs in ORNs and established an expression ranking for GPCRs detected in the OE. We also identified other previously undescribed membrane proteins as potential new players in olfaction. The quantitative and comprehensive transcriptome data provide a virtually complete catalogue of genes expressed in the OE and present a useful tool to uncover candidate genes involved in, for example, olfactory signaling, OR trafficking and recycling, and proliferation. PMID:25590618

  1. Deep Sequencing of the Murine Olfactory Receptor Neuron Transcriptome

    PubMed Central

    Kanageswaran, Ninthujah; Demond, Marilen; Nagel, Maximilian; Schreiner, Benjamin S. P.; Baumgart, Sabrina; Scholz, Paul; Altmüller, Janine; Becker, Christian; Doerner, Julia F.; Conrad, Heike; Oberland, Sonja; Wetzel, Christian H.; Neuhaus, Eva M.; Hatt, Hanns; Gisselmann, Günter

    2015-01-01

    The ability of animals to sense and differentiate among thousands of odorants relies on a large set of olfactory receptors (OR) and a multitude of accessory proteins within the olfactory epithelium (OE). ORs and related signaling mechanisms have been the subject of intensive studies over the past years, but our knowledge regarding olfactory processing remains limited. The recent development of next generation sequencing (NGS) techniques encouraged us to assess the transcriptome of the murine OE. We analyzed RNA from OEs of female and male adult mice and from fluorescence-activated cell sorting (FACS)-sorted olfactory receptor neurons (ORNs) obtained from transgenic OMP-GFP mice. The Illumina RNA-Seq protocol was utilized to generate up to 86 million reads per transcriptome. In OE samples, nearly all OR and trace amine-associated receptor (TAAR) genes involved in the perception of volatile amines were detectably expressed. Other genes known to participate in olfactory signaling pathways were among the 200 genes with the highest expression levels in the OE. To identify OE-specific genes, we compared olfactory neuron expression profiles with RNA-Seq transcriptome data from different murine tissues. By analyzing different transcript classes, we detected the expression of non-olfactory GPCRs in ORNs and established an expression ranking for GPCRs detected in the OE. We also identified other previously undescribed membrane proteins as potential new players in olfaction. The quantitative and comprehensive transcriptome data provide a virtually complete catalogue of genes expressed in the OE and present a useful tool to uncover candidate genes involved in, for example, olfactory signaling, OR trafficking and recycling, and proliferation. PMID:25590618

  2. Deep Sequencing of the Vaginal Microbiota of Women with HIV

    PubMed Central

    Hummelen, Ruben; Fernandes, Andrew D.; Macklaim, Jean M.; Dickson, Russell J.; Changalucha, John

    2010-01-01

    Background Women living with HIV and co-infected with bacterial vaginosis (BV) are at higher risk for transmitting HIV to a partner or newborn. It is poorly understood which bacterial communities constitute BV or the normal vaginal microbiota among this population and how the microbiota associated with BV responds to antibiotic treatment. Methods and Findings The vaginal microbiota of 132 HIV positive Tanzanian women, including 39 who received metronidazole treatment for BV, were profiled using Illumina to sequence the V6 region of the 16S rRNA gene. Of note, Gardnerella vaginalis and Lactobacillus iners were detected in each sample constituting core members of the vaginal microbiota. Eight major clusters were detected with relatively uniform microbiota compositions. Two clusters dominated by L. iners or L. crispatus were strongly associated with a normal microbiota. The L. crispatus dominated microbiota were associated with low pH, but when L. crispatus was not present, a large fraction of L. iners was required to predict a low pH. Four clusters were strongly associated with BV, and were dominated by Prevotella bivia, Lachnospiraceae, or a mixture of different species. Metronidazole treatment reduced the microbial diversity and perturbed the BV-associated microbiota, but rarely resulted in the establishment of a lactobacilli-dominated microbiota. Conclusions Illumina-based microbial profiling enabled high-throughput analyses of microbial samples at a high phylogenetic resolution. The vaginal microbiota among women living with HIV in Sub-Saharan Africa constitutes several profiles associated with a normal microbiota or BV. Recurrence of BV frequently constitutes a different BV-associated profile than before antibiotic treatment. PMID:20711427

  3. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  4. Draft Genome Sequence of the Deep-Sea Bacterium Shewanella benthica Strain KT99.

    PubMed

    Lauro, F M; Chastain, R A; Ferriera, S; Johnson, J; Yayanos, A A; Bartlett, D H

    2013-01-01

    We report the draft genome sequence of the obligately piezophilic Shewanella benthica strain KT99 isolated from the abyssal South Pacific Ocean. Strain KT99 is the first piezophilic isolate from the Tonga-Kermadec trench, and its genome provides many clues on high-pressure adaptation and the evolution of deep-sea piezophilic bacteria. PMID:23723392

  5. Deep-sequencing of the peach latent mosaic viroid reveals new aspects of population heterogeneity.

    PubMed

    Glouzon, Jean-Pierre Sehi; Bolduc, François; Wang, Shengrui; Najmanovich, Rafael J; Perreault, Jean-Pierre

    2014-01-01

    Viroids are small circular single-stranded infectious RNAs characterized by a relatively high mutation level. Knowledge of their sequence heterogeneity remains largely elusive and previous studies, using Sanger sequencing, were based on a limited number of sequences. In an attempt to address sequence heterogeneity from a population dynamics perspective, a GF305-indicator peach tree was infected with a single variant of the Avsunviroidae family member Peach latent mosaic viroid (PLMVd). Six months post-inoculation, full-length circular conformers of PLMVd were isolated and deep-sequenced. We devised an original approach to the bioinformatics refinement of our sequence libraries involving important phenotypic data, based on the systematic analysis of hammerhead self-cleavage activity. Two distinct libraries yielded a total of 3,939 different PLMVd variants. Sequence variants exhibiting up to ∼17% of mutations relative to the inoculated viroid were retrieved, clearly illustrating the high level of divergence dynamics within a unique population. While we initially assumed that most positions of the viroid sequence would mutate, we were surprised to discover that ∼50% of positions remained perfectly conserved, including several small stretches as well as a small motif reminiscent of a GNRA tetraloop which are the result of various selective pressures. Using a hierarchical clustering algorithm, the different variants harvested were subdivided into 7 clusters. We found that most sequences contained an average of 4.6 to 6.4 mutations compared to the variant used to initially inoculate the plant. Interestingly, it was possible to reconstitute and compare the sequence evolution of each of these clusters. In doing so, we identified several key mutations. This study provides a reliable pipeline for the treatment of viroid deep-sequencing. It also sheds new light on the extent of sequence variation that a viroid population can sustain, and which may give rise to a

  6. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  7. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  8. A user-friendly computational workflow for the analysis of microRNA deep sequencing data.

    PubMed

    Majer, Anna; Caligiuri, Kyle A; Booth, Stephanie A

    2013-01-01

    Second-generation high-throughput sequencing is a robust and inexpensive methodology that is becoming an increasingly common technique for the study of microRNA (miRNA) expression levels in the central nervous system. This method allows for the identification of both known and novel miRNAs, reporting on the qualitative and quantitative levels these RNA species represent in any given sample. Numerous bioinformatic programs are currently available to analyze deep sequencing data but many require at least a partial understanding of the command line interface. In this chapter, we describe a user-friendly computational workflow guiding the user through the process from the initial FASTQ deep sequencing file to the identification of known and potentially novel miRNAs in a given experiment, as well as the assessment of the differential expression of these miRNAs between experimental samples. Furthermore, programs that can predict potential targets for these miRNAs are also highlighted. PMID:23007497

  9. Pooled Amplicon Deep Sequencing of Candidate Plasmodium falciparum Transmission-Blocking Vaccine Antigens.

    PubMed

    Juliano, Jonathan J; Parobek, Christian M; Brazeau, Nicholas F; Ngasala, Billy; Randrianarivelojosia, Milijaona; Lon, Chanthap; Mwandagalirwa, Kashamuka; Tshefu, Antoinette; Dhar, Ravi; Das, Bidyut K; Hoffman, Irving; Martinson, Francis; Mårtensson, Andreas; Saunders, David L; Kumar, Nirbhay; Meshnick, Steven R

    2016-01-01

    Polymorphisms within Plasmodium falciparum vaccine candidate antigens have the potential to compromise vaccine efficacy. Understanding the allele frequencies of polymorphisms in critical binding regions of antigens can help in the designing of strain-transcendent vaccines. Here, we adopt a pooled deep-sequencing approach, originally designed to study P. falciparum drug resistance mutations, to study the diversity of two leading transmission-blocking vaccine candidates, Pfs25 and Pfs48/45. We sequenced 329 P. falciparum field isolates from six different geographic regions. Pfs25 showed little diversity, with only one known polymorphism identified in the region associated with binding of transmission-blocking antibodies among our isolates. However, we identified four new mutations among eight non-synonymous mutations within the presumed antibody-binding region of Pfs48/45. Pooled deep sequencing provides a scalable and cost-effective approach for the targeted study of allele frequencies of P. falciparum candidate vaccine antigens. PMID:26503281
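    As a rough illustration of the allele-frequency estimate that pooled deep sequencing relies on, the sketch below counts the bases observed at one position across pooled reads and reports the fraction supporting each allele. The function name `allele_frequencies` and the toy base calls are hypothetical and are not taken from the study; a real pipeline would also filter by base quality and account for sequencing error.

```python
from collections import Counter

def allele_frequencies(base_calls):
    """Return the fraction of pooled reads supporting each base at one site."""
    # Keep only unambiguous base calls; skip N, gaps and other symbols.
    counts = Counter(b for b in base_calls if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}

# Hypothetical pileup column: 8 reads call G, 2 reads call A.
freqs = allele_frequencies("GGGGAGGGAG")
```

    With these made-up calls, the minor allele A is supported by 2 of 10 reads.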

  10. Classification of ncRNAs using position and size information in deep sequencing data

    PubMed Central

    Erhard, Florian; Zimmer, Ralf

    2010-01-01

    Motivation: Small non-coding RNAs (ncRNAs) play important roles in various cellular functions in all clades of life. With next-generation sequencing techniques, it has become possible to study ncRNAs in a high-throughput manner and by using specialized algorithms ncRNA classes such as miRNAs can be detected in deep sequencing data. Typically, such methods are targeted to a certain class of ncRNA. Many methods rely on RNA secondary structure prediction, which is not always accurate and not all ncRNA classes are characterized by a common secondary structure. Unbiased classification methods for ncRNAs could be important to improve accuracy and to detect new ncRNA classes in sequencing data. Results: Here, we present a scoring system called ALPS (alignment of pattern matrices score) that only uses primary information from a deep sequencing experiment, i.e. the relative positions and lengths of reads, to classify ncRNAs. ALPS makes no further assumptions, e.g. about common structural properties in the ncRNA class and is nevertheless able to identify ncRNA classes with high accuracy. Since ALPS is not designed to recognize a certain class of ncRNA, it can be used to detect novel ncRNA classes, as long as these unknown ncRNAs have a characteristic pattern of deep sequencing read lengths and positions. We evaluate our scoring system on publicly available deep sequencing data and show that it is able to classify known ncRNAs with high sensitivity and specificity. Availability: Calculated pattern matrices of the datasets hESC and EB are available at the project web site http://www.bio.ifi.lmu.de/ALPS. An implementation of the described method is available upon request from the authors. Contact: florian.erhard@bio.ifi.lmu.de PMID:20823303
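    The abstract describes classifying ncRNAs from nothing more than the relative positions and lengths of reads. The sketch below shows one simple way such a position-by-length pattern could be tabulated for a single locus. It is an illustrative assumption, not the authors' ALPS implementation: the function name `pattern_matrix`, the input layout (a list of `(start, length)` pairs) and the normalization by read count are all hypothetical.

```python
from collections import Counter

def pattern_matrix(reads, locus_start):
    """Map (offset from locus start, read length) to the fraction of reads."""
    counts = Counter((start - locus_start, length) for start, length in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}

# Hypothetical reads at one locus, given as (start position, read length).
reads = [(100, 22), (100, 22), (100, 21), (140, 22)]
matrix = pattern_matrix(reads, locus_start=100)
```

    Two loci could then be compared by how similar their matrices are, which is the general idea the abstract attributes to the alignment of pattern matrices.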

  11. Enhanced arbovirus surveillance with deep sequencing: identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes

    PubMed Central

    Coffey, Lark L.; Page, Brady L.; Greninger, Alexander L.; Herring, Belinda L.; Russell, Richard C.; Doggett, Stephen L.; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L.

    2013-01-01

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. PMID:24314645

  12. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-01

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. PMID:24314645

  13. Deep sequencing reveals global patterns of mRNA recruitment during translation initiation

    PubMed Central

    Gao, Rong; Yu, Kai; Nie, Jukui; Lian, Tengfei; Jin, Jianshi; Liljas, Anders; Su, Xiao-Dong

    2016-01-01

    In this work, we developed a method to systematically study the sequence preference of mRNAs during translation initiation. Traditionally, the dynamic process of translation initiation has been studied at the single molecule level with limited sequencing possibility. Using deep sequencing techniques, we identified the sequence preference at different stages of the initiation complexes. Our results provide a comprehensive and dynamic view of the initiation elements in the translation initiation region (TIR), including the S1 binding sequence, the Shine-Dalgarno (SD)/anti-SD interaction and the second codon, at the equilibrium of different initiation complexes. Moreover, our experiments reveal the conformational changes and regional dynamics throughout the dynamic process of mRNA recruitment. PMID:27460773

  14. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence

    PubMed Central

    Kinney, Justin B.; Murugan, Anand; Callan, Curtis G.; Cox, Edward C.

    2010-01-01

    Cells use protein-DNA and protein-protein interactions to regulate transcription. A biophysical understanding of this process has, however, been limited by the lack of methods for quantitatively characterizing the interactions that occur at specific promoters and enhancers in living cells. Here we show how such biophysical information can be revealed by a simple experiment in which a library of partially mutated regulatory sequences are partitioned according to their in vivo transcriptional activities and then sequenced en masse. Computational analysis of the sequence data produced by this experiment can provide precise quantitative information about how the regulatory proteins at a specific arrangement of binding sites work together to regulate transcription. This ability to reliably extract precise information about regulatory biophysics in the face of experimental noise is made possible by a recently identified relationship between likelihood and mutual information. Applying our experimental and computational techniques to the Escherichia coli lac promoter, we demonstrate the ability to identify regulatory protein binding sites de novo, determine the sequence-dependent binding energy of the proteins that bind these sites, and, importantly, measure the in vivo interaction energy between RNA polymerase and a DNA-bound transcription factor. Our approach provides a generally applicable method for characterizing the biophysical basis of transcriptional regulation by a specified regulatory sequence. The principles of our method can also be applied to a wide range of other problems in molecular biology. PMID:20439748

  15. Ultra-Deep Sequencing of Intra-host Rabies Virus Populations during Cross-species Transmission

    PubMed Central

    Borucki, Monica K.; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A.; Messenger, Sharon; Allen, Jonathan E.

    2013-01-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep-sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448 and this allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009 since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003–2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environment conditions change. PMID:24278493

  16. Ultra-deep sequencing of intra-host rabies virus populations during cross-species transmission.

    PubMed

    Borucki, Monica K; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A; Messenger, Sharon; Allen, Jonathan E

    2013-11-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep-sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448 and this allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009 since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003–2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environment conditions change. PMID:24278493

  17. Seismic sequence stratigraphy of Tertiary sediments, offshore Sarawak deep-water area

    SciTech Connect

    Mohammad, A.M.

    1994-07-01

    Tectonic processes and sea level changes are the key factors that have strongly influenced clastic and carbonate sedimentation in the Sarawak deep-water area. A seismic sequence stratigraphy of Tertiary sediments was conducted in the area with the main objective of developing a workable genetic chronostratigraphic framework that defines the sequence and systems tract boundaries within which depositional systems and lithofacies can be identified, mapped and interpreted. This study has resulted in the identification of eight major depositional sequences that are bounded by regional unconformities and correlative conformities. These sequences can generally be grouped into four megasequences, based on the main tectonic events observed in the area. Three systems tracts of a type-1, third-order sequence boundary were recognized in most of the sequences: lowstand, transgressive, and highstand systems tracts. The lowstand systems tract includes basin-floor fans, slope fans, and lowstand prograding wedges. Paleoenvironmental distribution maps constructed for each of the sequences using seismic facies analysis and nearby well control suggest that the sequence intervals are predominantly transgressive units that have been intermittently interrupted by regressive pulses brought about by changes in eustatic sea level. The trend of the paleocoastline observed during Oligocene to Miocene times changes from a northwest-southeast orientation to a position roughly parallel to the present coastline. Seismic facies maps generated from late Oligocene to early Miocene indicate the depositional environment was coastal to coastal plain in the western and the middle part of the study area, becoming more marine toward the east and northeast.

  18. Draft genome sequence of Pseudomonas oleovorans strain MGY01 isolated from deep sea water.

    PubMed

    Wang, Runping; Ren, Chong; Huang, Nan; Liu, Yang; Zeng, Runying

    2015-04-01

    Pseudomonas oleovorans MGY01 isolated from the deep-sea water of the South China Sea could effectively degrade malachite green. The draft genome of P. oleovorans MGY01 was sequenced and analyzed to gain insights into its efficient metabolic pathway for degrading malachite green. The data obtained revealed 109 contigs (N50: 128,269 bp) with a whole-genome size of 5,201,892 bp. The draft genome sequence of strain MGY01 will be helpful in studying the genetic pathways involved in the degradation of malachite green. PMID:25528517
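
The N50 quoted in this record is a standard assembly statistic: the contig length at which contigs of that length or longer account for at least half of the assembled bases. As a minimal sketch of the calculation, assuming only a plain list of contig lengths (the function name `n50` and the toy lengths are illustrative and are not the MGY01 data):

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until at least half
    # of the total assembled length has been accumulated.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not the contigs reported in the abstract.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Comparing `2 * running` with `total` keeps the arithmetic in integers, so no rounding is involved when the total length is odd.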

  19. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/. PMID:21037258

  20. Characterization of microRNA transcriptome in lung cancer by next-generation deep sequencing

    PubMed Central

    Ma, Jie; Mannoor, Kaiissar; Gao, Lu; Tan, Afang; Guarnera, Maria A.; Zhan, Min; Shetty, Amol; Stass, Sanford A; Xing, Lingxiao; Jiang, Feng

    2014-01-01

    Non-small cell lung cancer (NSCLC) is the leading cause of cancer death. Systematically characterizing miRNAs in NSCLC will help develop biomarkers for its diagnosis and subclassification, and identify therapeutic targets for the treatment. We used next-generation deep sequencing to comprehensively characterize miRNA profiles in eight lung tumor tissues consisting of two major types of NSCLC, squamous cell carcinoma (SCC) and adenocarcinoma (AC). We used quantitative PCR (qPCR) to verify the findings in 40 pairs of stage I NSCLC tissues and the paired normal tissues, and 60 NSCLC tissues of different types and stages. We also investigated the function of identified miRNAs in lung tumorigenesis. Deep sequencing identified 896 known miRNAs and 14 novel miRNAs, of which, 24 miRNAs displayed dysregulation with fold change ≥4.5 in either stage I ACs or SCCs or both relative to normal tissues. qPCR validation showed that 14 of 24 miRNAs exhibited consistent changes with deep sequencing data. Seven miRNAs displayed distinctive expressions between SCC and AC, from which, a panel of four miRNAs (miRs-944, 205-3p, 135a-5p, and 577) was identified that could differentiate SCC from AC with 93.3% sensitivity and 86.7% specificity. Manipulation of miR-944 expression in NSCLC cells affected cell growth, proliferation, and invasion by targeting a tumor suppressor, SOCS4. Evaluating miR-944 in 52 formalin-fixed paraffin-embedded SCC tissues revealed that miR-944 expression was associated with lymph node metastasis. This study presents the earliest use of deep sequencing for profiling miRNAs in lung tumor specimens. The identified miRNA signatures may provide biomarkers for early detection, subclassification, and predicting metastasis, and potential therapeutic targets of NSCLC. PMID:24785186

  1. Ultra deep sequencing detects a low rate of mosaic mutations in Tuberous Sclerosis Complex

    PubMed Central

    Qin, Wei; Kozlowski, Piotr; Taillon, Bruce E.; Bouffard, Pascal; Holmes, Alison J.; Janne, Pasi; Camposano, Susana; Thiele, Elizabeth; Franz, David; Kwiatkowski, David J.

    2010-01-01

    Tuberous sclerosis complex (TSC) is an autosomal dominant neurocutaneous syndrome caused by mutations in TSC1 and TSC2. However, 10 to 15% of TSC patients have no mutation identified with conventional molecular diagnostic studies. We used the ultra-deep pyrosequencing technique of 454 Sequencing to search for mosaicism in 38 TSC patients who had no TSC1 or TSC2 mutation identified by conventional methods. Two TSC2 mutations were identified, each at 5.3% read frequency in different patients, consistent with mosaicism. Both mosaic mutations were confirmed by several methods. Five of 38 samples were found to have heterozygous non-mosaic mutations, which had been missed in earlier analyses. Several other possible low frequency mosaic mutations were identified by deep sequencing, but were discarded as artifacts by secondary studies. The low frequency of detection of mosaic mutations, 2 (6%) of 33, suggests that most cases in TSC patients with no identified mutation are not due to mosaicism, but rather to other causes, which remain to be determined. These findings indicate the ability of deep sequencing, coupled with secondary confirmatory analyses, to detect low frequency mosaic mutations. PMID:20165957

  2. MetaGeniE: Characterizing Human Clinical Samples Using Deep Metagenomic Sequencing

    PubMed Central

    Rawat, Arun; Engelthaler, David M.; Driebe, Elizabeth M.; Keim, Paul; Foster, Jeffrey T.

    2014-01-01

    With the decreasing cost of next-generation sequencing, deep sequencing of clinical samples provides unique opportunities to understand host-associated microbial communities. Among the primary challenges of clinical metagenomic sequencing is the rapid filtering of human reads to survey for pathogens with high specificity and sensitivity. Metagenomes are inherently variable due to different microbes in the samples and their relative abundance, the size and architecture of genomes, and factors such as target DNA amounts in tissue samples (i.e. human DNA versus pathogen DNA concentration). This variation in metagenomes typically manifests in sequencing datasets as low pathogen abundance, a high number of host reads, and the presence of close relatives and complex microbial communities. In addition to these challenges posed by the composition of metagenomes, high numbers of reads generated from high-throughput deep sequencing pose immense computational challenges. Accurate identification of pathogens is confounded by individual reads mapping to multiple different reference genomes due to gene similarity in different taxa present in the community or close relatives in the reference database. Available global and local sequence aligners also vary in sensitivity, specificity, and speed of detection. The efficiency of detection of pathogens in clinical samples is largely dependent on the desired taxonomic resolution of the organisms. We have developed an efficient strategy that identifies “all against all” relationships between sequencing reads and reference genomes. Our approach allows for scaling to large reference databases and then genome reconstruction by aggregating global and local alignments, thus allowing genetic characterization of pathogens at higher taxonomic resolution. These results were consistent with strain level SNP genotyping and bacterial identification from laboratory culture. PMID:25365329

  3. Efficient selection of biomineralizing DNA aptamers using deep sequencing and population clustering.

    PubMed

    Bawazer, Lukmaan A; Newman, Aaron M; Gu, Qian; Ibish, Abdullah; Arcila, Mary; Cooper, James B; Meldrum, Fiona C; Morse, Daniel E

    2014-01-28

    DNA-based information systems drive the combinatorial optimization processes of natural evolution, including the evolution of biominerals. Advances in high-throughput DNA sequencing expand the power of DNA as a potential information platform for combinatorial engineering, but many applications remain to be developed due in part to the challenge of handling large amounts of sequence data. Here we employ high-throughput sequencing and a recently developed clustering method (AutoSOME) to identify single-stranded DNA sequence families that bind specifically to ZnO semiconductor mineral surfaces. These sequences were enriched from a diverse DNA library after a single round of screening, whereas previous screening approaches typically require 5-15 rounds of enrichment for effective sequence identification. The consensus sequence of the largest cluster was poly d(T)30. This consensus sequence exhibited clear aptamer behavior and was shown to promote the synthesis of crystalline ZnO from aqueous solution at near-neutral pH. This activity is significant, as the crystalline form of this wide-bandgap semiconductor is not typically amenable to solution synthesis in this pH range. High-resolution TEM revealed that this DNA synthesis route yields ZnO nanoparticles with an amorphous-crystalline core-shell structure, suggesting that the mechanism of mineralization involves nanoscale coacervation around the DNA template. We thus demonstrate that our new method, termed Single round Enrichment of Ligands by deep Sequencing (SEL-Seq), can facilitate biomimetic synthesis of technological nanomaterials by accelerating combinatorial selection of biomolecular-mineral interactions. Moreover, by enabling direct characterization of sequence family demographics, we anticipate that SEL-Seq will enhance aptamer discovery in applications employing additional rounds of screening. PMID:24341560

  4. miRBase: annotating high confidence microRNAs using deep sequencing data

    PubMed Central

    Kozomara, Ana; Griffiths-Jones, Sam

    2014-01-01

    We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information. PMID:24275495

  5. Nautilus: a bioinformatics package for the analysis of HIV type 1 targeted deep sequencing data.

    PubMed

    Kijak, Gustavo H; Pham, Phuc; Sanders-Buell, Eric; Harbolick, Elizabeth A; Eller, Leigh Anne; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Tovanabutra, Sodsai

    2013-10-01

    The advent of next generation sequencing technologies is providing new insight into HIV-1 diversity and evolution, which has created the need for bioinformatics tools that could be applied to the characterization of viral quasispecies. Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 targeted deep sequencing data. The DeepHaplo module determines the nucleotide base frequency and read depth at each position and computes the haplotype frequencies based on the linkage among polymorphisms in the same next generation sequence read. The Motifs module computes the frequency of the variants in the setting of their sequence context and mapping orientation, which allows for the validation of polymorphisms and haplotypes when strand bias is suspected. Both modules are accessed through a user-friendly GUI, which runs on Mac OS X (version 10.7.4 or later), and are based on Python, JAVA, and R scripts. Nautilus is available from www.hivresearch.org/research.php?ServiceID=5&SubServiceID=6 . PMID:23809062
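
The per-position base frequency and read depth described for the DeepHaplo module can be illustrated with a short sketch. This is not Nautilus code: the function name `base_frequencies`, the input format (gapless reads given as a 0-based start position plus sequence), and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def base_frequencies(aligned_reads):
    """Compute read depth and base frequencies at each reference position.

    aligned_reads: iterable of (start, sequence) pairs, where start is the
    0-based reference position of the first base and the alignment is
    assumed to contain no insertions or deletions.
    Returns {position: (depth, {base: frequency})}.
    """
    counts = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1

    profile = {}
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        profile[pos] = (depth, {b: n / depth for b, n in counts[pos].items()})
    return profile


# Hypothetical toy alignment: position 2 carries a minority 'A' variant.
toy_reads = [(0, "ACGT"), (1, "CGTA"), (2, "GTAC"), (2, "ATAC")]
profile = base_frequencies(toy_reads)
for pos, (depth, freqs) in profile.items():
    print(pos, depth, freqs)
```

Real deep-sequencing data would normally be read from an alignment file and would need indel handling and base-quality filtering, both of which this sketch leaves out.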

  6. miRBase: annotating high confidence microRNAs using deep sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2014-01-01

    We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information. PMID:24275495

  7. Analyzing the microRNA Transcriptome in Plants Using Deep Sequencing Data

    PubMed Central

    Yang, Xiaozeng; Li, Lei

    2012-01-01

    MicroRNAs (miRNAs) are 20- to 24-nucleotide endogenous small RNA molecules emerging as an important class of sequence-specific, trans-acting regulators for modulating gene expression at the post-transcription level. There has been a surge of interest in the past decade in identifying miRNAs and profiling their expression pattern using various experimental approaches. In particular, ultra-deep sampling of specifically prepared low-molecular-weight RNA libraries based on next-generation sequencing technologies has been used successfully in diverse species. The challenge now is to effectively deconvolute the complex sequencing data to provide comprehensive and reliable information on the miRNAs, miRNA precursors, and expression profile of miRNA genes. Here we review the recently developed computational tools and their applications in profiling the miRNA transcriptomes, with an emphasis on the model plant Arabidopsis thaliana. Also highlighted are the progress and insight into miRNA biology derived from analyzing available deep sequencing data. PMID:24832228

  8. Complete Genome Sequence of a Reference Stock of Simian Immunodeficiency Virus RNA (SIVmac251/32H/L28) Determined by Deep Sequencing

    PubMed Central

    Jenkins, Adrian; Ham, Claire; Almond, Neil

    2016-01-01

    A reference preparation for simian immunodeficiency virus (SIV) RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats, including minority species, were determined. This information will inform SIV research investigations and aid evaluation and development of amplification assays for SIV RNA quantification. PMID:27231355

  9. Complete Genome Sequence of a Reference Stock of Simian Immunodeficiency Virus RNA (SIVmac251/32H/L28) Determined by Deep Sequencing.

    PubMed

    Jenkins, Adrian; Ham, Claire; Almond, Neil; Berry, Neil

    2016-01-01

    A reference preparation for simian immunodeficiency virus (SIV) RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats, including minority species, were determined. This information will inform SIV research investigations and aid evaluation and development of amplification assays for SIV RNA quantification. PMID:27231355

  10. De novo meta-assembly of ultra-deep sequencing data

    PubMed Central

    Mirebrahim, Hamid; Close, Timothy J.; Lonardi, Stefano

    2015-01-01

    We introduce a new divide and conquer approach to deal with the problem of de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler Slicembler partitions the input data into optimal-sized ‘slices’ and uses a standard assembly tool (e.g. Velvet, SPAdes, IDBA_UD and Ray) to assemble each slice individually. Slicembler uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. To improve its efficiency, Slicembler uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8000x coverage) and simulated data show that Slicembler significantly improves the quality of the assembly compared with the performance of the base assembler. In fact, most of the time Slicembler generates error-free assemblies. We also show that Slicembler is much more resistant to high sequencing error rates than the base assembler. Availability and implementation: Slicembler can be accessed at http://slicembler.cs.ucr.edu/. Contact: hamid.mirebrahim@email.ucr.edu PMID:26072514
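
The slicing step can be pictured as splitting an ultra-deep read set into chunks that each give a moderate coverage. The abstract does not say how Slicembler chooses its slice size, so the sketch below simply assumes a fixed target coverage per slice; `slice_reads`, `target_coverage`, and the toy numbers are hypothetical and not part of the tool.

```python
def slice_reads(reads, genome_size, read_length, target_coverage):
    """Split reads into consecutive slices of roughly target_coverage each.

    Coverage of a slice is approximated as
    (reads in slice * read_length) / genome_size.
    """
    reads_per_slice = max(1, (target_coverage * genome_size) // read_length)
    return [
        reads[i:i + reads_per_slice]
        for i in range(0, len(reads), reads_per_slice)
    ]


# Hypothetical toy data: 1000 reads of 100 bp over a 1 kb genome is 100x
# in total, so a 20x target yields five slices of 200 reads each.
toy_reads = ["A" * 100 for _ in range(1000)]
slices = slice_reads(toy_reads, genome_size=1000, read_length=100, target_coverage=20)
print(len(slices), [len(s) for s in slices])
```

Each slice would then be handed to a base assembler separately. The majority-voting and suffix-tree steps described in the abstract are not shown here.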

  11. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/. PMID:22669902
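
Detecting stretches of homozygosity shared by affected individuals can be sketched as a scan for runs of consecutive markers where every individual carries the same homozygous genotype. This is an illustration only, not HomozygosityMapper's algorithm: the function name, the two-letter genotype encoding (`AA`, `AB`, `BB`), and the `min_markers` threshold are all assumptions.

```python
def shared_homozygous_runs(genotypes, min_markers=3):
    """Find runs of markers where all individuals share one homozygous call.

    genotypes: list of per-individual lists of genotype strings
    (for example 'AA', 'AB', 'BB'), all in the same marker order.
    Returns a list of (first_index, last_index) tuples, inclusive.
    """
    n_markers = len(genotypes[0])
    runs, start = [], None
    for i in range(n_markers):
        calls = {individual[i] for individual in genotypes}
        # Shared and homozygous: a single call across individuals,
        # and that call consists of a single repeated allele.
        shared = len(calls) == 1 and len(set(next(iter(calls)))) == 1
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


# Hypothetical genotypes for two affected individuals at eight markers.
affected_1 = ["AA", "AA", "BB", "AA", "AB", "BB", "BB", "AA"]
affected_2 = ["AA", "AA", "BB", "AA", "AA", "BB", "BB", "AA"]
print(shared_homozygous_runs([affected_1, affected_2]))
```

A production tool would also weigh genetic distance, genotyping errors, and allele frequencies, all of which this sketch ignores.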

  12. Genotyping Influenza Virus by Next-Generation Deep Sequencing in Clinical Specimens.

    PubMed

    Seong, Moon Woo; Cho, Sung Im; Park, Hyunwoong; Seo, Soo Hyun; Lee, Seung Jun; Kim, Eui Chong; Park, Sung Sup

    2016-05-01

    Rapid and accurate identification of an influenza outbreak is essential for patient care and treatment. We describe a next-generation sequencing (NGS)-based, unbiased deep sequencing method in clinical specimens to investigate an influenza outbreak. Nasopharyngeal swabs from patients were collected for molecular epidemiological analysis. Total RNA was sequenced by using the NGS technology as paired-end 250 bp reads. A total of 7 to 12 million reads was obtained. After mapping to the human reference genome, we analyzed the 3-4% of reads that originated from a non-human source. A BLAST search of the contigs reconstructed de novo revealed high sequence similarity with that of the pandemic H1N1 virus. In the phylogenetic analysis, the HA gene of our samples clustered closely with that of A/Senegal/VR785/2010(H1N1), A/Wisconsin/11/2013(H1N1), and A/Korea/01/2009(H1N1), and the NA gene of our samples clustered closely with A/Wisconsin/11/2013(H1N1). This study suggests that NGS-based unbiased sequencing can be effectively applied to investigate molecular characteristics of a nosocomial influenza outbreak by using clinical specimens such as nasopharyngeal swabs. PMID:26915615

  13. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    PubMed Central

    Whitehead, Timothy A; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J; De Mattos, Cecilia; Myers, Chris A; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A; Baker, David

    2013-01-01

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility. PMID:22634563

  14. Detection and characterization of mycoviruses in arbuscular mycorrhizal fungi by deep-sequencing.

    PubMed

    Ezawa, Tatsuhiro; Ikeda, Yoji; Shimura, Hanako; Masuta, Chikara

    2015-01-01

    Fungal viruses (mycoviruses) often have a significant impact not only on phenotypic expression of the host fungus but also on higher order biological interactions, e.g., conferring plant stress tolerance via an endophytic host fungus. Arbuscular mycorrhizal (AM) fungi in the phylum Glomeromycota associate with most land plants and supply mineral nutrients to the host plants. So far, little information about mycoviruses has been obtained in the fungi due to their obligate biotrophic nature. Here we provide a technical breakthrough, "two-step strategy" in combination with deep-sequencing, for virological study in AM fungi; dsRNA is first extracted and sequenced using material obtained from highly productive open pot culture, and then the presence of viruses is verified using pure material produced in the in vitro monoxenic culture. This approach enabled us to demonstrate the presence of several viruses for the first time from a glomeromycotan fungus. PMID:25287503

  15. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    SciTech Connect

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  16. Antibody repertoire deep sequencing reveals antigen-independent selection in maturing B cells

    PubMed Central

    Kaplinsky, Joseph; Li, Anthony; Sun, Amy; Coffre, Maryaline; Koralov, Sergei B.; Arnaout, Ramy

    2014-01-01

    Antibody repertoires are known to be shaped by selection for antigen binding. Unexpectedly, we now show that selection also acts on a non–antigen-binding antibody region: the heavy-chain variable (VH)–encoded “elbow” between variable and constant domains. By sequencing 2.8 million recombined heavy-chain genes from immature and mature B-cell subsets in mice, we demonstrate a striking gradient in VH gene use as pre-B cells mature into follicular and then into marginal zone B cells. Cells whose antibodies use VH genes that encode a more flexible elbow are more likely to mature. This effect is distinct from, and exceeds in magnitude, previously described maturation-associated changes in heavy-chain complementarity determining region 3, a key antigen-binding region, which arise from junctional diversity rather than differential VH gene use. Thus, deep sequencing reveals a previously unidentified mode of B-cell selection. PMID:24927543

  17. Rapid fine conformational epitope mapping using comprehensive mutagenesis and deep sequencing.

    PubMed

    Kowalsky, Caitlin A; Faber, Matthew S; Nath, Aritro; Dann, Hailey E; Kelly, Vince W; Liu, Li; Shanker, Purva; Wagner, Ellen K; Maynard, Jennifer A; Chan, Christina; Whitehead, Timothy A

    2015-10-30

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and demonstrate the method's effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891

  18. Inside the intraterrestrials: The deep biosphere seen through massively parallel sequencing

    NASA Astrophysics Data System (ADS)

    Biddle, J.

    2009-12-01

    Deeply buried marine sediments may house a large amount of the Earth’s microbial population. Initial studies based on 16S rRNA clone libraries suggest that these sediments contain unique phylotypes of microorganisms, particularly from the archaeal domain. Since this environment is so difficult to study, microbiologists are challenged to find ways to examine these populations remotely. A major approach taken to study this environment uses massively parallel sequencing to examine the inner genetic workings of these microorganisms after the sediment has been drilled. Both metagenomics and tagged amplicon sequencing have been employed on deep sediments, and initial results show that different geographic regions can be differentiated through genomics and also minor populations may cause major geochemical changes.

  19. Deep sequencing of New World screw-worm transcripts to discover genes involved in insecticide resistance

    PubMed Central

    2010-01-01

    Background The New World screw-worm (NWS), Cochliomyia hominivorax, is one of the most important myiasis-causing flies, causing severe losses to the livestock industry. In its current geographical distribution, this species has been controlled by the application of insecticides, mainly organophosphate (OP) compounds, but a number of lineages have been identified that are resistant to such chemicals. Despite its economic importance, only limited genetic information is available for the NWS. Here, as a part of an effort to characterize the C. hominivorax genome and identify putative genes involved in insecticide resistance, we sampled its transcriptome by deep sequencing of polyadenylated transcripts using the 454 sequencing technology. Results Deep sequencing on the 454 platform of three normalized libraries (larval, adult male and adult female) generated a total of 548,940 reads. Eighteen candidate genes coding for three metabolic detoxification enzyme families, cytochrome P450 monooxygenases, glutathione S-transferases and carboxyl/cholinesterases were selected and gene expression levels were measured using quantitative real-time polymerase chain reaction (qRT-PCR). Of the investigated candidates, only one gene was expressed differently between control and resistant larvae with, at least, a 10-fold down-regulation in the resistant larvae. The presence of mutations in the acetylcholinesterase (target site) and carboxylesterase E3 genes was investigated and all of the resistant flies presented E3 mutations previously associated with insecticide resistance. Conclusions Here, we provided the largest database of NWS expressed sequence tags that is an important resource, not only for further studies on the molecular basis of the OP resistance in NWS fly, but also for functional and comparative studies among Calliphoridae flies. 
Among our candidates, only one gene was found differentially expressed in resistant individuals, and its role on insecticide resistance should

  20. Draft Genome Sequence of Psychrobacter piscatorii Strain LQ58, a Psychrotolerant Bacterium Isolated from a Deep-Sea Hydrothermal Vent

    PubMed Central

    Dong, Binbin; Liu, Qing

    2016-01-01

    Here, we report the 3.1-Mb draft genome sequence of Psychrobacter piscatorii strain LQ58, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will provide further insight into the environmental adaptation of psychrotolerant bacteria and the development of novel cold-active enzymes for industrial application. PMID:26941137

  1. Draft Genome Sequence of Caloranaerobacter sp. TR13, an Anaerobic Thermophilic Bacterium Isolated from a Deep-Sea Hydrothermal Vent.

    PubMed

    Zhou, Meixian; Xie, Yunbiao; Dong, Binbin; Liu, Qing; Chen, Xiaoyao

    2015-01-01

    Here, we report the draft 2,261,881-bp genome sequence of Caloranaerobacter sp. TR13, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will be helpful for understanding the genetic and metabolic features, as well as potential biotechnological application in the genus Caloranaerobacter. PMID:26679595

  2. Draft Genome Sequence of Psychrobacter piscatorii Strain LQ58, a Psychrotolerant Bacterium Isolated from a Deep-Sea Hydrothermal Vent.

    PubMed

    Zhou, Meixian; Dong, Binbin; Liu, Qing

    2016-01-01

    Here, we report the 3.1-Mb draft genome sequence of Psychrobacter piscatorii strain LQ58, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will provide further insight into the environmental adaptation of psychrotolerant bacteria and the development of novel cold-active enzymes for industrial application. PMID:26941137

  3. Draft Genome Sequence of Caloranaerobacter sp. TR13, an Anaerobic Thermophilic Bacterium Isolated from a Deep-Sea Hydrothermal Vent

    PubMed Central

    Xie, Yunbiao; Dong, Binbin; Liu, Qing; Chen, Xiaoyao

    2015-01-01

    Here, we report the draft 2,261,881-bp genome sequence of Caloranaerobacter sp. TR13, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will be helpful for understanding the genetic and metabolic features, as well as potential biotechnological application in the genus Caloranaerobacter. PMID:26679595

  4. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  5. Human norovirus hyper-mutation revealed by ultra-deep sequencing.

    PubMed

    Cuevas, José M; Combe, Marine; Torres-Puente, Manoli; Garijo, Raquel; Guix, Susana; Buesa, Javier; Rodríguez-Díaz, Jesús; Sanjuán, Rafael

    2016-07-01

    Human noroviruses (NoVs) are a major cause of gastroenteritis worldwide. It is thought that, similar to other RNA viruses, high mutation rates allow NoVs to evolve fast and to undergo rapid immune escape at the population level. However, the rate and spectrum of spontaneous mutations of human NoVs have not been quantified previously. Here, we analyzed the intra-patient diversity of the NoV capsid by carrying out RT-PCR and ultra-deep sequencing with 100,000-fold coverage of 16 stool samples from symptomatic patients. This revealed the presence of low-frequency sequences carrying large numbers of U-to-C or A-to-G base transitions, suggesting a role for hyper-mutation in NoV diversity. To more directly test for hyper-mutation, we performed transfection assays in which the production of mutations was restricted to a single cell infection cycle. This confirmed the presence of sequences with multiple U-to-C/A-to-G transitions, and suggested that hyper-mutation contributed a large fraction of the total NoV spontaneous mutation rate. The type of changes produced and their sequence context are compatible with ADAR-mediated editing of the viral RNA. PMID:27094861
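    The idea of screening reads for clusters of U-to-C or A-to-G transitions can be illustrated with a minimal sketch. This is not the authors' pipeline: the reference, the reads, the function names and the `min_transitions` threshold are all hypothetical, and the sketch assumes ungapped, equal-length cDNA sequences in which U is written as T.

```python
from collections import Counter

def substitution_counts(reference, read):
    """Count each (ref_base, read_base) mismatch in an ungapped alignment."""
    if len(reference) != len(read):
        raise ValueError("sketch assumes equal-length, ungapped sequences")
    return Counter((r, q) for r, q in zip(reference, read) if r != q)

def is_hypermutated(reference, read, min_transitions=3):
    """Flag a read carrying many T-to-C (U-to-C) or A-to-G changes.

    min_transitions is an illustrative threshold, not a published value.
    """
    counts = substitution_counts(reference, read)
    return max(counts[("T", "C")], counts[("A", "G")]) >= min_transitions

# Toy data, invented for illustration only.
reference = "ATGATTACAGATTACA"
reads = [
    "ATGATTACAGATTACA",  # identical to the reference
    "ATGGTTGCAGGTTACA",  # three A-to-G changes
    "ATGATCACAGATTACA",  # a single T-to-C change
]

flags = [is_hypermutated(reference, read) for read in reads]
for read, flag in zip(reads, flags):
    print(read, "hypermutated" if flag else "normal")
```

    In real data a threshold like this would have to be calibrated against the sequencing error rate, which matters at very high coverage.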

  6. Targeted Deep Sequencing Reveals No Definitive Evidence for Somatic Mosaicism in Atrial Fibrillation

    PubMed Central

    Roberts, Jason D.; Longoria, James; Poon, Annie; Gollob, Michael H.; Dewland, Thomas A.; Kwok, Pui-Yan; Olgin, Jeffrey E.; Deo, Rahul C.; Marcus, Gregory M.

    2015-01-01

    Background Studies of ≤15 atrial fibrillation (AF) patients have identified atrial-specific mutations within connexin genes, suggesting that somatic mutations may account for sporadic cases of the arrhythmia. We sought to identify atrial somatic mutations among patients with and without AF using targeted deep next-generation sequencing of 560 genes, including genetic culprits implicated in AF, the Mendelian cardiomyopathies and channelopathies, and all ion channels within the genome. Methods and Results Targeted gene capture and next generation sequencing were performed on DNA from lymphocytes and left atrial appendages of 34 patients (25 with AF). Twenty AF patients had undergone cardiac surgery exclusively for pulmonary vein isolation, and 17 had no structural heart disease. Sequence alignment and variant calling were performed for each atrial-lymphocyte pair using the Burrows-Wheeler Aligner, the Genome Analysis Toolkit, and MuTect packages. Next generation sequencing yielded a median 265-fold coverage depth (IQR 164–369). Comparison of the 3 million base pairs from each atrial-lymphocyte pair revealed a single potential somatic missense mutation in 3 AF patients and 2 in a single control (12 vs. 11%; p=1). All potential discordant variants had low allelic fractions (range: 2.3–7.3%) and none were detected with conventional sequencing. Conclusions Using high-depth next generation sequencing and state-of-the-art somatic mutation calling approaches, no pathogenic atrial somatic mutations could be confirmed among 25 AF patients in a comprehensive cardiac arrhythmia genetic panel. These findings indicate that atrial specific mutations are rare and that somatic mosaicism is unlikely to exert a prominent role in AF pathogenesis. PMID:25406240
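    The reported comparison (12 vs. 11%; p=1) can be checked from the counts in the abstract: 3 of 25 AF patients and 1 of 9 controls carried a candidate variant. The abstract does not name the statistical test, so the two-sided Fisher exact test below is an assumption, written out with the standard library only.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-9))

af_total, af_hits = 25, 3            # AF patients / with a candidate variant
ctrl_total, ctrl_hits = 34 - 25, 1   # controls / with a candidate variant

af_rate = af_hits / af_total
ctrl_rate = ctrl_hits / ctrl_total
p_value = fisher_exact_two_sided(af_hits, af_total - af_hits,
                                 ctrl_hits, ctrl_total - ctrl_hits)
print(f"{af_rate:.0%} vs. {ctrl_rate:.0%}; p = {p_value:.2f}")
```

    Under that assumption the observed table is the single most probable one given the margins, which is why the two-sided p-value comes out at 1.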

  7. Analysis of the full-length genome sequence of papaya lethal yellowing virus (PLYV), determined by deep sequencing, confirms its classification in the genus Sobemovirus.

    PubMed

    Pereira, Alvaro J; Alfenas-Zerbini, Poliane; Cascardo, Renan S; Andrade, Eduardo C; Murilo Zerbini, F

    2012-10-01

    Papaya lethal yellowing virus (PLYV) causes an economically important disease in papayas in northeastern Brazil. Based on biological and molecular properties, PLYV has been tentatively assigned to the genus Sobemovirus. We report the sequence of the full-length genome of a PLYV isolate from Brazil, determined by deep sequencing. The PLYV genome is 4,145 nt long and contains four ORFs, with an arrangement identical to that of sobemoviruses. The polyprotein and CP display significant sequence identity with the corresponding proteins of other sobemoviruses. Pairwise comparisons and phylogenetic analysis based on complete nucleotide sequences confirm the classification of PLYV in the genus Sobemovirus. PMID:22743825

  8. Evaluation of ultra-deep targeted sequencing for personalized breast cancer care

    PubMed Central

    2013-01-01

    Introduction The increasing number of targeted therapies, together with a deeper understanding of cancer genetics and drug response, have prompted major healthcare centers to implement personalized treatment approaches relying on high-throughput tumor DNA sequencing. However, the optimal way to implement this transformative methodology is not yet clear. Current assays may miss important clinical information such as the mutation allelic fraction, the presence of sub-clones or chromosomal rearrangements, or the distinction between inherited variants and somatic mutations. Here, we present the evaluation of ultra-deep targeted sequencing (UDT-Seq) to generate and interpret the molecular profile of 38 breast cancer patients from two academic medical centers. Methods We sequenced 47 genes in matched germline and tumor DNA samples from 38 breast cancer patients. The selected genes, or the pathways they belong to, can be targeted by drugs or are important in familial cancer risk or drug metabolism. Results Relying on the added value of sequencing matched tumor and germline DNA and using a dedicated analysis, UDT-Seq has a high sensitivity to identify mutations in tumors with low malignant cell content. Applying UDT-Seq to matched tumor and germline specimens from the 38 patients resulted in a proposal for at least one targeted therapy for 22 patients, the identification of tumor sub-clones in 3 patients, the suggestion of potential adverse drug effects in 3 patients and a recommendation for genetic counseling for 2 patients. Conclusion Overall our study highlights the additional benefits of a sequencing strategy, which includes germline DNA and is optimized for heterogeneous tumor tissues. PMID:24326041

  9. Deep sequencing of pigeonpea sterility mosaic virus discloses five RNA segments related to emaraviruses.

    PubMed

    Elbeaino, Toufic; Digiaro, Michele; Uppala, Mangala; Sudini, Harikishan

    2014-08-01

    The sequences of five viral RNA segments of pigeonpea sterility mosaic virus (PPSMV), the agent of sterility mosaic disease (SMD) of pigeonpea (Cajanus cajan, Fabaceae), were determined using deep sequencing. Each of the five RNAs encodes a single protein on the negative-sense strand with an open reading frame (ORF) of 6885, 1947, 927, 1086, and 1422 nt, respectively. In order, from RNA1 to RNA5, these ORFs encode the RNA-dependent RNA polymerase (p1, 267.9 kDa), a putative glycoprotein precursor (p2, 74.3 kDa), a putative nucleocapsid protein (p3, 34.6 kDa), and a putative movement protein (p4, 40.8 kDa); p5 (55 kDa) has an unknown function. All RNA segments of PPSMV showed the highest identity with orthologs of fig mosaic virus (FMV) and Rose rosette virus (RRV). In phylogenetic trees constructed with the amino acid sequences of p1, p2 and p3, PPSMV clustered consistently with other emaraviruses, close to clades comprising members of other genera of the family Bunyaviridae. Based on the molecular characteristics unveiled in this study and the morphological and epidemiological features similar to other emaraviruses, PPSMV seems to be the seventh species to join the list of emaraviruses known to date and accordingly, its classification in the genus Emaravirus seems now legitimate. PMID:24685674
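    As a quick consistency check on the figures above, the ORF lengths can be converted to protein lengths and compared with the reported masses. The sketch assumes that each ORF length includes one stop codon, which the abstract does not state; the helper name is hypothetical.

```python
# (segment/protein, ORF length in nt, reported mass in kDa), from the abstract
orfs = [
    ("RNA1/p1", 6885, 267.9),
    ("RNA2/p2", 1947, 74.3),
    ("RNA3/p3", 927, 34.6),
    ("RNA4/p4", 1086, 40.8),
    ("RNA5/p5", 1422, 55.0),
]

def residues_from_orf(orf_nt):
    """Amino acid count, assuming the ORF length includes one stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return orf_nt // 3 - 1

summary = []
for name, orf_nt, mass_kda in orfs:
    aa = residues_from_orf(orf_nt)
    da_per_residue = mass_kda * 1000 / aa
    summary.append((name, aa, da_per_residue))
    print(f"{name}: {aa} aa, {da_per_residue:.1f} Da per residue")
```

    All five ORF lengths are whole multiples of three, and each protein works out to between 110 and 120 Da per residue, in line with typical average residue masses.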

  10. Deep transcriptome profiling of clinical Klebsiella pneumoniae isolates reveals strain and sequence type-specific adaptation.

    PubMed

    Bruchmann, Sebastian; Muthukumarasamy, Uthayakumar; Pohl, Sarah; Preusse, Matthias; Bielecka, Agata; Nicolai, Tanja; Hamann, Isabell; Hillert, Roger; Kola, Axel; Gastmeier, Petra; Eckweiler, Denitsa; Häussler, Susanne

    2015-11-01

    Health-care-associated infections by multi-drug-resistant bacteria constitute one of the greatest challenges to modern medicine. Bacterial pathogens devise various mechanisms to withstand the activity of a wide range of antimicrobial compounds, among which the acquisition of carbapenemases is one of the most concerning. In Klebsiella pneumoniae, the dissemination of the K. pneumoniae carbapenemase is tightly connected to the global spread of certain clonal lineages. Although antibiotic resistance is a key driver for the global distribution of epidemic high-risk clones, there seem to be other adaptive traits that may explain their success. Here, we exploited the power of deep transcriptome profiling (RNA-seq) to shed light on the transcriptomic landscape of 37 clinical K. pneumoniae isolates of diverse phylogenetic origins. We identified a large set of 3346 genes which was expressed in all isolates. While the core-transcriptome profiles varied substantially between groups of different sequence types, they were more homogenous among isolates of the same sequence type. We furthermore linked the detailed information on differentially expressed genes with the clinically relevant phenotypes of biofilm formation and bacterial virulence. This allowed for the identification of a diminished expression of biofilm-specific genes within the low biofilm producing ST258 isolates as a sequence type-specific trait. PMID:26261087

  11. Heteroplasmic substitutions in the entire mitochondrial genomes of human colon cells detected by ultra-deep 454 sequencing.

    PubMed

    Skonieczna, Katarzyna; Malyarchuk, Boris; Jawień, Arkadiusz; Marszałek, Andrzej; Banaszkiewicz, Zbigniew; Jarmocik, Paweł; Borcz, Marcelina; Bała, Piotr; Grzybowski, Tomasz

    2015-03-01

    Mitochondrial DNA (mtDNA) heteroplasmy has been widely described from clinical, evolutionary and analytical points of view. Historically, the majority of studies have been based on Sanger sequencing. However, next-generation sequencing technologies are now being used for heteroplasmy analysis. Ultra-deep sequencing approaches provide increased sensitivity for detecting minority variants. However, a phylogenetic a posteriori analysis revealed that most of the next-generation sequencing data published to date suffers from shortcomings. Because implementation of new technologies in clinical, population, or forensic studies requires proper verification, in this paper we present a direct comparison of ultra-deep 454 and Sanger sequencing for the detection of heteroplasmy in complete mitochondrial genomes of normal colon cells. The spectrum of heteroplasmic mutations is discussed against the background of mitochondrial DNA variability in human populations. PMID:25465762

  12. Deep Sequencing Analysis of Nucleolar Small RNAs: RNA Isolation and Library Preparation.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    The nucleolus is a subcellular compartment with a key function in ribosome biogenesis. The nucleolus is rich in noncoding RNAs, mostly the ribosomal RNAs and small nucleolar RNAs. Surprisingly, several miRNAs have also been detected in the nucleolus, raising the question as to whether other small RNA species are present and functional in the nucleolus. We have developed a strategy for stepwise enrichment of nucleolar small RNAs from the total nucleolar RNA extracts and subsequent construction of nucleolar small RNA libraries which are suitable for deep sequencing. Our method successfully isolates the small RNA population from total RNAs and monitors the RNA quality in each step to ensure that small RNAs recovered represent the actual small RNA population in the nucleolus and not degradation products from larger RNAs. We have further applied this approach to characterize the distribution of small RNAs in different cellular compartments. PMID:27576723

  13. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing

    PubMed Central

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O’Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M.; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J.; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A.; Turner, Daniel J.; Rubio, Valentin Ruano; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C.; Ferdig, Michael T.; Amambua-Ngwa, Alfred; Conway, David J.; Takala-Harrison, Shannon; Plowe, Christopher V.; Rayner, Julian C.; Rockett, Kirk A.; Clark, Taane G.; Newbold, Chris I.; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P.

    2013-01-01

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short term culture. Analysis of 86,158 exonic SNPs that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome. PMID:22722859

  14. Polymorphism Identification and Improved Genome Annotation of Brassica rapa Through Deep RNA Sequencing

    PubMed Central

    Devisetty, Upendra Kumar; Covington, Michael F.; Tat, An V.; Lekkala, Saradadevi; Maloof, Julin N.

    2014-01-01

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate mRNA abundance and detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes—R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety)—using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, 23,754 gene models had structural modifications, and 3655 annotated proteins changed. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/. PMID:25122667

  15. Identification of MicroRNAs in Meloidogyne incognita Using Deep Sequencing

    PubMed Central

    Wang, Yunsheng; Mao, Zhenchuan; Yan, Jin; Cheng, Xinyue; Liu, Feng; Xiao, Luo; Dai, Liangying; Luo, Feng; Xie, Bingyan

    2015-01-01

    MicroRNAs play important regulatory roles in eukaryotic lineages. In this paper, we employed deep sequencing technology to sequence and identify microRNAs in the genome of M. incognita, one of the important plant-parasitic nematodes. We identified 102 M. incognita microRNA genes, which can be grouped into 71 nonredundant miRNAs based on mature sequences. Among the 71 miRNAs, 27 are known miRNAs and 44 are novel miRNAs. We identified seven miRNA clusters in the M. incognita genome. Four of the seven clusters, miR-100/let-7, miR-71-1/miR-2a-1, miR-71-2/miR-2a-2 and miR-279/miR-2b, are conserved in other species. We validated the expression of 5 M. incognita microRNAs, including 3 known microRNAs (miR-71, miR-100b and let-7) and 2 novel microRNAs (NOVEL-1 and NOVEL-2), using RT-PCR. All 5 microRNAs were detected. The expression levels of four microRNAs obtained using RT-PCR were consistent with those obtained by high-throughput sequencing; the exception was let-7. We also examined how M. incognita miRNAs are conserved in four other nematode species: C. elegans, A. suum, B. malayi and P. pacificus. We found that four microRNAs, miR-100, miR-92, miR-279 and miR-137, exist only in the genomes of parasitic nematodes and not in the genome of the free-living nematode C. elegans. Our research created a unique resource for the research of plant-parasitic nematodes. The candidate microRNAs could help elucidate the genomic structure, gene regulation, evolutionary processes, and developmental features of plant-parasitic nematodes and nematode-plant interaction. PMID:26241472

  16. Deep Sequencing the Transcriptome Reveals Seasonal Adaptive Mechanisms in a Hibernating Mammal

    PubMed Central

    Hampton, Marshall; Melvin, Richard G.; Kendall, Anne H.; Kirkpatrick, Brian R.; Peterson, Nichole; Andrews, Matthew T.

    2011-01-01

    Mammalian hibernation is a complex phenotype involving metabolic rate reduction, bradycardia, profound hypothermia, and a reliance on stored fat that allows the animal to survive for months without food in a state of suspended animation. To determine the genes responsible for this phenotype in the thirteen-lined ground squirrel (Ictidomys tridecemlineatus) we used the Roche 454 platform to sequence mRNA isolated at six points throughout the year from three key tissues: heart, skeletal muscle, and white adipose tissue (WAT). Deep sequencing generated approximately 3.7 million cDNA reads from 18 samples (6 time points × 3 tissues) with a mean read length of 335 bases. Of these, 3,125,337 reads were assembled into 140,703 contigs. Approximately 90% of all sequences were matched to proteins in the human UniProt database. The total number of distinct human proteins matched by ground squirrel transcripts was 13,637 for heart, 12,496 for skeletal muscle, and 14,351 for WAT. Extensive mitochondrial RNA sequences enabled a novel approach of using the transcriptome to construct the complete mitochondrial genome for I. tridecemlineatus. Seasonal and activity-specific changes in mRNA levels that met our stringent false discovery rate cutoff (1.0×10^-11) were used to identify patterns of gene expression involving various aspects of the hibernation phenotype. Among these patterns are differentially expressed genes encoding heart proteins AT1A1, NAC1 and RYR2 controlling ion transport required for contraction and relaxation at low body temperatures. Abundant RNAs in skeletal muscle coding ubiquitin pathway proteins ASB2, UBC and DDB1 peak in October, suggesting an increase in muscle proteolysis. Finally, genes in WAT that encode proteins involved in lipogenesis (ACOD, FABP4) are highly expressed in August, but gradually decline in expression during the seasonal transition to lipolysis. PMID:22046435
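    The sequencing yield implied by these figures can be worked out directly. The numbers below are taken from the abstract; because the read total is given only as "approximately 3.7 million", the derived values are approximate as well.

```python
total_reads = 3_700_000      # "approximately 3.7 million" cDNA reads
mean_read_length = 335       # bases
assembled_reads = 3_125_337  # reads placed into contigs
contigs = 140_703

total_bases = total_reads * mean_read_length
assembled_fraction = assembled_reads / total_reads
reads_per_contig = assembled_reads / contigs

print(f"Total sequence: {total_bases / 1e9:.2f} Gb")
print(f"Reads assembled: {assembled_fraction:.1%}")
print(f"Mean reads per contig: {reads_per_contig:.1f}")
```

    This puts the dataset at roughly 1.2 Gb of sequence, with a little over four fifths of the reads incorporated into contigs.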

  17. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
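    Agreement between callers and sensitivity against a validated set are the kinds of quantities such a comparison rests on. The following is a minimal sketch of how they can be computed from call sets; the caller names, the variants and the "validated" set are toy values invented for illustration and do not come from the study.

```python
from itertools import combinations

# Hypothetical call sets: each variant is (chromosome, position, ref, alt).
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 250, "G", "C"), ("chr3", 75, "C", "T")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr3", 75, "C", "T")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr4", 900, "T", "G")},
}
# Hypothetical validated somatic variants used as the reference set.
validated = {("chr1", 100, "A", "T"), ("chr3", 75, "C", "T")}

def jaccard(x, y):
    """Shared calls divided by all calls made by either caller."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def sensitivity(called, truth):
    """Fraction of validated variants that the caller reported."""
    return len(called & truth) / len(truth) if truth else 0.0

def precision(called, truth):
    """Fraction of the caller's reports that are validated variants."""
    return len(called & truth) / len(called) if called else 0.0

agreement = {(a, b): jaccard(calls[a], calls[b])
             for a, b in combinations(sorted(calls), 2)}
performance = {name: (sensitivity(s, validated), precision(s, validated))
               for name, s in calls.items()}

for pair, score in agreement.items():
    print(pair, f"Jaccard = {score:.2f}")
for name, (sens, prec) in performance.items():
    print(name, f"sensitivity = {sens:.2f}, precision = {prec:.2f}")
```

    Keying each variant on position and alleles treats two calls as the same only when both match; indel comparisons would usually need normalization first, which this sketch leaves out.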

  18. Uncovering microRNA-mediated response to SO2 stress in Arabidopsis thaliana by deep sequencing.

    PubMed

    Li, Lihong; Xue, Meizhao; Yi, Huilan

    2016-10-01

    Sulfur dioxide (SO2) is a major air pollutant and has significant impacts on plants. MicroRNAs (miRNAs) are a class of gene expression regulators that play important roles in response to environmental stresses. In this study, deep sequencing was used for genome-wide identification of miRNAs and their expression profiles in response to SO2 stress in Arabidopsis thaliana shoots. A total of 27 conserved miRNAs and 5 novel miRNAs were found to be differentially expressed under SO2 stress. qRT-PCR analysis showed mostly negative correlation between miRNA accumulation and target gene mRNA abundance, suggesting regulatory roles of these miRNAs during SO2 exposure. The target genes of SO2-responsive miRNAs encode transcription factors and proteins that regulate auxin signaling and stress response, and the miRNA-mediated suppression of these genes could improve plant resistance to SO2 stress. Promoter sequence analysis of genes encoding SO2-responsive miRNAs showed that stress-responsive and phytohormone-related cis-regulatory elements occurred frequently, providing additional evidence of the involvement of miRNAs in adaptation to SO2 stress. This study represents a comprehensive expression profiling of SO2-responsive miRNAs in Arabidopsis and broadens our perspective on the ubiquitous regulatory roles of miRNAs under stress conditions. PMID:27232729

  19. Transcript analysis of a goat mesenteric lymph node by deep next-generation sequencing.

    PubMed

    E, G X; Zhao, Y J; Na, R S; Huang, Y F

    2016-01-01

    Deep RNA sequencing (RNA-seq) provides a practical and inexpensive alternative for exploring genomic data in non-model organisms. The functional annotation of non-model mammalian genomes, such as that of goats, is still poor compared to that of humans and mice. In the current study, we performed a whole transcriptome analysis of an intestinal mucous membrane lymph node to comprehensively characterize the transcript catalogue of this tissue in a goat. Using an Illumina HiSeq 4000 sequencing platform, 9.692 GB of raw reads were acquired. A total of 57,526 lymph transcripts were obtained, and the majority of these were mapped to known transcriptional units (42.67%). A comparison of the mRNA expression of the mesenteric lymph nodes during the juvenile and post-adolescent stages revealed 8949 transcripts that were differentially expressed, including 6174 known genes. In addition, we functionally classified these transcripts using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. A total of 6174 known genes were assigned to 64 GO terms, and 3782 genes were assigned to 303 KEGG pathways, including some related to immunity. Our results reveal the complex transcriptome profile of the lymph node and suggest that the immune system is immature in the mesenteric lymph nodes of juvenile goats. PMID:27173308

  20. Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations

    PubMed Central

    Pulido-Tamayo, Sergio; Sánchez-Rodríguez, Aminael; Swings, Toon; Van den Bergh, Bram; Dubey, Akanksha; Steenackers, Hans; Michiels, Jan; Fostier, Jan; Marchal, Kathleen

    2015-01-01

    Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides information to reconstruct these haplotypes and the frequency at which the haplotypes occur. However, this reconstruction is technically not trivial, especially not in clonal systems with a relatively low mutation frequency. The low number of segregating sites in those systems adds ambiguity to the haplotype phasing and thus obviates the reconstruction of genome-wide haplotypes based on sequence overlap information. Therefore, we present EVORhA, a haplotype reconstruction method that complements phasing information in the non-empty read overlap with the frequency estimations of inferred local haplotypes. As was shown with simulated data, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, the use of this additional frequency information allows EVORhA to still reliably reconstruct genome-wide haplotypes. On real data, we show the applicability of the method in reconstructing the population composition of evolved bacterial populations and in decomposing mixed bacterial infections from clinical samples. PMID:25990729

  1. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease.

    PubMed

    Zhou, Fusheng; Cao, Hongzhi; Zuo, Xianbo; Zhang, Tao; Zhang, Xiaoguang; Liu, Xiaomin; Xu, Ricong; Chen, Gang; Zhang, Yuanwei; Zheng, Xiaodong; Jin, Xin; Gao, Jinping; Mei, Junpu; Sheng, Yujun; Li, Qibin; Liang, Bo; Shen, Juan; Shen, Changbing; Jiang, Hui; Zhu, Caihong; Fan, Xing; Xu, Fengping; Yue, Min; Yin, Xianyong; Ye, Chen; Zhang, Cuicui; Liu, Xiao; Yu, Liang; Wu, Jinghua; Chen, Mengyun; Zhuang, Xuehan; Tang, Lili; Shao, Haojing; Wu, Longmao; Li, Jian; Xu, Yu; Zhang, Yijie; Zhao, Suli; Wang, Yu; Li, Ge; Xu, Hanshi; Zeng, Lei; Wang, Jianan; Bai, Mingzhou; Chen, Yanling; Chen, Wei; Kang, Tian; Wu, Yanyan; Xu, Xun; Zhu, Zhengwei; Cui, Yong; Wang, Zaixing; Yang, Chunjun; Wang, Peiguang; Xiang, Leihong; Chen, Xiang; Zhang, Anping; Gao, Xinghua; Zhang, Furen; Xu, Jinhua; Zheng, Min; Zheng, Jie; Zhang, Jianzhong; Yu, Xueqing; Li, Yingrui; Yang, Sen; Yang, Huanming; Wang, Jian; Liu, Jianjun; Hammarström, Lennart; Sun, Liangdan; Wang, Jun; Zhang, Xuejun

    2016-07-01

    The human major histocompatibility complex (MHC) region has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint the causal variants for these associations because of the extreme complexity of the region. We thus sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes both variants and HLA gene typing results of high accuracy. We further identified multiple independent new susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2 and an intergenic variant, rs118179173, associated with psoriasis and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of the pathogenesis of these disorders. PMID:27213287

  2. Deep sequencing identifies genetic heterogeneity and recurrent convergent evolution in chronic lymphocytic leukemia

    PubMed Central

    Ojha, Juhi; Ayres, Jackline; Secreto, Charla; Tschumper, Renee; Rabe, Kari; Van Dyke, Daniel; Slager, Susan; Shanafelt, Tait; Fonseca, Rafael; Kay, Neil E.

    2015-01-01

    Recent high-throughput sequencing and microarray studies have characterized the genetic landscape and clonal complexity of chronic lymphocytic leukemia (CLL). Here, we performed a longitudinal study in a homogeneously treated cohort of 12 patients, with sequential samples obtained at comparable stages of disease. We identified clonal competition between 2 or more genetic subclones in 70% of the patients with relapse, and stable clonal dynamics in the remaining 30%. By deep sequencing, we identified a high reservoir of genetic heterogeneity in the form of several driver genes mutated in small subclones underlying the disease course. Furthermore, in 2 patients, we identified convergent evolution, characterized by the combination of genetic lesions affecting the same genes or copy number abnormality in different subclones. The phenomenon affects multiple CLL putative driver abnormalities, including mutations in NOTCH1, SF3B1, DDX3X, and del(11q23). This is the first report documenting convergent evolution as a recurrent event in the CLL genome. Furthermore, this finding suggests the selective advantage of specific combinations of genetic lesions for CLL pathogenesis in a subset of patients. PMID:25377784

  3. Deep-sequencing transcriptome analysis of chilling tolerance mechanisms of a subnival alpine plant, Chorispora bungeana

    PubMed Central

    2012-01-01

    Background The plant tolerance mechanisms to low temperature have been studied extensively in the model plant Arabidopsis at the transcriptional level. However, few studies were carried out in plants with strong inherited cold tolerance. Chorispora bungeana is a subnival alpine plant possessing strong cold tolerance mechanisms. To get a deeper insight into its cold tolerance mechanisms, the transcriptome profiles of chilling-treated C. bungeana seedlings were analyzed by Illumina deep-sequencing and compared with Arabidopsis. Results Two cDNA libraries constructed from mRNAs of control and chilling-treated seedlings were sequenced by Illumina technology. A total of 54,870 unigenes were obtained by de novo assembly, and 3,484 chilling up-regulated and 4,571 down-regulated unigenes were identified. The expressions of 18 out of top 20 up-regulated unigenes were confirmed by qPCR analysis. Functional network analysis of the up-regulated genes revealed some common biological processes, including cold responses, and molecular functions in C. bungeana and Arabidopsis responding to chilling. Karrikins were found as new plant growth regulators involved in chilling responses of C. bungeana and Arabidopsis. However, genes involved in cold acclimation were enriched in chilling up-regulated genes in Arabidopsis but not in C. bungeana. In addition, although transcription activations were stimulated in both C. bungeana and Arabidopsis, no CBF putative ortholog was up-regulated in C. bungeana while CBF2 and CBF3 were chilling up-regulated in Arabidopsis. On the other hand, up-regulated genes related to protein phosphorylation and auto-ubiquitination processes were over-represented in C. bungeana but not in Arabidopsis. Conclusions We conducted the first deep-sequencing transcriptome profiling and chilling stress regulatory network analysis of C. bungeana, a subnival alpine plant with inherited cold tolerance. 
Comparative transcriptome analysis suggests that cold acclimation is not

  4. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer

  5. An optimized kit-free method for making strand-specific deep sequencing libraries from RNA fragments.

    PubMed

    Heyer, Erin E; Ozadam, Hakan; Ricci, Emiliano P; Cenik, Can; Moore, Melissa J

    2015-01-01

    Deep sequencing of strand-specific cDNA libraries is now a ubiquitous tool for identifying and quantifying RNAs in diverse sample types. The accuracy of conclusions drawn from these analyses depends on precise and quantitative conversion of the RNA sample into a DNA library suitable for sequencing. Here, we describe an optimized method of preparing strand-specific RNA deep sequencing libraries from small RNAs and variably sized RNA fragments obtained from ribonucleoprotein particle footprinting experiments or fragmentation of long RNAs. Our approach works across a wide range of input amounts (400 pg to 200 ng), is easy to follow and produces a library in 2-3 days at relatively low reagent cost, all while giving the user complete control over every step. Because all enzymatic reactions were optimized and driven to apparent completion, sequence diversity and species abundance in the input sample are well preserved. PMID:25505164

  6. Deep Sequencing Analysis Reveals Temporal Microbiota Changes Associated with Development of Bovine Digital Dermatitis

    PubMed Central

    Krull, Adam C.; Shearer, Jan K.; Gorden, Patrick J.; Cooper, Vickie L.; Phillips, Gregory J.

    2014-01-01

    Bovine digital dermatitis (DD) is a leading cause of lameness in dairy cattle throughout the world. Despite 35 years of research, the definitive etiologic agent associated with the disease process is still unknown. Previous studies have demonstrated that multiple bacterial species are associated with lesions, with spirochetes being the most reliably identified organism. This study details the deep sequencing-based metagenomic evaluation of 48 staged DD biopsy specimens collected during a 3-year longitudinal study of disease progression. Over 175 million sequences were evaluated by utilizing both shotgun and 16S metagenomic techniques. Based on the shotgun sequencing results, there was no evidence of a fungal or DNA viral etiology. The bacterial microbiota of biopsy specimens progresses through a systematic series of changes that correlate with the novel morphological lesion scoring system developed as part of this project. This scoring system was validated, as the microbiota of each stage was statistically significantly different from those of other stages (P < 0.001). The microbiota of control biopsy specimens were the most diverse and became less diverse as lesions developed. Although Treponema spp. predominated in the advanced lesions, they were in relatively low abundance in the newly described early lesions that are associated with the initiation of the disease process. The consortium of Treponema spp. identified at the onset of disease changes considerably as the lesions progress through the morphological stages identified. The results of this study support the hypothesis that DD is a polybacterial disease process and provide unique insights into the temporal changes in bacterial populations throughout lesion development. PMID:24866801

  7. Reconstructing the Dynamics of HIV Evolution within Hosts from Serial Deep Sequence Data

    PubMed Central

    Poon, Art F. Y.; Swenson, Luke C.; Bunnik, Evelien M.; Edo-Matas, Diana; Schuitemaker, Hanneke; van 't Wout, Angélique B.; Harrigan, P. Richard

    2012-01-01

    At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a ‘fitness valley’ separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant. PMID:23133358

  8. High resolution sequence stratigraphy of Miocene deep-water clastic outcrops, Taranaki coast, New Zealand

    SciTech Connect

    King, P.R.; Browne, G.H.; Slatt, R.M.

    1995-08-01

    Approximately 700 m of deep-water clastic deposits of the Mt. Messenger Formation are superbly exposed along the Taranaki coast of North Island, New Zealand. Biostratigraphy indicates the interval was deposited during the time span 10.5-9.2 m.y. in water depths grading upward from lower bathyal to middle-upper bathyal. This interval is considered part of a 3rd order depositional sequence deposited under conditions of fluctuating relative sea-level, concomitant with high sedimentation rates. Several 4th order depositional sequences, reflecting successive sea-level falls, are recognized within the interval. Sequence boundaries display a range of erosive morphologies from metre-wide canyons to scours several hundred metres across. All components of a generic lowstand systems tract--basin floor fan, channel-levee complex and prograding complex--are present in logical and temporal order. They are repetitive through the interval, with the relatively shallower-water components becoming more prevalent upward. Basin floor fan lithologies are mainly m-thick, massive and convolute-bedded sandstones that alternate with cm- and dm-thick massive, horizontally-stratified and ripple-laminated sandstones and bioturbated mudstones. Channel-levee deposits consist of interleaving packages of thin-bedded, climbing-rippled and parallel-laminated sandstones and millstones; infrequent channels are filled with sandstones and mudstones, and sometimes lined with conglomerate. Thin beds of parallel to convoluted mudstone comprise prograding complex deposits. Similar lowstand systems tracts can be recognized and correlated on subsurface seismic reflection profiles and wireline logs. Such correlation has been aided by a continuous outcrop gamma-ray log obtained over most of the measured interval. In the adjacent Taranaki peninsula, basin floor fan and channel-levee deposits comprise hydrocarbon reservoir intervals. Outcrop and subsurface reservoir sandstones exhibit similar permeabilities.

  9. Transcriptome-Wide Identification of Hfq-Associated RNAs in Brucella suis by Deep Sequencing

    PubMed Central

    Saadeh, Bashir; Caswell, Clayton C.; Berta, Philippe; Wattam, Alice Rebecca; Roop, R. Martin

    2015-01-01

    ABSTRACT Recent breakthroughs in next-generation sequencing technologies have led to the identification of small noncoding RNAs (sRNAs) as a new important class of regulatory molecules. In prokaryotes, sRNAs are often bound to the chaperone protein Hfq, which allows them to interact with their partner mRNA(s). We screened the genome of the zoonotic and human pathogen Brucella suis 1330 for the presence of this class of RNAs. We designed a coimmunoprecipitation strategy that relies on the use of Hfq as a bait to enrich the sample with sRNAs and eventually their target mRNAs. By deep sequencing analysis of the Hfq-bound transcripts, we identified a number of mRNAs and 33 sRNA candidates associated with Hfq. The expression of 10 sRNAs in the early stationary growth phase was experimentally confirmed by Northern blotting and/or reverse transcriptase PCR. IMPORTANCE Brucella organisms are facultative intracellular pathogens that use stealth strategies to avoid host defenses. Adaptation to the host environment requires tight control of gene expression. Recently, small noncoding RNAs (sRNAs) and the sRNA chaperone Hfq have been shown to play a role in the fine-tuning of gene expression. Here we have used RNA sequencing to identify RNAs associated with the B. suis Hfq protein. We have identified a novel list of 33 sRNAs and 62 Hfq-associated mRNAs for future studies aiming to understand the intracellular lifestyle of this pathogen. PMID:26553849

  10. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    PubMed

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea, subjected to whole-genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000. PMID:27437183

  11. Draft Genome Sequence of Alcanivorax sp. Strain KX64203 Isolated from Deep-Sea Sediments of Iheya North, Okinawa Trough

    PubMed Central

    Liu, Rui; Wang, Mengqiang; Wang, Hao; Gao, Qiang; Hou, Zhanhui; Gao, Dahai

    2016-01-01

    This report describes the draft genome sequence of Alcanivorax sp. strain KX64203, isolated from deep-sea sediment samples. The reads generated by an Ion Torrent PGM were assembled into contigs, with a total size of 4.76 Mb. The data will improve our understanding of the strain’s function in alkane degradation. PMID:27563046

  12. Draft Genome Sequence of Alcanivorax sp. Strain KX64203 Isolated from Deep-Sea Sediments of Iheya North, Okinawa Trough.

    PubMed

    Zhang, Huan; Liu, Rui; Wang, Mengqiang; Wang, Hao; Gao, Qiang; Hou, Zhanhui; Gao, Dahai; Wang, Lingling

    2016-01-01

    This report describes the draft genome sequence of Alcanivorax sp. strain KX64203, isolated from deep-sea sediment samples. The reads generated by an Ion Torrent PGM were assembled into contigs, with a total size of 4.76 Mb. The data will improve our understanding of the strain's function in alkane degradation. PMID:27563046

  13. Small RNA deep sequencing revealed that mixed infection of known and unknown viruses were common in field collected vegetable samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In an effort to characterize the causal agents for plant diseases in field collected samples using the small RNA deep sequencing technology, numerous known or novel viruses and viroids were identified. In many cases, a mixed infection with multiple pathogen species was common. Such situation compl...

  14. Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

    DOE PAGESBeta

    Rosen, Gail L.; Polikar, Robi; Caseiro, Diamantino A.; Essinger, Steven D.; Sokhansanj, Bahrad A.

    2011-01-01

    High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data include sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
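
    The idea of pairing a composition-based classifier with an "unknown" class can be illustrated with a minimal sketch. This is not the detector evaluated in the record above and not PhymmBL; it is a toy k-mer naive Bayes scorer with a rejection threshold, and every name in it (`train_profile`, `classify`, `min_avg_loglik`, the two toy "genomes") is a hypothetical choice made for illustration only.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return the list of overlapping k-mers in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_profile(genome, k):
    """Build a smoothed k-mer log-probability function for one reference taxon."""
    counts = Counter(kmers(genome, k))
    total = sum(counts.values())
    vocab = 4 ** k  # number of possible DNA k-mers, used for add-one smoothing

    def log_prob(kmer):
        return math.log((counts.get(kmer, 0) + 1) / (total + vocab))

    return log_prob


def classify(read, profiles, k, min_avg_loglik):
    """Assign read to the best-scoring taxon, or "unknown" if no taxon fits well.

    The score is the mean per-k-mer log-probability, so reads of different
    lengths can share one threshold (an assumption of this sketch).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return "unknown"  # read shorter than k: nothing to score
    best_label, best_score = "unknown", -math.inf
    for label, log_prob in profiles.items():
        score = sum(log_prob(km) for km in read_kmers) / len(read_kmers)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_avg_loglik else "unknown"


# Toy reference "genomes"; real profiles would come from sequenced genomes.
K = 3
PROFILES = {
    "taxon_A": train_profile("ACGT" * 50, K),
    "taxon_B": train_profile("AATT" * 50, K),
}

if __name__ == "__main__":
    for read in ("ACGTACGTACGT", "AATTAATTAATT", "GGGGGGGGGGGG"):
        print(read, classify(read, PROFILES, K, min_avg_loglik=-3.0))
```

    The threshold is the only thing separating "known" from "unknown" here; in practice it would have to be tuned on held-out taxa, which is the kind of evaluation the abstract describes.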

  15. High-Resolution Hepatitis C Virus Subtyping Using NS5B Deep Sequencing and Phylogeny, an Alternative to Current Methods

    PubMed Central

    Gregori, Josep; Rodríguez-Frias, Francisco; Buti, Maria; Madejon, Antonio; Perez-del-Pulgar, Sofia; Garcia-Cehic, Damir; Casillas, Rosario; Blasi, Maria; Homs, Maria; Tabernero, David; Alvarez-Tejado, Miguel; Muñoz, Jose Manuel; Cubero, Maria; Caballero, Andrea; delCampo, Jose Antonio; Domingo, Esteban; Belmonte, Irene; Nieto, Leonardo; Lens, Sabela; Muñoz-de-Rueda, Paloma; Sanz-Cameno, Paloma; Sauleda, Silvia; Bes, Marta; Gomez, Jordi; Briones, Carlos; Perales, Celia; Sheldon, Julie; Castells, Lluis; Viladomiu, Lluis; Salmeron, Javier; Ruiz-Extremera, Angela; Quiles-Pérez, Rosa; Moreno-Otero, Ricardo; López-Rodríguez, Rosario; Allende, Helena; Romero-Gómez, Manuel; Guardia, Jaume; Esteban, Rafael; Garcia-Samaniego, Javier; Forns, Xavier

    2014-01-01

    Hepatitis C virus (HCV) is classified into seven major genotypes and 67 subtypes. Recent studies have shown that in HCV genotype 1-infected patients, response rates to regimens containing direct-acting antivirals (DAAs) are subtype dependent. Currently available genotyping methods have limited subtyping accuracy. We have evaluated the performance of a deep-sequencing-based HCV subtyping assay, developed for the 454/GS-Junior platform, in comparison with those of two commercial assays (Versant HCV genotype 2.0 and Abbott Real-time HCV Genotype II) and using direct NS5B sequencing as a gold standard (direct sequencing), in 114 clinical specimens previously tested by first-generation hybridization assay (82 genotype 1 and 32 with uninterpretable results). Phylogenetic analysis of deep-sequencing reads matched subtype 1 calling by population Sanger sequencing (69% 1b, 31% 1a) in 81 specimens and identified a mixed-subtype infection (1b/3a/1a) in one sample. Similarly, among the 32 previously indeterminate specimens, identical genotype and subtype results were obtained by direct and deep sequencing in all but four samples with dual infection. In contrast, both Versant HCV Genotype 2.0 and Abbott Real-time HCV Genotype II failed subtype 1 calling in 13 (16%) samples each and were unable to identify the HCV genotype and/or subtype in more than half of the non-genotype 1 samples. We concluded that deep sequencing is more efficient for HCV subtyping than currently available methods and allows qualitative identification of mixed infections and may be more helpful with respect to informing treatment strategies with new DAA-containing regimens across all HCV subtypes. PMID:25378574

  16. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations

    PubMed Central

    Andrews, T. Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C.

    2016-01-01

    Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must be first computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specific to each read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence

  17. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations.

    PubMed

    Andrews, T Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C; Field, Matthew A

    2016-01-01

    Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must be first computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specific to each read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence

  18. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus.

    PubMed

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-05-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results--validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent)--will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system. PMID:20360389

  19. Improved Sequence Learning with Subthalamic Nucleus Deep Brain Stimulation: Evidence for Treatment-Specific Network Modulation

    PubMed Central

    Mure, Hideo; Tang, Chris C.; Argyelan, Miklos; Ghilardi, Maria-Felice; Kaplitt, Michael G.; Dhawan, Vijay; Eidelberg, David

    2015-01-01

    We used a network approach to study the effects of anti-parkinsonian treatment on motor sequence learning in humans. Eight Parkinson’s disease (PD) patients with bilateral subthalamic nucleus (STN) deep brain stimulation underwent H2 15O positron emission tomography (PET) imaging to measure regional cerebral blood flow (rCBF) while they performed kinematically matched sequence learning and movement tasks at baseline and during stimulation. Network analysis revealed a significant learning-related spatial covariance pattern characterized by consistent increases in subject expression during stimulation (p = 0.008, permutation test). The network was associated with increased activity in the lateral cerebellum, dorsal premotor cortex, and parahippocampal gyrus, with covarying reductions in the supplementary motor area (SMA) and orbitofrontal cortex. Stimulation-mediated increases in network activity correlated with concurrent improvement in learning performance (p < 0.02). To determine whether similar changes occurred during dopaminergic pharmacotherapy, we studied the subjects during an intravenous levodopa infusion titrated to achieve a motor response equivalent to stimulation. Despite consistent improvement in motor ratings during infusion, levodopa did not alter learning performance or network activity. Analysis of learning-related rCBF in network regions revealed improvement in baseline abnormalities with STN stimulation but not levodopa. These effects were most pronounced in the SMA. In this region, a consistent rCBF response to stimulation was observed across subjects and trials (p = 0.01), although the levodopa response was not significant. These findings link the cognitive treatment response in PD to changes in the activity of a specific cerebello-premotor cortical network. Selective modulation of overactive SMA–STN projection pathways may underlie the improvement in learning found with stimulation. PMID:22357863

  20. A High-Dimensional, Deep-Sequencing Study of Lung Adenocarcinoma in Female Never-Smokers

    PubMed Central

    Kim, Pora; Park, Jehwan; Seo, Jihae; Kim, Jiwoong; Park, Seongjin; Jang, Insu; Kim, Namshin; Yang, Jin Ok; Lee, Byungwook; Rho, Kyoohyoung; Jung, Yeonhwa; Keum, Juhee; Lee, Jinseon; Han, Jungho; Kang, Sangeun; Bae, Sujin; Choi, So-Jung; Kim, Sujin; Lee, Jong-Eun; Kim, Wankyu; Kim, Jhingook; Lee, Sanghyuk

    2013-01-01

    Background Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become popular, integrative approaches need to be developed to move from a simple cataloguing of mutations and changes in gene expression to dissecting the molecular nature of carcinogenesis at the systemic level and understanding the complex networks that lead to cancer development. Results Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues of six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq, and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may function as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles, which we integrated with genomic aberrations and gene regulations into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Also, results from the analysis strongly suggest that several novel microRNA-target interactions represent key regulatory elements of the gene network. Conclusions Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma. PMID:23405175

  1. Hybrid DNA virus in Chinese patients with seronegative hepatitis discovered by deep sequencing

    PubMed Central

    Xu, Baoyan; Zhi, Ning; Hu, Gangqing; Wan, Zhihong; Zheng, Xiaobin; Liu, Xiaohong; Wong, Susan; Kajigaya, Sachiko; Zhao, Keji; Mao, Qing; Young, Neal S.

    2013-01-01

    Seronegative hepatitis—non-A, non-B, non-C, non-D, non-E hepatitis—is poorly characterized but strongly associated with serious complications. We collected 92 sera specimens from patients with non-A–E hepatitis in Chongqing, China between 1999 and 2007. Ten sera pools were screened by Solexa deep sequencing. We discovered a 3,780-bp contig present in all 10 pools that yielded BLASTx E scores of 7e-05–0.008 against parvoviruses. The complete sequence of the in silico-assembled 3,780-bp contig was confirmed by gene amplification of overlapping regions over almost the entire genome, and the virus was provisionally designated NIH-CQV. Further analysis revealed that the contig was composed of two major ORFs. By protein BLAST, ORF1 and ORF2 were most homologous to the replication-associated protein of bat circovirus and the capsid protein of porcine parvovirus, respectively. Phylogenetic analysis indicated that NIH-CQV is located at the interface of Parvoviridae and Circoviridae. Prevalence of NIH-CQV in patients was determined by quantitative PCR. Sixty-three of 90 patient samples (70%) were positive, but all those from 45 healthy controls were negative. Average virus titer in the patient specimens was 1.05 e4 copies/µL. Specific antibodies against NIH-CQV were sought by immunoblotting. Eighty-four percent of patients were positive for IgG, and 31% were positive for IgM; in contrast, 78% of healthy controls were positive for IgG, but all were negative for IgM. Although more work is needed to determine the etiologic role of NIH-CQV in human disease, our data indicate that a parvovirus-like virus is highly prevalent in a cohort of patients with non-A–E hepatitis. PMID:23716702

  2. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-12-31

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depo-centers and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  3. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-01-01

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depo-centers and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  4. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments

    PubMed Central

    Ingolia, Nicholas T.; Brar, Gloria A.; Rouskin, Silvia; McGeachy, Anna M.; Weissman, Jonathan S.

    2012-01-01

    Recent studies highlight the importance of translational control in determining protein abundance, underscoring the value of measuring gene expression at the level of translation. We present a protocol for genome-wide, quantitative analysis of in vivo translation by deep sequencing. This ribosome profiling approach maps the exact positions of ribosomes on transcripts by nuclease footprinting. The nuclease-protected mRNA fragments are converted into a DNA library suitable for deep sequencing using a strategy that minimizes bias. The abundance of different footprint fragments in deep sequencing data reports on the amount of translation of a gene. Additionally, footprints reveal the exact regions of the transcriptome that are translated. To better define translated reading frames, we describe an adaptation that reveals the sites of translation initiation by pre-treating cells with harringtonine to immobilize initiating ribosomes. The protocol we describe requires 5–7 days to generate a completed ribosome profiling sequencing library. Sequencing and data analysis requires a further 4 – 5 days. PMID:22836135

  5. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads

    PubMed Central

    Kajitani, Rei; Toshimoto, Kouta; Noguchi, Hideki; Toyoda, Atsushi; Ogura, Yoshitoshi; Okuno, Miki; Yabana, Mitsuru; Harada, Masayuki; Nagayasu, Eiji; Maruyama, Haruhiko; Kohara, Yuji; Fujiyama, Asao; Hayashi, Tetsuya; Itoh, Takehiko

    2014-01-01

    Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes is a substantial challenge due to the increased complexity of the de Bruijn graph structure predominantly used. To address the increasing demand for sequencing of nonmodel and/or wild-type samples, in most cases inbred lines or fosmid-based hierarchical sequencing methods are used to overcome such problems. However, these methods are costly and time consuming, forfeiting the advantages of massive parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from the heterozygosity are simplified during not only the contig assembly step but also the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assembly results that have a larger scaffold NG50 length without any accompanying loss of accuracy in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity. PMID:24755901
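    The effect of heterozygosity on a de Bruijn graph, mentioned in the abstract above, can be illustrated with a toy example. The sketch below is not the Platanus algorithm; it is a minimal illustration, assuming two made-up haplotype strings that differ at one base and a fixed k of 4 (Platanus itself optimizes k automatically). The function names are hypothetical. A single-base difference makes the graph fork at one node and rejoin later, forming the kind of "bubble" an assembler has to simplify.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. where the graph forks."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two toy haplotypes (illustrative only) differing at a single position.
hap_a = "ATCGGACTTAGC"
hap_b = "ATCGGTCTTAGC"

graph = build_de_bruijn([hap_a, hap_b], k=4)
forks = branching_nodes(graph)
print("fork nodes:", forks)
print("successors of first fork:", sorted(graph[forks[0]]))
```

    In this toy graph the two branches leave a shared node and converge again a few nodes later; with real data, sequencing errors and repeats produce similar structures, which is one reason simplification is harder for heterozygous genomes.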

  6. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...

  7. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads.

    PubMed

    Kang, Houxiang; Zhu, Dan; Lin, Runmao; Opiyo, Stephen Obol; Jiang, Ning; Shiu, Shin-Han; Wang, Guo-Liang

    2016-06-01

    Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family experimentally, MoTE-1; all MoTE-1 family members are located in different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that there are more polymorphic DNA transposons than polymorphic retrotransposons in maize, even though retrotransposons occupy the largest fraction of genomic mass. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm. PMID:27098848

  8. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads

    PubMed Central

    Kang, Houxiang; Zhu, Dan; Lin, Runmao; Opiyo, Stephen Obol; Jiang, Ning; Shiu, Shin-Han; Wang, Guo-Liang

    2016-01-01

    Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family experimentally, MoTE-1; all MoTE-1 family members are located in different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that there are more polymorphic DNA transposons than polymorphic retrotransposons in maize, even though retrotransposons occupy the largest fraction of genomic mass. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm. PMID:27098848

  9. Deep sequencing reveals abundant noncanonical retroviral microRNAs in B-cell leukemia/lymphoma.

    PubMed

    Rosewick, Nicolas; Momont, Mélanie; Durkin, Keith; Takeda, Haruko; Caiment, Florian; Cleuter, Yvette; Vernin, Céline; Mortreux, Franck; Wattel, Eric; Burny, Arsène; Georges, Michel; Van den Broeke, Anne

    2013-02-01

    Viral tumor models have significantly contributed to our understanding of oncogenic mechanisms. How transforming delta-retroviruses induce malignancy, however, remains poorly understood, especially as viral mRNA/protein are tightly silenced in tumors. Here, using deep sequencing of broad windows of small RNA sizes in the bovine leukemia virus ovine model of leukemia/lymphoma, we provide in vivo evidence of the production of noncanonical RNA polymerase III (Pol III)-transcribed viral microRNAs in leukemic B cells in the complete absence of Pol II 5'-LTR-driven transcriptional activity. Processed from a cluster of five independent self-sufficient transcriptional units located in a proviral region dispensable for in vivo infectivity, bovine leukemia virus microRNAs represent ∼40% of all microRNAs in both experimental and natural malignancy. They are subject to strong purifying selection and associate with Argonautes, consistent with a critical function in silencing of important cellular and/or viral targets. Bovine leukemia virus microRNAs are strongly expressed in preleukemic and malignant cells in which structural and regulatory gene expression is repressed, suggesting a key role in tumor onset and progression. Understanding how Pol III-dependent microRNAs subvert cellular and viral pathways will contribute to deciphering the intricate perturbations that underlie malignant transformation. PMID:23345446

  10. Genomic region operation kit for flexible processing of deep sequencing data.

    PubMed

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/. PMID:23702556
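    The idea of treating genomic regions as sets can be sketched in a few lines of plain Python. This does not use the GROK API, whose function names are not given in the abstract; the helper names `merge` and `intersect` and the example coordinates are assumptions made for illustration. Regions are half-open (start, end) intervals on a single chromosome, and adjacent intervals are merged.

```python
def merge(regions):
    """Union: collapse overlapping or adjacent half-open intervals."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersection: positions covered by both region sets."""
    a, b = merge(a), merge(b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical binding regions for two transcription factors.
tf1_regions = [(100, 200), (150, 300), (500, 600)]
tf2_regions = [(250, 550)]

shared = intersect(tf1_regions, tf2_regions)
print(shared)
```

    A question such as "where do both factors bind?" then becomes a single set operation on two region collections, which is the kind of translation the formalism is meant to support.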

  11. Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata.

    PubMed

    Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

    2013-01-01

    Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem-loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping-pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration. PMID:23166307

  12. Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata

    PubMed Central

    Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

    2013-01-01

    Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem–loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping–pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration. PMID:23166307

  13. MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC)

    PubMed Central

    2015-01-01

    Background Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPU. An emerging alternative to the GPU is Intel MIC; the Tianhe-2 supercomputer, currently top of the TOP500, is built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to GPU counterparts as an MIC card contains only ~60 cores (while a GPU card typically has over a thousand cores). Results To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner MICA that is optimized in view of MIC's limitation and the extra parallelism inside each MIC core. By utilizing the 512-bit vector units in the MIC and implementing a new seeding strategy, experiments on aligning 150 bp paired-end reads show that MICA using one MIC card is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICA's simplicity allows very efficient scale-up when multiple MIC cards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM). Summary MICA can be readily used by MIC-enabled supercomputers for production purposes. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour using 400 nodes. MICA has impressive performance even though MIC is only in its initial stage of development. Availability and implementation MICA's source code is freely available at http://sourceforge.net/projects/mica-aligner under GPL v3. Supplementary information Supplementary information is available as "Additional File 1". Datasets are available at www.bio8.cs.hku.hk/dataset/mica. PMID:25952019
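    The scale-up figures quoted in the abstract above can be turned into a rough parallel-efficiency estimate. The short calculation below uses only the numbers stated there (4.9-fold for one card, 14.1-fold for three cards, 17.47 Tera-bases on 400 nodes in about an hour); the function name is hypothetical, and the per-node throughput assumes the run took exactly one hour, which the abstract states only approximately.

```python
def scaling_efficiency(speedup_one_card, speedup_n_cards, n_cards):
    """Fraction of ideal linear scaling achieved when going from 1 to n cards."""
    relative_gain = speedup_n_cards / speedup_one_card
    return relative_gain / n_cards


# Speedups over BWA-MEM as reported in the abstract.
efficiency = scaling_efficiency(4.9, 14.1, 3)

# 17.47 Tera-bases aligned on 400 nodes; assumes a wall time of one hour.
gigabases_per_node_hour = 17.47e3 / 400

print(f"3-card scaling efficiency: {efficiency:.1%}")
print(f"throughput per node: {gigabases_per_node_hour:.1f} Gbases/hour")
```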

  14. Patchiness of deep-sea benthic Foraminifera across the Southern Ocean: Insights from high-throughput DNA sequencing

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Esling, Philippe; Pawlowski, Jan

    2014-10-01

    Spatial patchiness is a natural feature that strongly influences the level of species richness we perceive in surface sediments sampled in the deep-sea. Recent environmental DNA (eDNA) surveys of benthic micro- and meiofauna confirmed this exceptional richness. However, it is unknown to which extent the results of these studies, based usually on few grams of sediment, are affected by spatial patchiness of deep-sea benthos. Here, we analyse the eDNA diversity of Foraminifera in 42 deep-sea sediment samples collected across different scales in the Southern Ocean. At three stations, we deployed at least twice the multicorer and from each multicorer cast, we subsampled 3 sediment replicates per core for 2 cores. Using high-throughput sequencing (HTS), we generated over 2.35 million high-quality sequences that we clustered into 451 operational taxonomic units (OTUs). The majority of OTUs were assigned to the monothalamous (single-chambered) taxa and environmental clades. On average, a one-gram sediment sample captures 57.9% of the overall OTU diversity found in a single core, while three replicates cover at most 61.9% of the diversity found in a station. The OTUs found in all the replicates of each core gather up to 87.9% of the total sequenced reads, but only represent from 12.2% to 30% of the OTUs found in one core. These OTUs represent the most abundant species, among which dominate environmental lineages. The majority of the OTUs are represented by few sequences comprising several well-known deep-sea morphospecies or remaining unassigned. It is crucial to study wider arrays of sample and PCR replicates as well as RNA together with DNA in order to overcome biases stemming from deep-sea patchiness and molecular methods.

  15. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum

    PubMed Central

    Isakov, Ofer; Bordería, Antonio V.; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-01-01

    Motivation: The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Results: Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Availability and implementation: Freely available on the web at http://www.vivanbioinfo.org Contact: nshomron@post.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25701575

  16. De Novo Sequencing and Transcriptome Analysis of the Central Nervous System of Mollusc Lymnaea stagnalis by Deep RNA Sequencing

    PubMed Central

    Sadamoto, Hisayo; Takahashi, Hironobu; Okada, Taketo; Kenmoku, Hiromichi; Toyota, Masao; Asakawa, Yoshinori

    2012-01-01

    The pond snail Lymnaea stagnalis is among several mollusc species that have been well investigated due to the simplicity of their nervous systems and large identifiable neurons. Nonetheless, despite the continued attention given to the physiological characteristics of its nervous system, the genetic information of the Lymnaea central nervous system (CNS) has not yet been fully explored. The absence of genetic information is a large disadvantage for transcriptome sequencing because it makes transcriptome assembly difficult. We here performed transcriptome sequencing for Lymnaea CNS using an Illumina Genome Analyzer IIx platform and obtained 81.9 M of 100 base pair (bp) single-end reads. For de novo assembly, five programs were used: ABySS, Velvet, OASES, Trinity and Rnnotator. Based on a comparison of the assemblies, we chose the Rnnotator dataset for the following blast searches and gene ontology analyses. The present dataset, 116,355 contigs of Lymnaea transcriptome shotgun assembly (TSA), contained longer sequences and was much larger compared to the previously reported Lymnaea expressed sequence tag (EST) data established by classical Sanger sequencing. The TSA sequences were subjected to blast analyses against several protein databases and Aplysia EST data. The results demonstrated that about 20,000 sequences had significant similarity to the reported sequences using a cutoff value of 1e-6, and showed the lack of molluscan sequences in the public databases. The richness of the present TSA data allowed us to identify a large number of new transcripts in Lymnaea and molluscan species. PMID:22870333

  17. QuasR: quantification and annotation of short reads in R

    PubMed Central

    Gaidatzis, Dimos; Lerch, Anita; Hahne, Florian; Stadler, Michael B.

    2015-01-01

    Summary: QuasR is a package for the integrated analysis of high-throughput sequencing data in R, covering all steps from read preprocessing, alignment and quality control to quantification. QuasR supports different experiment types (including RNA-seq, ChIP-seq and Bis-seq) and analysis variants (e.g. paired-end, stranded, spliced and allele-specific), and is integrated in Bioconductor so that its output can be directly processed for statistical analysis and visualization. Availability and implementation: QuasR is implemented in R and C/C++. Source code and binaries for major platforms (Linux, OS X and MS Windows) are available from Bioconductor (www.bioconductor.org/packages/release/bioc/html/QuasR.html). The package includes a ‘vignette’ with step-by-step examples for typical work flows. Contact: michael.stadler@fmi.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25417205

  18. Acyclic Identification of Aptamers for Human alpha-Thrombin Using Over-Represented Libraries and Deep Sequencing

    PubMed Central

    Kupakuwana, Gillian V.; Crill, James E.; McPike, Mark P.; Borer, Philip N.

    2011-01-01

    Background Aptamers are oligonucleotides that bind proteins and other targets with high affinity and selectivity. Twenty years ago elements of natural selection were adapted to in vitro selection in order to distinguish aptamers among randomized sequence libraries. The primary bottleneck in traditional aptamer discovery is multiple cycles of in vitro evolution. Methodology/Principal Findings We show that over-representation of sequences in aptamer libraries and deep sequencing enables acyclic identification of aptamers. We demonstrated this by isolating a known family of aptamers for human α-thrombin. Aptamers were found within a library containing an average of 56,000 copies of each possible randomized 15mer segment. The high affinity sequences were counted many times above the background in 2–6 million reads. Clustering analysis of sequences with more than 10 counts distinguished two sequence motifs with candidates at high abundance. Motif I contained the previously observed consensus 15mer, Thb1 (46,000 counts), and related variants with mostly G/T substitutions; secondary analysis showed that affinity for thrombin correlated with abundance (Kd = 12 nM for Thb1). The signal-to-noise ratio for this experiment was roughly 10,000∶1 for Thb1. Motif II was unrelated to Thb1 with the leading candidate (29,000 counts) being a novel aptamer against hexose sugars in the storage and elution buffers for Concanavalin A (Kd = 0.5 µM for α-methyl-mannoside); ConA was used to immobilize α-thrombin. Conclusions/Significance Over-representation together with deep sequencing can dramatically shorten the discovery process, distinguish aptamers having a wide range of affinity for the target, allow an exhaustive search of the sequence space within a simplified library, reduce the quantity of the target required, eliminate cycling artifacts, and should allow multiplexing of sequencing experiments and targets. PMID:21625587
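    The counting step described in the abstract above (tally identical sequences across reads, then keep those seen more than 10 times) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, the reads are made-up 4-mers rather than real 15mer data, and the clustering into motifs is not reproduced. The size of the 15mer sequence space follows directly from four bases at 15 positions.

```python
from collections import Counter


def abundant_sequences(reads, min_count):
    """Return (sequence, count) pairs seen more than min_count times, most frequent first."""
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n > min_count]


# Number of distinct randomized 15mers: 4 choices at each of 15 positions.
possible_15mers = 4 ** 15

# Toy reads (illustrative only); real input would be millions of 15mer segments.
toy_reads = ["AAAC"] * 12 + ["GGTT"] * 11 + ["CCCC"] * 10 + ["TTTT"]

candidates = abundant_sequences(toy_reads, min_count=10)
print(possible_15mers)
print(candidates)
```

    Sequences at or below the threshold are treated as background; in the study, the retained sequences were then clustered into motifs.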

  19. Microbial Dark Matter: Unusual intervening sequences in 16S rRNA genes of candidate phyla from the deep subsurface

    SciTech Connect

    Jarett, Jessica; Stepanauskas, Ramunas; Kieft, Thomas; Onstott, Tullis; Woyke, Tanja

    2014-03-17

    The Microbial Dark Matter project has sequenced genomes from over 200 single cells from candidate phyla, greatly expanding our knowledge of the ecology, inferred metabolism, and evolution of these widely distributed, yet poorly understood lineages. The second phase of this project aims to sequence an additional 800 single cells from known as well as potentially novel candidate phyla derived from a variety of environments. In order to identify whole genome amplified single cells, screening based on phylogenetic placement of 16S rRNA gene sequences is being conducted. Briefly, derived 16S rRNA gene sequences are aligned to a custom version of the Greengenes reference database and added to a reference tree in ARB using parsimony. In multiple samples from deep subsurface habitats but not from other habitats, a large number of sequences proved difficult to align and therefore to place in the tree. Based on comparisons to reference sequences and structural alignments using SSU-ALIGN, many of these "difficult" sequences appear to originate from candidate phyla, and contain intervening sequences (IVSs) within the 16S rRNA genes. These IVSs are short (39 - 79 nt) and do not appear to be self-splicing or to contain open reading frames. IVSs were found in the loop regions of stem-loop structures in several different taxonomic groups. Phylogenetic placement of sequences is strongly affected by IVSs; two out of three groups investigated were classified as different phyla after their removal. Based on data from samples screened in this project, IVSs appear to be more common in microbes occurring in deep subsurface habitats, although the reasons for this remain elusive.

  20. Deep sequencing reveals the complete genome and evidence for transcriptional activity of the first virus-like sequences identified in Aristotelia chilensis (Maqui Berry).

    PubMed

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F; Alzate, Juan F; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-04-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%-73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  1. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  2. Deep sequencing reveals a novel closterovirus associated with wild rose leaf rosette disease.

    PubMed

    He, Yan; Yang, Zuokun; Hong, Ni; Wang, Guoping; Ning, Guogui; Xu, Wenxing

    2015-06-01

    A bizarre virus-like symptom of a leaf rosette formed by dense small leaves on branches of wild roses (Rosa multiflora Thunb.), designated as 'wild rose leaf rosette disease' (WRLRD), was observed in China. To investigate the presumed causal virus, a wild rose sample affected by WRLRD was subjected to deep sequencing of small interfering RNAs (siRNAs) for a complete survey of the infecting viruses and viroids. The assembly of siRNAs led to the reconstruction of the complete genomes of three known viruses, namely Apple stem grooving virus (ASGV), Blackberry chlorotic ringspot virus (BCRV) and Prunus necrotic ringspot virus (PNRSV), and of a novel virus provisionally named 'rose leaf rosette-associated virus' (RLRaV). Phylogenetic analysis clearly placed RLRaV alongside members of the genus Closterovirus, family Closteroviridae. Genome organization of RLRaV RNA (17,653 nucleotides) showed 13 open reading frames (ORFs), except ORF1 and the quintuple gene block, most of which showed no significant similarities with known viral proteins, but, instead, had detectable identities to fungal or bacterial proteins. Additional novel molecular features indicated that RLRaV seems to be the most complex virus among the known genus members. To our knowledge, this is the first report of WRLRD and its associated closterovirus, as well as two ilarviruses and one capilovirus, infecting wild roses. Our findings present novel information about the closterovirus and the aetiology of this rose disease which should facilitate its control. More importantly, the novel features of RLRaV help to clarify the molecular and evolutionary features of the closterovirus. PMID:25187347

  3. Unique gene program of rat small resistance mesenteric arteries as revealed by deep RNA sequencing

    PubMed Central

    Reho, John J; Shetty, Amol; Dippold, Rachael P; Mahurkar, Anup; Fisher, Steven A

    2015-01-01

    Deep sequencing of RNA samples from rat small mesenteric arteries (MA) and aorta (AO) identified common and unique features of their gene programs. ∼5% of mRNAs were quantitatively differentially expressed in MA versus AO. Unique transcriptional control in MA smooth muscle is suggested by the selective or enriched expression of transcription factors Nkx2-3, HAND2, and Tcf21 (Capsulin). Enrichment in AO of PPAR transcription factors and their target genes of mitochondrial function, lipid metabolism, and oxidative phosphorylation is consistent with slow (oxidative) tonic smooth muscle. In contrast MA was enriched in contractile and calcium channel mRNAs suggestive of components of fast (glycolytic) phasic smooth muscle. Myosin phosphatase regulatory subunit paralogs Mypt1 and p85 were expressed at similar levels, while smooth muscle MLCK was the only such kinase expressed, suggesting functional redundancy of the former but not the latter in accordance with mouse knockout studies. With regard to vaso-regulatory signals, purinergic receptors P2rx1 and P2rx5 were reciprocally expressed in MA versus AO, while the olfactory receptor Olr59 was enriched in MA. Alox15, which generates the EDHF HPETE, was enriched in MA while eNOS was equally expressed, consistent with the greater role of EDHF in the smaller arteries. mRNAs that were not expressed at a level consistent with impugned function include skeletal myogenic factors, IKK2, nonmuscle myosin, and Gnb3. This screening analysis of gene expression in the small mesenteric resistance arteries suggests testable hypotheses regarding unique aspects of small artery function in the regional control of blood flow. PMID:26156969

  4. Sequence stratigraphy and sedimentology of a shelf-margin lowstand wedge in the deep Wilcox flexure trend of south Texas

    SciTech Connect

    Snedden, J.W.; Cooke, J.C.; Johnson, R.K.; Conrad, K.T.

    1991-03-01

    An integrated sedimentologic and biostratigraphic study of 15 wells and over 1400 ft (430 m) of core facilitated establishment of a sequence stratigraphic framework for the deep Wilcox Group of south Texas. This analysis also revealed the presence of a dip-restricted, sand-prone sediment wedge that produces hydrocarbons in growth-fault structures. A sequence stratigraphic framework for the Wilcox was constructed via the use of faunal-increase markers, thin intervals present in well cuttings characterized by rises in the relative abundance of planktonic foraminifera. These marine flooding horizons can be utilized to subdivide the Wilcox Group into four depositional sequences termed P(aleogene)-8, P-7, P-4, and P-3, in descending order. Identification of standard sequence-bounding unconformities is hampered by the poor seismic expression of the Wilcox and the structural complexity of the area.

  5. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  6. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands.

  7. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646
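
    As an illustration of what "diversity and richness" estimates from tag-sequence counts can look like, here is a minimal Python sketch. The abstract does not say which estimators were used; the Shannon index and the bias-corrected Chao1 estimator shown here are common choices and are assumptions of this example, as are the function names and the toy counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of taxa observed exactly once and twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy tag counts per taxon (hypothetical, not data from the study).
toy_counts = [10, 5, 1, 1, 2]
richness = chao1(toy_counts)
diversity = shannon_index(toy_counts)
```

    Both functions take a list of per-taxon tag counts, so comparing two sediment depths amounts to calling them once per sample.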

  8. The mitochondrial genome sequence of a deep-sea, hydrothermal vent limpet, Lepetodrilus nux, presents a novel vetigastropod gene arrangement.

    PubMed

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Nakamura, Masako; Watanabe, Hiromi; Satoh, Noriyuki; Mitarai, Satoshi

    2016-08-01

    While mitochondrial (mt) genomes are used extensively for comparative and evolutionary genomics, few mt genomes of deep-sea species, including hydrothermal vent species, have been determined. The genus Lepetodrilus is a major deep-sea gastropod taxon that occurs in various deep-sea ecosystems. Using next-generation sequencing, we determined nearly the complete mitochondrial genome sequence of Lepetodrilus nux, which inhabits hydrothermal vents in the Okinawa Trough. The total length of the mitochondrial genome is 16,353 bp, excluding the repeat region. It contains 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and a control region, typical of most metazoan genomes. Compared with other vetigastropod mt genome sequences, L. nux employs a novel mt gene arrangement. Other novel arrangements have been identified in the vetigastropod, Fissurella volcano, and in Chrysomallon squamiferum, a neomphaline gastropod; however, all three gene arrangements are different, and Bayesian inference suggests that each lineage diverged independently. Our findings suggest that vetigastropod mt gene arrangements are more diverse than previously realized. PMID:27102631

  9. Deep Sequencing for the Detection of Virus-Like Sequences in the Brains of Patients with Multiple Sclerosis: Detection of GBV-C in Human Brain

    PubMed Central

    Kriesel, John D.; Hobbs, Maurine R.; Jones, Brandt B.; Milash, Brett; Nagra, Rashed M.; Fischer, Kael F.

    2012-01-01

    Multiple sclerosis (MS) is a demyelinating disease of unknown origin that affects the central nervous system of an estimated 400,000 Americans. GBV-C or hepatitis G is a flavivirus that is found in the serum of 1–2% of blood donors. It was originally associated with hepatitis, but is now believed to be a relatively non-pathogenic lymphotropic virus. Fifty frozen specimens from the brains of deceased persons affected by MS were obtained along with 15 normal control brain specimens. RNA was extracted and ribosomal RNAs were depleted before sequencing on the Illumina GAII. These 36 bp reads were compared with a non-redundant database derived from the 600,000+ viral sequences in GenBank organized into 4080 taxa. An individual read successfully aligned to the viral database was considered to be a “hit”. Normalized MS specimen hit rates for each viral taxon were compared to the distribution of hits in the normal controls. Seventeen MS and 11 control brain extracts were sequenced, yielding 4–10 million sequences (“reads”) each. Over-representation of sequence from at least one of 12 viral taxa was observed in 7 of the 17 MS samples. Sequences resembling other viruses previously implicated in the pathogenesis of MS were not significantly enriched in any of the diseased brain specimens. Sequences from GB virus C (GBV-C), a flavivirus not previously isolated from brain, were enriched in one of the MS samples. GBV-C in this brain specimen was confirmed by specific amplification in this single MS brain specimen, but not in the 30 other MS brain samples available. The entire 9.4 kb sequence of this GBV-C isolate is reported here. This study shows the feasibility of deep sequencing for the detection of occult viral infections in the brains of deceased persons with MS. The first isolation of GBV-C from human brain is reported here. PMID:22412845

  10. Correlating Geochemical and Deep Sequence Data: An Example from the Uzon Caldera, Kamchatka

    NASA Astrophysics Data System (ADS)

    Crowe, D. E.; Wagner, I. D.; Mou, X.; Ye, W.; Sun, S.; Romanek, C. S.; Moran, M. A.

    2008-12-01

    Microbial community structure is complex and relatively unknown in high temperature extreme environments. The relationship between community structure and the variable physicochemical environment that hosts the community is similarly not well understood. One of the most significant roadblocks to elucidating these relationships is the difficulty of determining which microorganisms are present in a given environment, and in what percentages. We carried out deep sequencing of 16S rRNA genes using 454 pyrosequencing methods from a series of terrestrial hot springs in the Uzon Caldera, Kamchatka, Far East Russia. Using Primer v5 software, we correlated community structure and membership to variable geochemical parameters within the springs, and determined which set of parameters is most predictive in terms of community structure. Six hot springs within the caldera were selected for study. For each spring, temperature, pH, oxygen and hydrogen isotope ratios, and a suite of elements were measured. Sediment samples were collected and bulk DNA was extracted. A set of 12 primers with broad coverage of the Bacteria and Archaea V6 region of the 16S rRNA gene was used. The 311,981 reads obtained were clustered at an identity threshold of 99%. Rarefaction analysis revealed that although between 3350 and 6700 OTUs (Bacteria plus Archaea) were identified in each pool, saturation was not attained. PCA and MDS analyses were used to evaluate relationships within the geochemical data. Both data sets were analyzed using the BIOENV subprogram of Primer v5 to evaluate which set of physicochemical parameters best explained the community structure in all pools. The results of the BIOENV analysis revealed that 94.6% of the variance in membership between pools is explained by a set of highly correlated parameters consisting of As, Cl, Li, Ca, K, and Na concentrations. 
Where As and salinity are high, Bacterial communities were dominated by Thermaceae, Pseudomonadaceae, and Nitrospiraceae, and

  11. A Systematic Assessment of Accuracy in Detecting Somatic Mosaic Variants by Deep Amplicon Sequencing: Application to NF2 Gene

    PubMed Central

    Sestini, Roberta; Candita, Luisa; Capone, Gabriele Lorenzo; Barbetti, Lorenzo; Falconi, Serena; Frusconi, Sabrina; Giotti, Irene; Giuliani, Costanza; Torricelli, Francesca; Benelli, Matteo; Papi, Laura

    2015-01-01

    The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where a matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF) followed by a tailored statistical analysis. This method allowed us to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in the NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design. PMID:26066488
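
    The idea of deriving a minimum VAF from background noise can be sketched in Python as follows. The abstract does not describe its "tailored statistical analysis", so the mean-plus-k-standard-deviations rule used here is a simple stand-in and an assumption of this example; the function names, the value of `k`, and the toy numbers are likewise hypothetical.

```python
from statistics import mean, stdev


def vaf(alt_reads, total_reads):
    """Variant allele fraction: reads supporting the variant / all reads at the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def noise_threshold(background_vafs, k=3.0):
    """Minimum VAF to call a variant, as mean + k * sample standard deviation
    of VAFs measured at positions known to be non-variant (assumed rule)."""
    return mean(background_vafs) + k * stdev(background_vafs)


# Toy background VAFs from known wild-type positions (hypothetical values).
background = [0.001, 0.002, 0.003]
threshold = noise_threshold(background)

# A candidate with 40 variant reads out of 2000 clears the toy threshold.
is_candidate = vaf(40, 2000) > threshold
```

    In practice the background distribution would come from the samples of known genotype mentioned in the abstract, and a different statistical model may well be more appropriate.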

  12. Deep sequencing of immune repertoires during bovine development and in response to respiratory pathogen challenge

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Single-molecule circular consensus sequencing permits the sequencing of expressed antibody repertoires at previously unattainable depths of coverage and accuracy. We examined...

  13. Deep sequencing of the hepatitis B virus in hepatocellular carcinoma patients reveals enriched integration events, structural alterations and sequence variations.

    PubMed

    Toh, Soo Ting; Jin, Yu; Liu, Lizhen; Wang, Jingbo; Babrzadeh, Farbod; Gharizadeh, Baback; Ronaghi, Mostafa; Toh, Han Chong; Chow, Pierce Kah-Hoe; Chung, Alexander Y-F; Ooi, London L-P-J; Lee, Caroline G-L

    2013-04-01

    Chronic hepatitis B virus (HBV) infection is epidemiologically associated with hepatocellular carcinoma (HCC), but its role in HCC remains poorly understood due to technological limitations. In this study, we systematically characterize HBV in HCC patients. HBV sequences were enriched from 48 HCC patients using an oligo-bead-based strategy, pooled together and sequenced using the FLX-Genome-Sequencer. In the tumors, preferential integration of HBV into promoters of genes (P < 0.001) and significant enrichment of integration into chromosome 10 (P < 0.01) were observed. Integration into chromosome 10 was significantly associated with poorly differentiated tumors (P < 0.05). Notably, in the tumors, recurrent integration into the promoter of the human telomerase reverse transcriptase (TERT) gene was found to correlate with increased TERT expression. The preferred region within the HBV genome involved in integration and viral structural alteration is at the 3'-end of hepatitis B virus X protein (HBx), where viral replication/transcription initiates. Upon integration, the 3'-end of the HBx is often deleted. HBx-human chimeric transcripts, the most common type of chimeric transcripts, can be expressed as chimeric proteins. Sequence variations resulting in non-conservative amino acid substitutions are commonly observed in the HBV genome. This study highlights HBV as highly mutable in HCC patients with preferential regions within the host and virus genome for HBV integration/structural alterations. PMID:23276797

  14. Identification of deep intronic variants in 15 haemophilia A patients by next generation sequencing of the whole factor VIII gene.

    PubMed

    Bach, J Elisa; Wolf, Beat; Oldenburg, Johannes; Müller, Clemens R; Rost, Simone

    2015-10-01

    Current screening methods for factor VIII gene (F8) mutations can reveal the causative alteration in the vast majority of haemophilia A patients. Yet, standard diagnostic methods fail in about 2% of cases. This study aimed at analysing the entire intronic sequences of the F8 gene in 15 haemophilia A patients by next generation sequencing. All patients had a mild to moderate phenotype and no mutation in the coding sequence and splice sites of the F8 gene could be diagnosed so far. Next generation sequencing data revealed 23 deep intronic candidate variants in several F8 introns, including six recurrent variants and three variants that have been described before. One patient additionally showed a deletion of 9.2 kb in intron 1, mediated by Alu-type repeats. Several bioinformatic tools were used to score the variants in comparison to known pathogenic F8 mutations in order to predict their deleteriousness. Pedigree analyses showed a correct segregation pattern for three of the presumptive mutations. In each of the 15 patients analysed, at least one deep intronic variant in the F8 gene was identified and predicted to alter F8 mRNA splicing. Reduced F8 mRNA levels and/or stability would be well compatible with the patients' mild to moderate haemophilia A phenotypes. The next generation sequencing approach used proved an efficient method to screen the complete F8 gene and could be applied as a one-stop sequencing method for molecular diagnostics of haemophilia A. PMID:25948085

  15. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  16. HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST.

    PubMed

    Yin, Li; Yao, Jin; Chang, Kaifen; Gardner, Brent P; Yu, Fahong; Giuliano, Anna R; Goodenow, Maureen M

    2016-02-01

    Multiple-type human papillomaviruses (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexing deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems. PMID:26821041

  17. HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST

    PubMed Central

    Yin, Li; Yao, Jin; Chang, Kaifen; Gardner, Brent P.; Yu, Fahong; Giuliano, Anna R.; Goodenow, Maureen M.

    2016-01-01

    Multiple-type human papillomaviruses (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexing deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems. PMID:26821041

  18. Quantitative Deep Sequencing Reveals Dynamic HIV-1 Escape and Large Population Shifts during CCR5 Antagonist Therapy In Vivo

    PubMed Central

    Tsibris, Athe M. N.; Russ, Carsten; Lo, Chien-Chi; Leitner, Thomas; Gaschen, Brian; Theiler, James; Paredes, Roger; Su, Zhaohui; Hughes, Michael D.; Gulick, Roy M.; Greaves, Wayne; Coakley, Eoin; Flexner, Charles; Nusbaum, Chad; Kuritzkes, Daniel R.

    2009-01-01

    High-throughput sequencing platforms provide an approach for detecting rare HIV-1 variants and documenting more fully quasispecies diversity. We applied this technology to the V3 loop-coding region of env in samples collected from 4 chronically HIV-infected subjects in whom CCR5 antagonist (vicriviroc [VVC]) therapy failed. Between 25,000–140,000 amplified sequences were obtained per sample. Profound baseline V3 loop sequence heterogeneity existed; predicted CXCR4-using populations were identified in a largely CCR5-using population. The V3 loop forms associated with subsequent virologic failure, either through CXCR4 use or the emergence of high-level VVC resistance, were present as minor variants at 0.8–2.8% of baseline samples. Extreme, rapid shifts in population frequencies toward these forms occurred, and deep sequencing provided a detailed view of the rapid evolutionary impact of VVC selection. Greater V3 diversity was observed post-selection. This previously unreported degree of V3 loop sequence diversity has implications for viral pathogenesis, vaccine design, and the optimal use of HIV-1 CCR5 antagonists. PMID:19479085

  19. Reads meet rotamers: structural biology in the age of deep sequencing.

    PubMed

    Sethi, Anurag; Clarke, Declan; Chen, Jieming; Kumar, Sushant; Galeev, Timur R; Regan, Lynne; Gerstein, Mark

    2015-12-01

    Structure has traditionally been interrelated with sequence, usually in the framework of comparing sequences across species sharing a common fold. However, the nature of information within the sequence and structure databases is evolving, changing the type of comparisons possible. In particular, we now have a vast amount of personal genome sequences from human populations and a greater fraction of new structures contain interacting proteins within large complexes. Consequently, we have to recast our conception of sequence conservation and its relation to structure, for example by focusing more on selection within the human population. Moreover, within structural biology there is less emphasis on the discovery of novel folds and more on relating structures to networks of protein interactions. We cover this changing mindset here. PMID:26658741

  20. Ultra-deep sequencing leads to earlier and more sensitive detection of the tyrosine kinase inhibitor resistance mutation T315I in chronic myeloid leukemia.

    PubMed

    Baer, Constance; Kern, Wolfgang; Koch, Sarah; Nadarajah, Niroshan; Schindela, Sonja; Meggendorfer, Manja; Haferlach, Claudia; Haferlach, Torsten

    2016-07-01

    Chronic myeloid leukemia cells acquire resistance to tyrosine kinase inhibitors through mutations in the ABL1 kinase domain. The T315I mutation mediates resistance to imatinib, dasatinib, nilotinib and bosutinib, whereas sensitivity to ponatinib remains. Mutation detection by conventional Sanger sequencing requires 10%-20% expansion of the mutated subclone. We studied the T315I mutation development by ultra-deep sequencing on the 454 XL+ platform (Roche) in comparison to Sanger sequencing. By ultra-deep sequencing, mutations were detected at loads of 1%-2%. We selected 40 patients who had failed first-line to third-line treatment (imatinib, dasatinib, nilotinib) and had high loads of the T315I mutation detected by Sanger sequencing. We confirmed T315I mutations by ultra-deep sequencing and investigated the mutation dynamics by backtracking earlier samples. In 20 of 40 patients, we identified the T315I three months (median) before Sanger sequencing detection limits were reached. To exclude sporadic low percentage mutation development without subsequent mutation outgrowth, we selected 42 patients without resistance mutations detected by Sanger sequencing but loss of major molecular response. Here, no mutation was detected by ultradeep sequencing. Additional non-T315I resistance mutations were found in 20 of 40 patients. Only 15% had two mutations per cell; the other cases showed multiple independently mutated clones and the T315I clone demonstrated a rapid outgrowth. In conclusion, T315I mutations could be detected earlier by ultra-deep sequencing compared to Sanger sequencing in a selected group of cases. Earlier mutation detection by ultra-deep sequencing might allow treatment to be changed before clonal increase of cells with the T315I mutation. PMID:27102501

  1. Ultra-deep sequencing leads to earlier and more sensitive detection of the tyrosine kinase inhibitor resistance mutation T315I in chronic myeloid leukemia

    PubMed Central

    Baer, Constance; Kern, Wolfgang; Koch, Sarah; Nadarajah, Niroshan; Schindela, Sonja; Meggendorfer, Manja; Haferlach, Claudia; Haferlach, Torsten

    2016-01-01

    Chronic myeloid leukemia cells acquire resistance to tyrosine kinase inhibitors through mutations in the ABL1 kinase domain. The T315I mutation mediates resistance to imatinib, dasatinib, nilotinib and bosutinib, whereas sensitivity to ponatinib remains. Mutation detection by conventional Sanger sequencing requires 10%–20% expansion of the mutated subclone. We studied the T315I mutation development by ultra-deep sequencing on the 454 XL+ platform (Roche) in comparison to Sanger sequencing. By ultra-deep sequencing, mutations were detected at loads of 1%–2%. We selected 40 patients who had failed first-line to third-line treatment (imatinib, dasatinib, nilotinib) and had high loads of the T315I mutation detected by Sanger sequencing. We confirmed T315I mutations by ultra-deep sequencing and investigated the mutation dynamics by backtracking earlier samples. In 20 of 40 patients, we identified the T315I three months (median) before Sanger sequencing detection limits were reached. To exclude sporadic low percentage mutation development without subsequent mutation outgrowth, we selected 42 patients without resistance mutations detected by Sanger sequencing but loss of major molecular response. Here, no mutation was detected by ultradeep sequencing. Additional non-T315I resistance mutations were found in 20 of 40 patients. Only 15% had two mutations per cell; the other cases showed multiple independently mutated clones and the T315I clone demonstrated a rapid outgrowth. In conclusion, T315I mutations could be detected earlier by ultra-deep sequencing compared to Sanger sequencing in a selected group of cases. Earlier mutation detection by ultra-deep sequencing might allow treatment to be changed before clonal increase of cells with the T315I mutation. PMID:27102501

  2. Simultaneous Identification of DNA and RNA Viruses Present in Pig Faeces Using Process-Controlled Deep Sequencing

    PubMed Central

    Sachsenröder, Jana; Twardziok, Sven; Hammerl, Jens A.; Janczyk, Pawel; Wrede, Paul; Hertwig, Stefan; Johne, Reimar

    2012-01-01

    Background Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. Results The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. Conclusion The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. 
The implemented process control serves as quality control, ensures comparability of the

  3. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    SciTech Connect

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.; Lu, V.; Podila, G. K.; Collart, F. R.; Biosciences Division; Univ. of Alabama

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. 
The method described here can be applied to improve gene structural annotation in other species, provided that there

  4. Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

    PubMed Central

    Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

    2016-01-01

    ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral

  5. Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico

    PubMed Central

    Min, Feng; Wang, Sumei; Zhang, Li

    2015-01-01

    Next-generation sequencing techniques have been rapidly emerging. However, the massive sequencing reads hide a great deal of unknown important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms represents a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS. PMID:26421304

  6. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71.

    PubMed

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier

    2015-04-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  7. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

    PubMed Central

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L.; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C.; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier

    2015-01-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  8. Insights into deep-sea sediment fungal communities from the East Indian Ocean using targeted environmental sequencing combined with traditional cultivation.

    PubMed

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%-97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  9. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  10. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    PubMed Central

    Ding, Qian; Li, Jingjuan; Wang, Fengde; Zhang, Yihui; Li, Huayin; Zhang, Jiannong; Gao, Jianwei

    2015-01-01

    Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage. PMID:26504770
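    The SSR screening step described in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `find_ssrs`, the motif sizes of 1 to 6 bp, and the reading of "over 12 bp" as a total repeat length of at least 12 bp are all assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_len=12, motif_sizes=(1, 2, 3, 4, 5, 6)):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper: min_len and motif_sizes are illustrative
    defaults, not values confirmed by the source record.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # Fewest repeats of a k-mer that reach min_len (ceiling division),
        # and never fewer than two copies.
        min_repeats = max(2, -(-min_len // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AGAG" is already reported as the dinucleotide "AG".
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "AG" * 7 + "TTC" + "A" * 12 + "CGT"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

    The sketch reports only perfect repeats; dedicated SSR tools typically also handle compound and imperfect repeats, which this example deliberately leaves out.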

  11. Mosaic KCNJ2 mutation in Andersen-Tawil syndrome: targeted deep sequencing is useful for the detection of mosaicism.

    PubMed

    Hasegawa, K; Ohno, S; Kimura, H; Itoh, H; Makiyama, T; Yoshida, Y; Horie, M

    2015-03-01

    Andersen-Tawil syndrome (ATS) is an inherited disease characterized by ventricular arrhythmias, periodic paralysis, and dysmorphic features. It results from a heterozygous mutation of KCNJ2, but little is known about mosaicism in ATS. We performed genetic analysis of KCNJ2 in 32 ATS probands and their family members and identified KCNJ2 mutations in 25 probands, 20 of whose families underwent extensive genetic testing. These tests revealed that seven probands carried de novo mutations while 13 carried mutations inherited from their parents. We then specifically assessed a single proband and the respective family. The proband was a 9-year-old girl who fulfilled the ATS triad and carried an insertion mutation (p.75_76insThr). We determined that the proband's mother carried a somatic mosaicism and that the proband's younger brother also carried the ATS phenotype with the same insertion mutation. The mother, who exhibited mosaicism, was asymptomatic, although she exhibited Q(T)U prolongation. Mutant allele frequency was 11% as per TA cloning and 17.3% as per targeted deep sequencing. Our observations suggest that targeted deep sequencing is useful for the detection of mosaicism and that the detection of mosaic mutations in parents of apparently sporadic ATS patients can help in the process of genetic counseling. PMID:24635491

  12. Identification of Hepatotropic Viruses from Plasma Using Deep Sequencing: A Next Generation Diagnostic Tool

    PubMed Central

    Patterson, Jordan; Ford, Glenn; O’keefe, Sandra; Wang, Weiwei; Meng, Bo; Song, Deyong; Zhang, Yong; Tian, Zhijian; Wasilenko, Shawn T.; Rahbari, Mandana; Mitchell, Troy; Jordan, Tracy; Carpenter, Eric; Mason, Andrew L.; Wong, Gane Ka-Shu

    2013-01-01

    We conducted an unbiased metagenomics survey using plasma from patients with chronic hepatitis B, chronic hepatitis C, autoimmune hepatitis (AIH), non-alcoholic steatohepatitis (NASH), and patients without liver disease (control). RNA and DNA libraries were sequenced from plasma filtrates enriched in viral particles to catalog virus populations. Hepatitis viruses were readily detected at high coverage in patients with chronic viral hepatitis B and C, but only a limited number of sequences resembling other viruses were found. The exception was a library from a patient diagnosed with hepatitis C virus (HCV) infection that contained multiple sequences matching GB virus C (GBV-C). Abundant GBV-C reads were also found in plasma from patients with AIH, whereas Torque teno virus (TTV) was found at high frequency in samples from patients with AIH and NASH. After taxonomic classification of sequences by BLASTn, a substantial fraction in each library, ranging from 35% to 76%, remained unclassified. These unknown sequences were assembled into scaffolds along with virus, phage and endogenous retrovirus sequences and then analyzed by BLASTx against the non-redundant protein database. Nearly the full genome of a heretofore-unknown circovirus was assembled, as were many scaffolds that encoded proteins with similarity to plant, insect and mammalian viruses. The presence of this novel circovirus was confirmed by PCR. BLASTx also identified many polypeptides resembling nucleo-cytoplasmic large DNA virus (NCLDV) proteins. We re-evaluated these alignments with a profile hidden Markov method, HHblits, and observed inconsistencies in the target proteins reported by the different algorithms. This suggests that sequence alignments are insufficient to identify NCLDV proteins, especially when these alignments are only to small portions of the target protein. 
Nevertheless, we have now established a reliable protocol for the identification of viruses in plasma that can also be adapted to other

  13. Signature miRNAs in colorectal cancers were revealed using a bias reduction small RNA deep sequencing protocol

    PubMed Central

    Sun, Guihua; Cheng, Ya-Wen; Lai, Lily; Huang, Tsui-Chin; Wang, Jinhui; Wu, Xiwei; Wang, Yafan; Huang, Yasheng; Wang, Jinghan; Zhang, Keqiang; Hu, Shuya; Yang, Ji-Rui; Yen, Yun

    2016-01-01

    To explore the role of miRNAs in colorectal cancers (CRC), we have deep sequenced 48 pairs of frozen CRC samples, of which 44 pairs produced high quality sequencing data. By using a combined approach of our bias reduction small RNA (smRNA) deep sequencing protocol and Illumina small RNA TruSeq method for sample bar coding, we have obtained data from samples of relatively large size with bias reduced digital profile results. This novel approach allowed us to validate many previously published results using various techniques to profile miRNAs in CRC tissues or cell lines and to characterize ‘true’ miRNA signatures highly expressed in colon/rectum (CR) or CRC tissues. According to our results, miR-21, a miRNA that is up-regulated in CRC, and miR-143, a miRNA that is down-regulated in CRC, are the two miRNAs that dominated the miRNA population in CR tissues, and probably are also the most important miRNAs in CRCs. These two miRNAs, together with the other eight miRNAs, miR-148a, -194, -192, -200b, -200c, -10b, -26a, and -145, in descending order of expression level, constituted the top 10 highly expressed miRNAs in CR/CRC. Using TaqMan miRNA qPCR, we detected the relative expression of some of the CRC miRNAs in 10 CRC cell lines, validated their dysregulation under cancer conditions, and provided possible explanations for their dysregulation, which could be caused by APC, KRAS, or TP53 mutations. We believe these results will provide a new direction in future miRNA-related CRC development studies, and in the application of miRNAs in CRC diagnosis, prognosis, and therapy. PMID:26646696

  14. Deep Sequencing of Protease Inhibitor Resistant HIV Patient Isolates Reveals Patterns of Correlated Mutations in Gag and Protease

    PubMed Central

    Tan, Zhiqiang; Oliveira, Glenn; Yuan, Jinyun; Okulicz, Jason F.; Torbett, Bruce E.; Levy, Ronald M.

    2015-01-01

    While the role of drug resistance mutations in HIV protease has been studied comprehensively, mutations in its substrate, Gag, have not been extensively cataloged. Using deep sequencing, we analyzed a unique collection of longitudinal viral samples from 93 patients who have been treated with therapies containing protease inhibitors (PIs). Due to the high sequence coverage within each sample, the frequencies of mutations at individual positions were calculated with high precision. We used this information to characterize the variability in the Gag polyprotein and its effects on PI-therapy outcomes. To examine covariation of mutations between two different sites using deep sequencing data, we developed an approach to estimate the tight bounds on the two-site bivariate probabilities in each viral sample, and the mutual information between pairs of positions based on all the bounds. Utilizing the new methodology we found that mutations in the matrix and p6 proteins contribute to continued therapy failure and have a major role in the network of strongly correlated mutations in the Gag polyprotein, as well as between Gag and protease. Although covariation is not direct evidence of structural propensities, we found the strongest correlations between residues on capsid and matrix of the same Gag protein were often due to structural proximity. This suggests that some of the strongest inter-protein Gag correlations are the result of structural proximity. Moreover, the strong covariation between residues in matrix and capsid at the N-terminus with p1 and p6 at the C-terminus is consistent with residue-residue contacts between these proteins at some point in the viral life cycle. PMID:25894830
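    The covariation analysis in this record can be illustrated with a small worked example. The sketch below is not the authors' estimator: it assumes only the per-site mutation frequencies of two binary (wild-type/mutant) sites are known, uses the standard Fréchet bounds on their joint frequency, and computes mutual information for a chosen joint frequency. The function names `joint_bounds` and `mutual_information` are hypothetical.

```python
import math


def joint_bounds(p_a, p_b):
    """Frechet bounds on P(both sites mutated) given the two marginals."""
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)


def mutual_information(p_a, p_b, p_ab):
    """Mutual information (bits) between two binary sites.

    p_a, p_b: marginal mutation frequencies; p_ab: joint frequency of
    both mutations, which must lie within joint_bounds(p_a, p_b).
    """
    joint = {
        (1, 1): p_ab,
        (1, 0): p_a - p_ab,
        (0, 1): p_b - p_ab,
        (0, 0): 1.0 - p_a - p_b + p_ab,
    }
    marg_a = {1: p_a, 0: 1.0 - p_a}
    marg_b = {1: p_b, 0: 1.0 - p_b}
    mi = 0.0
    for (a, b), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            mi += p * math.log2(p / (marg_a[a] * marg_b[b]))
    return mi


if __name__ == "__main__":
    low, high = joint_bounds(0.9, 0.8)
    print("joint frequency lies in", (low, high))
    print("MI, independent sites:", mutual_information(0.5, 0.5, 0.25))
    print("MI, fully linked sites:", mutual_information(0.5, 0.5, 0.5))
```

    When the two marginal frequencies are both high or both low, the interval between the bounds is narrow, which is one reason marginal frequencies alone can already constrain pairwise statistics.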

  15. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression.

    PubMed

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-04-19

    The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 4% to 2%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), thus suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842

  16. Signature miRNAs in colorectal cancers were revealed using a bias reduction small RNA deep sequencing protocol.

    PubMed

    Sun, Guihua; Cheng, Ya-Wen; Lai, Lily; Huang, Tsui-Chin; Wang, Jinhui; Wu, Xiwei; Wang, Yafan; Huang, Yasheng; Wang, Jinghan; Zhang, Keqiang; Hu, Shuya; Yang, Ji-Rui; Yen, Yun

    2016-01-26

    To explore the role of miRNAs in colorectal cancers (CRC), we have deep sequenced 48 pairs of frozen CRC samples, of which 44 pairs produced high quality sequencing data. By using a combined approach of our bias reduction small RNA (smRNA) deep sequencing protocol and Illumina small RNA TruSeq method for sample bar coding, we have obtained data from samples of relatively large size with bias reduced digital profile results. This novel approach allowed us to validate many previously published results using various techniques to profile miRNAs in CRC tissues or cell lines and to characterize 'true' miRNA signatures highly expressed in colon/rectum (CR) or CRC tissues. According to our results, miR-21, a miRNA that is up-regulated in CRC, and miR-143, a miRNA that is down-regulated in CRC, are the two miRNAs that dominated the miRNA population in CR tissues, and probably are also the most important miRNAs in CRCs. These two miRNAs, together with the other eight miRNAs, miR-148a, -194, -192, -200b, -200c, -10b, -26a, and -145, with descending expression levels, constituted the top 10 highly expressed miRNAs in CR/CRC. Using TaqMan miRNA qPCR, we detected the relative expression of some of the CRC miRNAs in 10 CRC cell lines, validated their dysregulation under cancer condition, and provided possible explanation for their dysregulation, which could be caused by APC, KRAS, or TP53 mutations. We believe these results will provide a new direction in future miRNA-related CRC development studies, and application of miRNAs in CRC diagnosis/prognosis, and therapy. PMID:26646696

  17. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression

    PubMed Central

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-01-01

    The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations and deep targeted sequencing of 134 genes selected from the initial screening in additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration were mutated at frequency ranged from 4% to 2%. Focal adhesion and regulation of actin cytoskeleton pathways, were frequently disrupted (14.1% of cases) thus suggesting potential novel therapeutic strategies to prevent disease progression. Notably BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842

  18. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library

    Technology Transfer Automated Retrieval System (TEKTRAN)

    BACKGROUND: To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (...

  19. Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...

  20. Testing deep reticulate evolution in Amaryllidaceae Tribe Hippeastreae (Asparagales) with ITS and chloroplast sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...

  1. Profiling miRNA Expression in Bovine Tissues by Deep Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    miRNA are short RNA sequences (~21 nt long) that have been recently identified and were found to play an important role in gene regulation and controlling major cellular processes. Several miRNA are found to be evolutionarily conserved among the mammalian species. Some miRNAs are even conserved be...

  2. Deep RNA sequencing improved the structural annotation of the Tuber melanosporum transcriptome.

    PubMed

    Tisserant, E; Da Silva, C; Kohler, A; Morin, E; Wincker, P; Martin, F

    2011-02-01

    • The functional complexity of the Tuber melanosporum transcriptome has not yet been fully elucidated. Here, we applied high-throughput Illumina RNA-sequencing (RNA-Seq) to the transcriptome of T. melanosporum at different major developmental stages, that is free-living mycelium, fruiting body and ectomycorrhiza. • Sequencing of cDNA libraries generated a total of c. 24 million sequence reads representing > 882 Mb of sequence data. To construct a coverage signal profile across the genome, all reads were then aligned to the reference genome assembly of T. melanosporum Mel28. • We were able to identify a substantial number of novel transcripts, antisense transcripts, new exons, untranslated regions (UTRs), alternative upstream initiation codons and upstream open reading frames. • This RNA-Seq analysis allowed us to improve the genome annotation. It also provided us with a genome-wide view of the transcriptional and post-transcriptional mechanisms generating an increased number of transcript isoforms during major developmental transitions in T. melanosporum. PMID:21223284

  3. MiRNA expression profile of ionizing radiation-induced liver injury in mouse using deep sequencing.

    PubMed

    Lu, Jike; Chen, Chen; Hao, Limin; Zheng, Zhiqiang; Zhang, Naixun; Wang, Zhenyu

    2016-08-01

    In order to investigate the potential regulatory roles of microRNAs (miRNAs) in mouse response to ionizing radiation (IR), the small RNA libraries from liver tissues of mice with or without ionizing radiation (IR) were sequenced by high-throughput deep sequencing technology. A total of 270 miRNAs including 212 known and 58 potentially novel miRNAs were identified. Within these miRNAs, there were 48 miRNAs that were differentially expressed, including 27 known and 21 novel miRNAs. The results of quantitative RT-polymerase chain reaction (qRT-PCR) were consistent with the sequencing analysis. Target gene prediction, function annotation, and pathway of the identified miRNAs were analyzed using RNAhybrid, miRanda software and Swiss-Prot, Gene Ontology (GO), Clusters of Orthologous Groups (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG) and non-redundant (NR) databases. These results should be useful to investigate the biological function of miRNAs under IR-induced liver injury. PMID:27214643

  4. Deep Sequencing of the Oral Microbiome Reveals Signatures of Periodontal Disease

    PubMed Central

    Ghodsi, Mohammad; Sommer, Daniel D.; Gibbons, Theodore R.; Treangen, Todd J.; Chang, Yi-Chien; Li, Shan; Stine, O. Colin; Hasturk, Hatice; Kasif, Simon; Segrè, Daniel; Pop, Mihai; Amar, Salomon

    2012-01-01

    The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (∼2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ∼90%) we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes. PMID:22675498

  5. MiRNA Expression Profile for the Human Gastric Antrum Region Using Ultra-Deep Sequencing

    PubMed Central

    Hamoy, Igor G.; Darnet, Sylvain; Burbano, Rommel; Khayat, André; Gonçalves, André Nicolau; Alencar, Dayse O.; Cruz, Aline; Magalhães, Leandro; Araújo Jr., Wilson; Silva, Artur; Santos, Sidney; Demachki, Samia; Assumpção, Paulo; Ribeiro-dos-Santos, Ândrea

    2014-01-01

    Background MicroRNAs are small non-coding nucleotide sequences that regulate gene expression. These structures are fundamental to several biological processes, including cell proliferation, development, differentiation and apoptosis. Identifying the expression profile of microRNAs in healthy human gastric antrum mucosa may help elucidate the miRNA regulatory mechanisms of the human stomach. Methodology/Principal Findings A small RNA library of stomach antrum tissue was sequenced using high-throughput SOLiD sequencing technology. The total read count for the gastric mucosa antrum region was greater than 618,000. After filtering and aligning with miRBase, 148 mature miRNAs were identified in the gastric antrum tissue, totaling 3,181 quality reads; 63.5% (2,021) of the reads were concentrated in the eight most highly expressed miRNAs (hsa-mir-145, hsa-mir-29a, hsa-mir-29c, hsa-mir-21, hsa-mir-451a, hsa-mir-192, hsa-mir-191 and hsa-mir-148a). RT-PCR validated the expression profiles of seven of these highly expressed miRNAs and confirmed the sequencing results obtained using the SOLiD platform. Conclusions/Significance In comparison with other tissues, the antrum’s expression profile was unique with respect to the most highly expressed miRNAs, suggesting that this expression profile is specific to stomach antrum tissue. The current study provides a starting point for a more comprehensive understanding of the role of miRNAs in the regulation of the molecular processes of the human stomach. PMID:24647245
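    The read-share figure quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the two counts given in the record (3,181 quality reads, 2,021 of them in the eight most highly expressed miRNAs); the variable names are illustrative and not taken from the study.

```python
# Counts quoted in the abstract above; variable names are illustrative only.
total_quality_reads = 3181  # reads assigned to the 148 mature miRNAs
top_eight_reads = 2021      # reads in the eight most highly expressed miRNAs

# Fraction of quality reads concentrated in the top eight miRNAs.
share = top_eight_reads / total_quality_reads
print(f"Top-eight share of quality reads: {share:.1%}")
```

    The result agrees with the 63.5% reported in the abstract.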

  6. Deep sequencing of the oral microbiome reveals signatures of periodontal disease.

    PubMed

    Liu, Bo; Faller, Lina L; Klitgord, Niels; Mazumdar, Varun; Ghodsi, Mohammad; Sommer, Daniel D; Gibbons, Theodore R; Treangen, Todd J; Chang, Yi-Chien; Li, Shan; Stine, O Colin; Hasturk, Hatice; Kasif, Simon; Segrè, Daniel; Pop, Mihai; Amar, Salomon

    2012-01-01

    The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (~2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ~90%) we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes. PMID:22675498

  7. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals

    PubMed Central

    Nagasaki, Masao; Yasuda, Jun; Katsuoka, Fumiki; Nariai, Naoki; Kojima, Kaname; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yokozawa, Junji; Danjoh, Inaho; Saito, Sakae; Sato, Yukuto; Mimori, Takahiro; Tsuda, Kaoru; Saito, Rumiko; Pan, Xiaoqing; Nishikawa, Satoshi; Ito, Shin; Kuroki, Yoko; Tanabe, Osamu; Fuse, Nobuo; Kuriyama, Shinichi; Kiyomoto, Hideyasu; Hozawa, Atsushi; Minegishi, Naoko; Douglas Engel, James; Kinoshita, Kengo; Kure, Shigeo; Yaegashi, Nobuo; Tsuboi, Akito; Nagami, Fuji; Kawame, Hiroshi; Tomita, Hiroaki; Tsuji, Ichiro; Nakaya, Jun; Sugawara, Junichi; Suzuki, Kichiya; Kikuya, Masahiro; Abe, Michiaki; Nakaya, Naoki; Osumi, Noriko; Yamashita, Riu; Ogishima, Soichi; Takai, Takako; Tominaga, Teiji; Taki, Yasuyuki; Suzuki, Yoichi; Yamamoto, Masayuki

    2015-01-01

    The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify through this high-coverage sequencing (32.4× on average), 21.2 million, including 12 million novel, single-nucleotide variants (SNVs) at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which covers 99.0% SNVs of minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies. PMID:26292667

  8. Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs

    PubMed Central

    Wu, Qingfa; Luo, Yingjun; Lu, Rui; Lau, Nelson; Lai, Eric C.; Li, Wan-Xiang; Ding, Shou-Wei

    2010-01-01

    In response to infection, invertebrates process replicating viral RNA genomes into siRNAs of discrete sizes to guide virus clearance by RNA interference. Here, we show that viral siRNAs sequenced from fruit fly, mosquito, and nematode cells were all overlapping in sequence, suggesting a possibility of using siRNAs for viral genome assembly and virus discovery. To test this idea, we examined contigs assembled from published small RNA libraries and discovered five previously undescribed viruses from cultured Drosophila cells and adult mosquitoes, including three with a positive-strand RNA genome and two with a dsRNA genome. Notably, four of the identified viruses exhibited only low sequence similarities to known viruses, such that none could be assigned into an existing virus genus. We also report detection of virus-derived PIWI-interacting RNAs (piRNAs) in Drosophila melanogaster that have not been previously described in any other host species and demonstrate viral genome assembly from viral piRNAs in the absence of viral siRNAs. Thus, this study provides a powerful culture-independent approach for virus discovery in invertebrates by assembling viral genomes directly from host immune response products without prior virus enrichment or amplification. We propose that invertebrate viruses discovered by this approach may include previously undescribed human and vertebrate viral pathogens that are transmitted by arthropod vectors. PMID:20080648
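    The idea that overlapping short reads can be merged into longer contigs can be illustrated with a toy greedy merge. This is a minimal sketch and not the assembler used in the study, which the abstract does not name: the function names, the 8-nt minimum overlap, and the 39-nt example sequence are all assumptions made for illustration, and reverse-complement reads and sequencing errors are ignored.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=8):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Made-up 39-nt "genome", sampled as 21-nt reads every 6 nt (hypothetical data).
genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGTC"
reads = [genome[i:i + 21] for i in range(0, len(genome) - 20, 6)]
contigs = greedy_assemble(reads)
print(contigs)
```

    In this example the four overlapping 21-nt reads merge back into a single contig identical to the starting sequence. Real small-RNA data would need error tolerance and handling of both strands, which a dedicated assembler provides.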

  9. Deep sequencing uncovers protistan plankton diversity in the Portuguese Ria Formosa solar saltern ponds.

    PubMed

    Filker, Sabine; Gimmler, Anna; Dunthorn, Micah; Mahé, Frédéric; Stoeck, Thorsten

    2015-03-01

    We used high-throughput sequencing to unravel the genetic diversity of protistan (including fungal) plankton in hypersaline ponds of the Ria Formosa solar saltern works in Portugal. From three ponds of different salinity (4, 12 and 38 %), we obtained ca. 105,000 amplicons (V4 region of the SSU rDNA). The genetic diversity we found was higher than what has been described from solar saltern ponds thus far by microscopy or molecular studies. The obtained operational taxonomic units (OTUs) could be assigned to 14 high-rank taxonomic groups and blasted to 120 eukaryotic families. The novelty of this genetic diversity was extremely high, with 27 % of all OTUs having a sequence divergence of more than 10 % to deposited sequences of described taxa. The highest degree of novelty was found at intermediate salinity of 12 % within the ciliates, which traditionally are considered as the best known and described taxon group within the kingdom Protista. Further substantial novelty was detected within the stramenopiles and the chlorophytes. Analyses of community structures suggest a transition boundary for protistan plankton between 4 and 12 % salinity, suggesting different haloadaptation strategies in individual evolutionary lineages as a result of environmental filtering. Our study makes evident the gaps in our knowledge not only of protistan and fungal plankton diversity in hypersaline environments, but also in their ecology and their strategies to cope with these environmental conditions. It substantiates that specific future research needs to fill these gaps. PMID:25472012

  10. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission. PMID:27530749

  11. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library

    PubMed Central

    2009-01-01

    Background To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. Results The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the

  12. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low

  13. Identification of MicroRNAs and Transcript Targets in Camelina sativa by Deep Sequencing and Computational Methods

    PubMed Central

    Poudel, Saroj; Aryal, Niranjan; Lu, Chaofu

    2015-01-01

    Camelina sativa is an annual oilseed crop that is under intensive development for renewable resources of biofuels and industrial oils. MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play key roles in diverse plant biological processes. Here, we conducted deep sequencing on small RNA libraries prepared from camelina leaves, flower buds and two stages of developing seeds corresponding to initial and peak storage products accumulation. Computational analyses identified 207 known miRNAs belonging to 63 families, as well as 5 novel miRNAs. These miRNAs, especially members of the miRNA families, varied greatly in different tissues and developmental stages. The predicted miRNA target genes are involved in a broad range of physiological functions including lipid metabolism. This report is the first step toward elucidating roles of miRNAs in C. sativa and will provide additional tools to improve this oilseed crop for biofuels and biomaterials. PMID:25826400

  14. Gene expression profiling of Sinapis alba leaves under drought stress and rewatering growth conditions with Illumina deep sequencing.

    PubMed

    Dong, Cai-Hua; Li, Chen; Yan, Xiao-Hong; Huang, Shun-Mou; Huang, Jin-Yong; Wang, Li-Jun; Guo, Rui-Xing; Lu, Guang-Yuan; Zhang, Xue-Kun; Fang, Xiao-Ping; Wei, Wen-Hui

    2012-05-01

    Sinapis alba has many desirable agronomic traits including tolerance to drought. In this investigation, we performed the genome-wide transcriptional profiling of S. alba leaves under drought stress and rewatering growth conditions in an attempt to identify candidate genes involved in drought tolerance, using the Illumina deep sequencing technology. The comparative analysis revealed numerous changes in gene expression level attributable to the drought stress, which resulted in the down-regulation of 309 genes and the up-regulation of 248 genes. Gene ontology analysis revealed that the differentially expressed genes were mainly involved in cell division and catalytic and metabolic processes. Our results provide useful information for further analyses of the drought stress tolerance in Sinapis, and will facilitate molecular breeding for Brassica crop plants. PMID:22207172

  15. Identification of microRNAs and transcript targets in Camelina sativa by deep sequencing and computational methods.

    PubMed

    Poudel, Saroj; Aryal, Niranjan; Lu, Chaofu

    2015-01-01

    Camelina sativa is an annual oilseed crop that is under intensive development for renewable resources of biofuels and industrial oils. MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play key roles in diverse plant biological processes. Here, we conducted deep sequencing on small RNA libraries prepared from camelina leaves, flower buds and two stages of developing seeds corresponding to initial and peak storage products accumulation. Computational analyses identified 207 known miRNAs belonging to 63 families, as well as 5 novel miRNAs. These miRNAs, especially members of the miRNA families, varied greatly in different tissues and developmental stages. The predicted miRNA target genes are involved in a broad range of physiological functions including lipid metabolism. This report is the first step toward elucidating roles of miRNAs in C. sativa and will provide additional tools to improve this oilseed crop for biofuels and biomaterials. PMID:25826400

  16. Exploring the Gastrointestinal “Nemabiome”: Deep Amplicon Sequencing to Quantify the Species Composition of Parasitic Nematode Communities

    PubMed Central

    Avramenko, Russell W.; Redman, Elizabeth M.; Lewis, Roy; Yazwinski, Thomas A.; Wasmuth, James D.; Gilleard, John S.

    2015-01-01

    Parasitic helminth infections have a considerable impact on global human health as well as animal welfare and production. Although co-infection with multiple parasite species within a host is common, there is a dearth of tools with which to study the composition of these complex parasite communities. Helminth species vary in their pathogenicity, epidemiology and drug sensitivity and the interactions that occur between co-infecting species and their hosts are poorly understood. We describe the first application of deep amplicon sequencing to study parasitic nematode communities as well as introduce the concept of the gastro-intestinal “nemabiome”. The approach is analogous to 16S rDNA deep sequencing used to explore microbial communities, but utilizes the nematode ITS-2 rDNA locus instead. Gastro-intestinal parasites of cattle were used to develop the concept, as this host has many well-defined gastro-intestinal nematode species that commonly occur as complex co-infections. Further, the availability of pure mono-parasite populations from experimentally infected cattle allowed us to prepare mock parasite communities to determine, and correct for, species representation biases in the sequence data. We demonstrate that, once these biases have been corrected, accurate relative quantitation of gastro-intestinal parasitic nematode communities in cattle fecal samples can be achieved. We have validated the accuracy of the method applied to field-samples by comparing the results of detailed morphological examination of L3 larvae populations with those of the sequencing assay. The results illustrate the insights that can be gained into the species composition of parasite communities, using grazing cattle in the mid-west USA as an example. However, both the technical approach and the concept of the ‘nemabiome’ have a wide range of potential applications in human and veterinary medicine. These include investigations of host-parasite and parasite-parasite interactions

  17. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming

    PubMed Central

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A.

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina HiSeq 2500 instrument. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNAs. PMID:26656830

  18. Insights into the Genetic Structure and Diversity of 38 South Asian Indians from Deep Whole-Genome Sequencing

    PubMed Central

    Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-01-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language–speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP. PMID:24832686

  19. Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns

    PubMed Central

    Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when

  20. Identification of an NAC Transcription Factor Family by Deep Transcriptome Sequencing in Onion (Allium cepa L.).

    PubMed

    Zheng, Xia; Tang, Shouwei; Zhu, Siyuan; Dai, Qiuzhong; Liu, Touming

    2016-01-01

    Although onion has been used extensively in the past for cytogenetic studies, molecular analysis has been lacking because the availability of genetic resources is limited. NAM, ATAF, and CUC (NAC) transcription factors (TFs) are plant-specific proteins, and they play key roles in plant growth, development, and stress tolerance. However, none of the onion NAC (CepNAC) genes had been identified thus far. In this study, the transcriptome of onion leaves was analyzed by Illumina paired-end sequencing. Approximately 102.9 million clean sequence reads were produced and used for de novo assembly, which generated 117,189 non-redundant transcripts. Of these transcripts, 39,472 were annotated for their function. In order to mine the CepNAC TFs, the assembled transcripts were searched for CepNAC genes, resulting in the identification of all 39 CepNAC genes. These 39 CepNAC proteins were subjected to phylogenetic analysis together with 47 NAC proteins of known function that were previously identified in other species. The results showed that they can be divided into five groups (NAC-I–V). Interestingly, the NAC-IV and -V groups were found to be likely related to the processes of secondary wall synthesis and stress response, respectively. The transcriptome analysis generated a substantial number of transcripts, which will aid immensely in identifying important genes and accelerating our understanding of onion growth and development. Moreover, the discovery of 39 CepNAC TFs and the identification of the sequence conservation between them and published NAC proteins will provide a basis for further characterization and validation of their functions in the future. PMID:27331904

  1. Deep Sequencing-Based Analysis of the Cymbidium ensifolium Floral Transcriptome

    PubMed Central

    Li, Xiaobai; Luo, Jie; Yan, Tianlian; Xiang, Lin; Jin, Feng; Qin, Dehui; Sun, Chongbo; Xie, Ming

    2013-01-01

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs), 41,690 into 58 gene ontology (GO) terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium. PMID:24392013

  2. Identification of an NAC Transcription Factor Family by Deep Transcriptome Sequencing in Onion (Allium cepa L.)

    PubMed Central

    Zhu, Siyuan; Dai, Qiuzhong; Liu, Touming

    2016-01-01

    Although onion has been used extensively in the past for cytogenetic studies, molecular analysis has been lacking because the availability of genetic resources is limited. NAM, ATAF, and CUC (NAC) transcription factors (TFs) are plant-specific proteins, and they play key roles in plant growth, development, and stress tolerance. However, none of the onion NAC (CepNAC) genes had been identified thus far. In this study, the transcriptome of onion leaves was analyzed by Illumina paired-end sequencing. Approximately 102.9 million clean sequence reads were produced and used for de novo assembly, which generated 117,189 non-redundant transcripts. Of these transcripts, 39,472 were annotated for their function. In order to mine the CepNAC TFs, the assembled transcripts were searched for CepNAC genes, resulting in the identification of all 39 CepNAC genes. These 39 CepNAC proteins were subjected to phylogenetic analysis together with 47 NAC proteins of known function that were previously identified in other species. The results showed that they can be divided into five groups (NAC-I–V). Interestingly, the NAC-IV and -V groups were found to be likely related to the processes of secondary wall synthesis and stress response, respectively. The transcriptome analysis generated a substantial number of transcripts, which will aid immensely in identifying important genes and accelerating our understanding of onion growth and development. Moreover, the discovery of 39 CepNAC TFs and the identification of the sequence conservation between them and published NAC proteins will provide a basis for further characterization and validation of their functions in the future. PMID:27331904

  3. Sequence-of-events-driven automation of the deep space network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  4. Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts

    PubMed Central

    Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael

    2010-01-01

    Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250 000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described ‘sponge-specific’ clusters that were detected in this study, 48% were found exclusively in adults and larvae – implying vertical transmission of these groups. The remaining taxa, including ‘Poribacteria’, were also found at very low abundance among the 135 000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. PMID:21966903

  5. Hemocytome: deep sequencing analysis of mosquito blood cells in Indian malarial vector Anopheles stephensi.

    PubMed

    Thomas, Tina; De, Tanwee Das; Sharma, Punita; Lata, Suman; Saraswat, Priyanka; Pandey, Kailash C; Dixit, Rajnikant

    2016-07-10

    Hemocytes are tiny circulating blood cells of insects known to play multiple roles in physiological as well as cellular immune responses. However, knowledge of the molecular nature of hemocytes in blood-feeding insects, especially mosquitoes, which transmit several deadly diseases such as malaria and dengue, is still limited. Therefore, to determine the basic molecular composition of proteins encoded by naïve mosquito hemocytes, we sequenced an RNA-Seq library and analyzed a total of 13,105,858 Illumina sequencing reads in the mosquito Anopheles stephensi, an urban malarial vector in India. A de novo assembly approach yielded 3025 contigs for molecular and functional annotation. A total of 1829 contigs (48%) could be mapped to the mosquito transcript database, while, of the remaining 1196 unmatched contigs, at least 1108 contigs, i.e. 40% of total contigs, yielded a significant match to the available draft genome. ImmunoDB analysis predicted a total of 88 putative hemocyte transcripts belonging to 11 immune family proteins. A comprehensive molecular analysis of several unique transcripts, including novel LRR, Holotricin, OBP, and NiFU, that are involved in immunity, chemosensing, cell-cell communication, nitrogen fixation/metabolism, etc. provides initial evidence that mosquito hemocytes carry a unique ability to meet and manage the diverse cell-specific functions of the mosquito blood. An unexpected observation of abundant transcripts encoding hypothetical proteins with unknown functions indicated that much of hemocyte biology remains to be understood. PMID:26915489

  6. Deep sequencing uncovers numerous small RNAs on all four replicons of the plant pathogen Agrobacterium tumefaciens

    PubMed Central

    Wilms, Ina; Overlöper, Aaron; Nowrousian, Minou; Sharma, Cynthia M.; Narberhaus, Franz

    2012-01-01

    Agrobacterium species are capable of interkingdom gene transfer between bacteria and plants. The genome of Agrobacterium tumefaciens consists of a circular and a linear chromosome, the At-plasmid and the Ti-plasmid, which harbors bacterial virulence genes required for tumor formation in plants. Little is known about promoter sequences and the small RNA (sRNA) repertoire of this and other α-proteobacteria. We used a differential RNA sequencing (dRNA-seq) approach to map transcriptional start sites of 388 annotated genes and operons. In addition, a total number of 228 sRNAs was revealed from all four Agrobacterium replicons. Twenty-two of these were confirmed by independent RNA gel blot analysis and several sRNAs were differentially expressed in response to growth media, growth phase, temperature or pH. One sRNA from the Ti-plasmid was massively induced under virulence conditions. The presence of 76 cis-antisense sRNAs, two of them on the reverse strand of virulence genes, suggests considerable antisense transcription in Agrobacterium. The information gained from this study provides a valuable reservoir for an in-depth understanding of sRNA-mediated regulation of the complex physiology and infection process of Agrobacterium. PMID:22336765

  7. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques

    PubMed Central

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes for proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been discovered to play important roles in many biological processes, there has been an ever increasing need to investigate the entire ncRNAome in further unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing an unprecedented investigation of ncRNAs, as regulatory pathways and novel functions involving ncRNAs are now also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  8. Deep Sequencing Insights in Therapeutic shRNA Processing and siRNA Target Cleavage Precision

    PubMed Central

    Denise, Hubert; Moschos, Sterghios A.; Sidders, Benjamin; Burden, Frances; Perkins, Hannah; Carter, Nikki; Stroud, Tim; Kennedy, Michael; Fancy, Sally-Ann; Lapthorn, Cris; Lavender, Helen; Kinloch, Ross; Suhy, David; Corbau, Romu

    2014-01-01

    TT-034 (PF-05095808) is a recombinant adeno-associated virus serotype 8 (AAV8) agent expressing three short hairpin RNA (shRNA) pro-drugs that target the hepatitis C virus (HCV) RNA genome. The cytosolic enzyme Dicer cleaves each shRNA into multiple, potentially active small interfering RNA (siRNA) drugs. Using next-generation sequencing (NGS) to identify and characterize active shRNA maturation products, we observed that each TT-034–encoded shRNA could be processed into as many as 95 separate siRNA strands. Few of these appeared active as determined by Sanger 5′ RNA Ligase-Mediated Rapid Amplification of cDNA Ends (5′-RACE) and through synthetic shRNA and siRNA analogue studies. Moreover, NGS scrutiny applied to 5′-RACE products (RACE-seq) suggested that synthetic siRNAs could direct cleavage in not one, but up to five separate positions on targeted RNA, in a sequence-dependent manner. These data support an on-target mechanism of action for TT-034 without cytotoxicity and question the accepted precision of substrate processing by the key RNA interference (RNAi) enzymes Dicer and siRNA-induced silencing complex (siRISC). PMID:24496437

  9. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques.

    PubMed

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes for proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been discovered to play important roles in many biological processes, there has been an ever increasing need to investigate the entire ncRNAome in further unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing an unprecedented investigation of ncRNAs, as regulatory pathways and novel functions involving ncRNAs are now also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  10. Evolutionary Relations of Hexanchiformes Deep-Sea Sharks Elucidated by Whole Mitochondrial Genome Sequences

    PubMed Central

    Tanaka, Keiko; Tomita, Taketeru; Suzuki, Shingo; Hosomichi, Kazuyoshi; Sano, Kazumi; Doi, Hiroyuki; Kono, Azumi; Inoko, Hidetoshi; Kulski, Jerzy K.; Tanaka, Sho

    2013-01-01

    Hexanchiformes is regarded as a monophyletic taxon, but the morphological and genetic relationships between the five extant species within the order are still uncertain. In this study, we determined the whole mitochondrial DNA (mtDNA) sequences of seven sharks including representatives of the five Hexanchiformes, one squaliform, and one carcharhiniform and inferred the phylogenetic relationships among those species and 12 other Chondrichthyes (cartilaginous fishes) species for which the complete mitogenome is available. The monophyly of Hexanchiformes and its close relation with all other Squaliformes sharks were strongly supported by likelihood and Bayesian phylogenetic analysis of 13,749 aligned nucleotides of 13 protein coding genes and two rRNA genes that were derived from the whole mtDNA sequences of the 19 species. The phylogeny suggested that Hexanchiformes is in the superorder Squalomorphi, Chlamydoselachus anguineus (frilled shark) is the sister species to all other Hexanchiformes, and the relations within Hexanchiformes are well resolved as Chlamydoselachus, (Notorynchus, (Heptranchias, (Hexanchus griseus, H. nakamurai))). Based on our phylogeny, we discussed evolutionary scenarios of the jaw suspension mechanism and gill slit numbers that are significant features in the sharks. PMID:24089661

  11. Deep-sequence profiling of miRNAs and their target prediction in Monotropa hypopitys.

    PubMed

    Shchennikova, Anna V; Beletsky, Alexey V; Shulga, Olga A; Mazur, Alexander M; Prokhortchouk, Egor B; Kochieva, Elena Z; Ravin, Nikolay V; Skryabin, Konstantin G

    2016-07-01

    The myco-heterotroph Monotropa hypopitys is a widespread perennial herb used to study symbiotic interactions and the physiological mechanisms underlying the development of non-photosynthetic plants. Here, we performed, for the first time, transcriptome-wide characterization of the M. hypopitys miRNA profile using high-throughput Illumina sequencing. As a result of small RNA library sequencing and bioinformatic analysis, we identified 55 members belonging to 40 families of known miRNAs and 17 putative novel miRNAs unique to M. hypopitys. Computational screening revealed 206 potential mRNA targets for known miRNAs and 31 potential mRNA targets for novel miRNAs. The predicted target genes were described in Gene Ontology terms and were found to be involved in a broad range of metabolic and regulatory pathways. The identification of novel M. hypopitys-specific miRNAs, some with few target genes and low abundances, suggests their recent evolutionary origin and participation in highly specialized regulatory mechanisms fundamental to the non-photosynthetic biology of M. hypopitys. This global analysis of miRNAs and their potential targets in M. hypopitys provides a framework for further investigation of the role of miRNAs in the evolution and establishment of non-photosynthetic myco-heterotrophs. PMID:27097902

  12. Functional characterization of a monoclonal antibody epitope using a lambda phage display-deep sequencing platform

    PubMed Central

    Domina, Maria; Lanza Cariccio, Veronica; Benfatto, Salvatore; Venza, Mario; Venza, Isabella; Borgogni, Erica; Castellino, Flora; Midiri, Angelina; Galbo, Roberta; Romeo, Letizia; Biondo, Carmelo; Masignani, Vega; Teti, Giuseppe; Felici, Franco; Beninati, Concetta

    2016-01-01

    We have recently described a method, named PROFILER, for the identification of antigenic regions preferentially targeted by polyclonal antibody responses after vaccination. To test the ability of the technique to provide insights into the functional properties of monoclonal antibody (mAb) epitopes, we used here a well-characterized epitope of meningococcal factor H binding protein (fHbp), which is recognized by mAb 12C1. An fHbp library, engineered on a lambda phage vector enabling surface expression of polypeptides of widely different length, was subjected to massive parallel sequencing of the phage inserts after affinity selection with the 12C1 mAb. We detected dozens of unique antibody-selected sequences, the most enriched of which (designated as FrC) could largely recapitulate the ability of fHbp to bind mAb 12C1. Computational analysis of the cumulative enrichment of single amino acids in the antibody-selected fragments identified two overrepresented stretches of residues (H248-K254 and S140-G154), whose presence was subsequently found to be required for binding of FrC to mAb 12C1. Collectively, these results suggest that the PROFILER technology can rapidly and reliably identify, in the context of complex conformational epitopes, discrete “hot spots” with a crucial role in antigen-antibody interactions, thereby providing useful clues for the functional characterization of the epitope. PMID:27530334

  13. Identification of small non-coding RNAs in the planarian Dugesia japonica via deep sequencing.

    PubMed

    Qin, Yun-Fei; Zhao, Jin-Mei; Bao, Zhen-Xia; Zhu, Zhao-Yu; Mai, Jia; Huang, Yi-Bo; Li, Jian-Biao; Chen, Ge; Lu, Ping; Chen, San-Jun; Su, Lin-Lin; Fang, Hui-Min; Lu, Ji-Ke; Zhang, Yi-Zhe; Zhang, Shou-Tao

    2012-05-01

    The freshwater planarian flatworm possesses an extraordinary ability to regenerate lost body parts after amputation, making it an ideal model organism for regeneration and stem cell biology. Recently, small RNAs (sRNAs) have attracted increasing attention and have been studied in many contexts, including regeneration and stem cell biology. In the current study, the large-scale cloning and sequencing of sRNAs from the intact and regenerating planarian Dugesia japonica are reported. Sequence analysis shows that sRNAs between 18 nt and 40 nt are mainly microRNAs (miRNAs) and piRNAs. In addition, 209 conserved miRNAs and 12 novel miRNAs are identified. Notably, an improved target-screening method, based on the negative correlation between miRNA and mRNA expression, is adopted to improve target prediction accuracy. As with the miRNAs, a diverse population of piRNAs and the changes between the two samples are also listed. The present study is the first to report on the important role of sRNAs during regeneration of the planarian Dugesia japonica. PMID:22425900

  14. Functional characterization of a monoclonal antibody epitope using a lambda phage display-deep sequencing platform.

    PubMed

    Domina, Maria; Lanza Cariccio, Veronica; Benfatto, Salvatore; Venza, Mario; Venza, Isabella; Borgogni, Erica; Castellino, Flora; Midiri, Angelina; Galbo, Roberta; Romeo, Letizia; Biondo, Carmelo; Masignani, Vega; Teti, Giuseppe; Felici, Franco; Beninati, Concetta

    2016-01-01

    We have recently described a method, named PROFILER, for the identification of antigenic regions preferentially targeted by polyclonal antibody responses after vaccination. To test the ability of the technique to provide insights into the functional properties of monoclonal antibody (mAb) epitopes, we used here a well-characterized epitope of meningococcal factor H binding protein (fHbp), which is recognized by mAb 12C1. An fHbp library, engineered on a lambda phage vector enabling surface expression of polypeptides of widely different length, was subjected to massive parallel sequencing of the phage inserts after affinity selection with the 12C1 mAb. We detected dozens of unique antibody-selected sequences, the most enriched of which (designated as FrC) could largely recapitulate the ability of fHbp to bind mAb 12C1. Computational analysis of the cumulative enrichment of single amino acids in the antibody-selected fragments identified two overrepresented stretches of residues (H248-K254 and S140-G154), whose presence was subsequently found to be required for binding of FrC to mAb 12C1. Collectively, these results suggest that the PROFILER technology can rapidly and reliably identify, in the context of complex conformational epitopes, discrete "hot spots" with a crucial role in antigen-antibody interactions, thereby providing useful clues for the functional characterization of the epitope. PMID:27530334

  15. Redefinition of the human mast cell transcriptome by deep-CAGE sequencing

    PubMed Central

    Motakis, Efthymios; Guhl, Sven; Ishizu, Yuri; Itoh, Masayoshi; Kawaji, Hideya; de Hoon, Michiel; Lassmann, Timo; Carninci, Piero; Hayashizaki, Yoshihide; Zuberbier, Torsten; Babina, Magda

    2014-01-01

    Mast cells (MCs) mature exclusively in peripheral tissues, hampering research into their developmental and functional programs. Here, we employed deep cap analysis of gene expression on skin-derived MCs to generate the most comprehensive view of the human MC transcriptome ever reported. An advantage is that MCs were embedded in the FANTOM5 project, giving the opportunity to contrast their molecular signature against a multitude of human samples. We demonstrate that MCs possess a unique and surprising transcriptional landscape, combining hematopoietic genes with those exclusively active in MCs and genes not previously reported as expressed by MCs (several of them markers of unrelated tissues). We also found functional bone morphogenetic protein receptors transducing activatory signals in MCs. Conversely, several immune-related genes frequently studied in MCs were not expressed or were weakly expressed. Comparing MCs ex vivo with cultured counterparts revealed profound changes in the MC transcriptome in in vitro surroundings. We also determined the promoter usage of MC-expressed genes and identified associated motifs active in the lineage. Befitting their uniqueness, MCs had no close relative in the hematopoietic network (also only distantly related with basophils). This rich data set reveals that our knowledge of human MCs is still limited, but with this resource, novel functional programs of MCs may soon be discovered. PMID:24671954

  16. Deep Sequencing of Subseafloor Eukaryotic rRNA Reveals Active Fungi across Marine Subsurface Provinces

    PubMed Central

    Orsi, William; Biddle, Jennifer F.; Edgcomb, Virginia

    2013-01-01

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction fragment length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable than previously considered in the marine subsurface. PMID:23418556

  17. Focused Evolution of HIV-1 Neutralizing Antibodies Revealed by Structures and Deep Sequencing

    SciTech Connect

    Wu, Xueling; Zhou, Tongqing; Zhu, Jiang; Zhang, Baoshan; Georgiev, Ivelin; Wang, Charlene; Chen, Xuejun; Longo, Nancy S.; Louder, Mark; McKee, Krisha; O’Dell, Sijy; Perfetto, Stephen; Schmidt, Stephen D.; Shi, Wei; Wu, Lan; Yang, Yongping; Yang, Zhi-Yong; Yang, Zhongjia; Zhang, Zhenhai; Bonsignori, Mattia; Crump, John A.; Kapiga, Saidi H.; Sam, Noel E.; Haynes, Barton F.; Simek, Melissa; Burton, Dennis R.; Koff, Wayne C.; Doria-Rose, Nicole A.; Connors, Mark; Mullikin, James C.; Nabel, Gary J.; Roederer, Mario; Shapiro, Lawrence; Kwong, Peter D.; Mascola, John R.

    2013-03-04

    Antibody VRC01 is a human immunoglobulin that neutralizes about 90% of HIV-1 isolates. To understand how such broadly neutralizing antibodies develop, we used x-ray crystallography and 454 pyrosequencing to characterize additional VRC01-like antibodies from HIV-1-infected individuals. Crystal structures revealed a convergent mode of binding for diverse antibodies to the same CD4-binding-site epitope. A functional genomics analysis of expressed heavy and light chains revealed common pathways of antibody-heavy chain maturation, confined to the IGHV1-2*02 lineage, involving dozens of somatic changes, and capable of pairing with different light chains. Broadly neutralizing HIV-1 immunity associated with VRC01-like antibodies thus involves the evolution of antibodies to a highly affinity-matured state required to recognize an invariant viral structure, with lineages defined from thousands of sequences providing a genetic roadmap of their development.

  18. Deep sequencing of gastric carcinoma reveals somatic mutations relevant to personalized medicine

    PubMed Central

    2011-01-01

    Background Globally, gastric cancer is the second most common cause of cancer-related death, with the majority of the health burden borne by economically less-developed countries. Methods Here, we report a genetic characterization of 50 gastric adenocarcinoma samples, using Affymetrix SNP arrays and Illumina mRNA expression arrays as well as Illumina sequencing of the coding regions of 384 genes belonging to various pathways known to be altered in other cancers. Results Genetic alterations were observed in the WNT, Hedgehog, cell cycle, DNA damage and epithelial-to-mesenchymal-transition pathways. Conclusions The data suggest targeted therapies approved or in clinical development for gastric carcinoma would be of benefit to ~22% of the patients studied. In addition, the novel mutations detected here are likely to influence clinical response and suggest new targets for drug discovery. PMID:21781349

  19. Expression profiling of Drosophila mitochondrial genes via deep mRNA sequencing

    PubMed Central

    Torres, Tatiana Teixeira; Dolezal, Marlies; Schlötterer, Christian; Ottenwälder, Birgit

    2009-01-01

    Mitochondria play an essential role in several cellular processes. Nevertheless, very little is known about patterns of gene expression of genes encoded by the mitochondrial DNA (mtDNA). In this study, we used next-generation sequencing (NGS) for transcription profiling of genes encoded in the mitochondrial genome of Drosophila melanogaster and D. pseudoobscura. The analysis of males and females in both species indicated that the expression pattern was conserved between the two species, but differed significantly between both sexes. Interestingly, mRNA levels were not only different among genes encoded by separate transcription units, but also showed significant differences among genes located in the same transcription unit. Hence, mRNA abundance of genes encoded by mtDNA seems to be heavily modulated by post-transcriptional regulation. Finally, we also identified several transcripts with a noncanonical structure, suggesting that processing of mitochondrial transcripts may be more complex than previously assumed. PMID:19843606

  20. Ultra-Deep Sequencing Reveals the microRNA Expression Pattern of the Human Stomach

    PubMed Central

    Ribeiro-dos-Santos, Ândrea; Khayat, André S.; Silva, Artur; Alencar, Dayse O.; Lobato, Jessé; Luz, Larissa; Pinheiro, Daniel G.; Varuzza, Leonardo; Assumpção, Monica; Assumpção, Paulo; Santos, Sidney; Zanette, Dalila L.; Silva, Wilson A.; Burbano, Rommel; Darnet, Sylvain

    2010-01-01

    Background While microRNAs (miRNAs) play important roles in tissue differentiation and in maintaining basal physiology, little is known about the miRNA expression levels in stomach tissue. Alterations in the miRNA profile can lead to cell deregulation, which can induce neoplasia. Methodology/Principal Findings A small RNA library of stomach tissue was sequenced using high-throughput SOLiD sequencing technology. We obtained 261,274 quality reads with perfect matches to the human miRnome, and 42% of known miRNAs were identified. Digital Gene Expression profiling (DGE) was performed based on read abundance and showed that fifteen miRNAs were highly expressed in gastric tissue. Subsequently, the expression of these miRNAs was validated in 10 healthy individuals by RT-PCR, which showed a significant correlation of 83.97% (P<0.05). Six miRNAs showed a low variable pattern of expression (miR-29b, miR-29c, miR-19b, miR-31, miR-148a, miR-451) and could be considered part of the expression pattern of the healthy gastric tissue. Conclusions/Significance This study aimed to validate normal miRNA profiles of human gastric tissue to establish a reference profile for healthy individuals. Determining the regulatory processes acting in the stomach will be important in the fight against gastric cancer, which is the second-leading cause of cancer mortality worldwide. PMID:20949028

  1. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing.

    PubMed

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-09-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing for conducting the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasted resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  2. Identification and characterisation of microRNAs in young adults of Angiostrongylus cantonensis via a deep-sequencing approach

    PubMed Central

    Chang, Shih-Hsin; Tang, Petrus; Lai, Cheng-Hung; Kuo, Ming-Ling; Wang, Lian-Chen

    2013-01-01

    Angiostrongylus cantonensis is an important causative agent of eosinophilic meningitis and eosinophilic meningoencephalitis in humans. MicroRNAs (miRNAs) are small non-coding RNAs that participate in a wide range of biological processes. This study employed a deep-sequencing approach to study miRNAs from young adults of A. cantonensis. Based on 16,880,456 high-quality reads, 252 conserved mature miRNAs, including 10 antisense miRNAs, belonging to 90 families were identified and characterised. Among these sequences, 53 miRNAs from 25 families displayed 50 or more reads. The conserved miRNA families were divided into four groups according to their phylogenetic distribution and a total of nine families without any members showing homology to other nematodes or adult worms were identified. Stem-loop real-time polymerase chain reaction analysis of aca-miR-1-1 and aca-miR-71-1 demonstrated that their level of expression increased dramatically from infective larvae to young adults and then decreased in adult worms, with the male worms exhibiting significantly higher levels of expression than female worms. These findings provide information related to the regulation of gene expression during the growth, development and pathogenesis of young adults of A. cantonensis. PMID:24037191

  3. Deep sequencing and in silico analyses identify MYB-regulated gene networks and signaling pathways in pancreatic cancer

    PubMed Central

    Azim, Shafquat; Zubair, Haseeb; Srivastava, Sanjeev K.; Bhardwaj, Arun; Zubair, Asif; Ahmad, Aamir; Singh, Seema; Khushman, Moh’d.; Singh, Ajay P.

    2016-01-01

    We have recently demonstrated that the transcription factor MYB can modulate several cancer-associated phenotypes in pancreatic cancer. In order to understand the molecular basis of these MYB-associated changes, we conducted deep-sequencing of transcriptome of MYB-overexpressing and -silenced pancreatic cancer cells, followed by in silico pathway analysis. We identified significant modulation of 774 genes upon MYB-silencing (p < 0.05) that were assigned to 25 gene networks by in silico analysis. Further analyses placed genes in our RNA sequencing-generated dataset to several canonical signalling pathways, such as cell-cycle control, DNA-damage and -repair responses, p53 and HIF1α. Importantly, we observed downregulation of the pancreatic adenocarcinoma signaling pathway in MYB-silenced pancreatic cancer cells exhibiting suppression of EGFR and NF-κB. Decreased expression of EGFR and RELA was validated by both qPCR and immunoblotting and they were both shown to be under direct transcriptional control of MYB. These observations were further confirmed in a converse approach wherein MYB was overexpressed ectopically in a MYB-null pancreatic cancer cell line. Our findings thus suggest that MYB potentially regulates growth and genomic stability of pancreatic cancer cells via targeting complex gene networks and signaling pathways. Further in-depth functional studies are warranted to fully understand MYB signaling in pancreatic cancer. PMID:27354262

  4. Identification and characterization of novel serum microRNA candidates from deep sequencing in cervical cancer patients

    PubMed Central

    Juan, Li; Tong, Hong-li; Zhang, Pengjun; Guo, Guanghong; Wang, Zi; Wen, Xinyu; Dong, Zhennan; Tian, Ya-ping

    2014-01-01

    Small non-coding microRNAs (miRNAs) are involved in cancer development and progression, and serum profiles of cervical cancer patients may be useful for identifying novel miRNAs. We performed deep sequencing on serum pools of cervical cancer patients and healthy controls with 3 replicates and constructed a small RNA library. We used MIREAP to predict novel miRNAs and identified 2 putative novel miRNAs between serum pools of cervical cancer patients and healthy controls after filtering out pseudo-pre-miRNAs using Triplet-SVM analysis. The 2 putative novel miRNAs were validated by real time PCR and were significantly decreased in cervical cancer patients compared with healthy controls. One novel miRNA had an area under curve (AUC) of 0.921 (95% CI: 0.883, 0.959) with a sensitivity of 85.7% and a specificity of 88.2% when discriminating between cervical cancer patients and healthy controls. Our results suggest that characterizing serum profiles of cervical cancers by Solexa sequencing may be a good method for identifying novel miRNAs and that the validated novel miRNAs described here may be cervical cancer-associated biomarkers. PMID:25182173

  5. Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

    PubMed Central

    2014-01-01

    Background The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty five potential edits were found with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts. PMID:24433288

  6. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    PubMed Central

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing for conducting the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasted resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  7. Deep sequencing and in silico analyses identify MYB-regulated gene networks and signaling pathways in pancreatic cancer.

    PubMed

    Azim, Shafquat; Zubair, Haseeb; Srivastava, Sanjeev K; Bhardwaj, Arun; Zubair, Asif; Ahmad, Aamir; Singh, Seema; Khushman, Moh'd; Singh, Ajay P

    2016-01-01

    We have recently demonstrated that the transcription factor MYB can modulate several cancer-associated phenotypes in pancreatic cancer. In order to understand the molecular basis of these MYB-associated changes, we conducted deep-sequencing of transcriptome of MYB-overexpressing and -silenced pancreatic cancer cells, followed by in silico pathway analysis. We identified significant modulation of 774 genes upon MYB-silencing (p < 0.05) that were assigned to 25 gene networks by in silico analysis. Further analyses placed genes in our RNA sequencing-generated dataset to several canonical signalling pathways, such as cell-cycle control, DNA-damage and -repair responses, p53 and HIF1α. Importantly, we observed downregulation of the pancreatic adenocarcinoma signaling pathway in MYB-silenced pancreatic cancer cells exhibiting suppression of EGFR and NF-κB. Decreased expression of EGFR and RELA was validated by both qPCR and immunoblotting and they were both shown to be under direct transcriptional control of MYB. These observations were further confirmed in a converse approach wherein MYB was overexpressed ectopically in a MYB-null pancreatic cancer cell line. Our findings thus suggest that MYB potentially regulates growth and genomic stability of pancreatic cancer cells via targeting complex gene networks and signaling pathways. Further in-depth functional studies are warranted to fully understand MYB signaling in pancreatic cancer. PMID:27354262

  8. Deep-sequencing analysis of an apricot tree with vein clearing symptoms reveals the presence of a novel betaflexivirus.

    PubMed

    Elbeaino, Toufic; Giampetruzzi, Annalisa; De Stradis, Angelo; Digiaro, Michele

    2014-03-01

    Deep-sequencing technology applied to double-stranded RNA recovered from an apricot tree with vein clearing symptoms allowed the identification of a novel virus with a single-stranded RNA genome, for which the provisional name apricot vein clearing-associated virus (AVCaV) is proposed. Its genome comprises 7315 nt, excluding the poly(A) tail, covering four open reading frames (ORFs). The putative virus-encoded proteins, i.e., replicase (REP), movement protein (MP), coat protein (CP) and nucleic acid-binding protein (NB), had estimated molecular weights of 192.5, 32.15, 25.5 and 16.1 kDa, respectively, and shared the highest identity (ca. 40%) with citrus leaf blotch virus (CLBV) and with orthologs of other known members of the family Betaflexiviridae. The phylogenetic trees constructed with the sequences of the entire replication-associated polyproteins and the putative CP showed incongruent allocations of AVCaV within the genus Citrivirus or as an outgroup species close to the genus Vitivirus, respectively. The peculiar organization of its genome (four ORFs), different from that typical of members of the genera Citrivirus (three ORFs) and Vitivirus (five ORFs), makes AVCaV a likely novel member of an unassigned genus of the family Betaflexiviridae. In RT-PCR assays, AVCaV was found to infect only one out of 39 varieties of apricot tested, suggesting that its spread is limited. PMID:24389094

  9. Deep parallel sequencing reveals conserved and novel miRNAs in gill and hepatopancreas of giant freshwater prawn.

    PubMed

    Tan, Tian Tian; Chen, Maoshan; Harikrishna, Jennifer Ann; Khairuddin, Norliana; Mohd Shamsudin, Maizatul Izzah; Zhang, Guojie; Bhassu, Subha

    2013-10-01

    MicroRNAs (miRNAs) are non-protein-coding regulatory RNAs of ~20-22 nucleotides that post-transcriptionally regulate many protein-coding genes, influencing critical biological and metabolic processes. While the number of known microRNAs is increasing, there is currently no published data for miRNA from giant freshwater prawns, Macrobrachium rosenbergii (M. rosenbergii), a commercially cultured and economically important food species. In this study, we identified novel miRNAs in the gill and hepatopancreas of M. rosenbergii. Through a deep parallel sequencing analysis and an in silico data analysis approach, 327 miRNA families were identified from small RNA libraries with reference to both the de novo transcriptome of M. rosenbergii obtained from RNA-Seq and to miRBase (Release 18.0, November 2012). Based on the identified mature miRNA and recovered precursor sequences that form appropriate hairpin structures, three conserved miRNA (miR-125, miR-750, miR-993) and 27 novel miRNA candidates encoding messenger-like non-coding RNA were identified. miR-125, miR-750, G-m0002/H-m0009, G-m0005, G-m0008/H-m0016, G-m0011/H-m0027 and G-m0015 were selected for experimental validation with stem-loop quantitative RT-PCR and were found to be coherent with the expression profile of deep sequencing data as evaluated with Pearson's correlation coefficient (r = 0.835178 for miRNA in gill, r = 0.724131 for miRNA in hepatopancreas). Using a combinatorial approach of pathway enrichment analysis and inverse expression relationship of miRNA and mRNA, four co-expressed novel miRNA candidates (G-m0005, G-m0008/H-m0016, G-m0011/H-m0027, and G-m0015) were found to be associated with energy metabolism. In addition, the expression of the three novel miRNA candidates (G-m0005, G-m0008/H-m0016, and G-m0011/H-m0027) was also found to be significantly reduced at 9 and 24 h post infection in M. rosenbergii challenged with infectious hypodermal and hematopoietic necrosis virus, suggesting a functional

  10. The 2007 Nazko, British Columbia, earthquake sequence: Injection of magma deep in the crust beneath the Anahim volcanic belt

    USGS Publications Warehouse

    Cassidy, J.F.; Balfour, N.; Hickson, C.; Kao, H.; White, R.; Caplan-Auerbach, J.; Mazzotti, S.; Rogers, Gary C.; Al-Khoubbi, I.; Bird, A.L.; Esteban, L.; Kelman, M.; Hutchinson, J.; McCormack, D.

    2011-01-01

    On 9 October 2007, an unusual sequence of earthquakes began in central British Columbia about 20 km west of the Nazko cone, the most recent (circa 7200 yr) volcanic center in the Anahim volcanic belt. Within 25 hr, eight earthquakes of magnitude 2.3-2.9 occurred in a region where no earthquakes had previously been recorded. During the next three weeks, more than 800 microearthquakes were located (and many more detected), most at a depth of 25-31 km and within a radius of about 5 km. After about two months, almost all activity ceased. The clear P- and S-wave arrivals indicated that these were high-frequency (volcanic-tectonic) earthquakes and the b value of 1.9 that we calculated is anomalous for crustal earthquakes but consistent with volcanic-related events. Analysis of receiver functions at a station immediately above the seismicity indicated a Moho near 30 km depth. Precise relocation of the seismicity using a double-difference method suggested a horizontal migration at the rate of about 0.5 km/d, with almost all events within the lowermost crust. Neither harmonic tremor nor long-period events were observed; however, some spasmodic bursts were recorded and determined to be colocated with the earthquake hypocenters. These observations are all very similar to a deep earthquake sequence recorded beneath Lake Tahoe, California, in 2003-2004. Based on these remarkable similarities, we interpret the Nazko sequence as an indication of an injection of magma into the lower crust beneath the Anahim volcanic belt. This magma injection fractures rock, producing high-frequency, volcanic-tectonic earthquakes and spasmodic bursts.
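
    The abstract reports a b value but does not say how it was estimated. As a hedged illustration, one widely used approach is the Aki maximum-likelihood estimator with Utsu's correction for magnitude binning, b = log10(e) / (mean(M) - (Mc - dM/2)), where Mc is the completeness magnitude and dM the bin width. The function name, the completeness magnitude and the toy catalogue below are assumptions for the sketch, not values from the study.

```python
import math


def b_value_aki(magnitudes, completeness_mag, bin_width=0.1):
    """Maximum-likelihood b value (Aki estimator with Utsu's binning correction).

    Only events at or above the completeness magnitude are used.
    """
    used = [m for m in magnitudes if m >= completeness_mag]
    if len(used) < 2:
        raise ValueError("need at least two events at or above the completeness magnitude")
    mean_mag = sum(used) / len(used)
    # Shift the threshold by half a bin to account for rounded magnitudes.
    denominator = mean_mag - (completeness_mag - bin_width / 2.0)
    return math.log10(math.e) / denominator


# Toy catalogue (illustrative magnitudes only).
toy_catalogue = [1.0, 1.2, 1.4, 0.8]
print(round(b_value_aki(toy_catalogue, completeness_mag=1.0), 2))
```

    A catalogue dominated by small events has a mean magnitude close to the threshold, which gives a high b value. That is the sense in which a value near 1.9 is unusual for ordinary crustal seismicity.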

  11. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    PubMed Central

    2012-01-01

    Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, given high demand for roses, rose breeding programs are limited in molecular resources which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 of them were novel and 242 of them were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic

  12. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    SciTech Connect

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.

  13. Multiple Layers of Chimerism in a Single-Stranded DNA Virus Discovered by Deep Sequencing

    PubMed Central

    Krupovic, Mart; Zhi, Ning; Li, Jungang; Hu, Gangqing; Koonin, Eugene V.; Wong, Susan; Shevchenko, Sofiya; Zhao, Keji; Young, Neal S.

    2015-01-01

    Viruses with single-stranded (ss) DNA genomes infect hosts in all three domains of life and include many medically, ecologically, and economically important pathogens. Recently, a new group of ssDNA viruses with chimeric genomes has been discovered through viral metagenomics. These chimeric viruses combine capsid protein genes and replicative protein genes that, respectively, appear to have been inherited from viruses with positive-strand RNA genomes, such as tombusviruses, and ssDNA genomes, such as circoviruses, nanoviruses or geminiviruses. Here, we describe the genome sequence of a new representative of this virus group and reveal an additional layer of chimerism among ssDNA viruses. We show that not only do these viruses encompass genes for capsid proteins and replicative proteins that have distinct evolutionary histories, but also the replicative genes themselves are chimeras of functional domains inherited from viruses of different families. Our results underscore the importance of horizontal gene transfer in the evolution of ssDNA viruses and the role of genetic recombination in the emergence of novel virus groups. PMID:25840414

  14. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma.

    PubMed

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing microbiome profiles of the tongue coat may provide useful insights and a potential diagnostic marker for LC patients. Herein, we investigate for the first time the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of the tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao 1 indexes. The tongue coat microbiome significantly distinguished LC patients from healthy subjects in principal component analysis. Tongue coat microbial profiles represented 38 operational taxonomic units assigned to 23 different genera, distinguishing LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) revealed significant microbial dysbiosis of the tongue coat in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs showed that microbial gene functions related to the categories of nickel/iron_transport, amino_acid_transport, energy-producing systems and metabolism differed between LC patients and healthy subjects. These findings identify, for the first time, microbiota dysbiosis of the tongue coat in LC patients and may provide a novel, non-invasive potential diagnostic biomarker of LC. PMID:27605161
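
    For readers unfamiliar with the Shannon, Simpson and Chao 1 indexes named above, the sketch below computes them from a vector of per-OTU read counts. It is a minimal illustration using textbook definitions and assumes the Simpson index is reported as 1 minus the sum of squared proportions and that Chao 1 uses the bias-corrected form; the abstract does not state which variants or software the authors used. The toy counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with reads."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson diversity in the 1 - sum(p_i^2) convention."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao 1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


# Toy OTU table row: reads per OTU in one sample (illustrative only).
toy_counts = [5, 1, 1, 2]
print(shannon_index(toy_counts), simpson_index(toy_counts), chao1(toy_counts))
```

    Shannon and Simpson respond to how evenly reads are spread across OTUs, whereas Chao 1 estimates total richness from the number of rare OTUs, so the three can move somewhat independently between samples.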

  15. Deep Sequencing of Porphyromonas gingivalis and Comparative Transcriptome Analysis of a LuxS Mutant

    PubMed Central

    Hirano, Takanori; Beck, David A. C.; Demuth, Donald R.; Hackett, Murray; Lamont, Richard J.

    2012-01-01

    Porphyromonas gingivalis is a major etiological agent in chronic and aggressive forms of periodontal disease. The organism is an asaccharolytic anaerobe and is a constituent of mixed species biofilms in a variety of microenvironments in the oral cavity. P. gingivalis expresses a range of virulence factors over which it exerts tight control. High-throughput sequencing technologies provide the opportunity to relate functional genomics to basic biology. In this study we report qualitative and quantitative RNA-Seq analysis of the transcriptome of P. gingivalis. We have also applied RNA-Seq to the transcriptome of a ΔluxS mutant of P. gingivalis deficient in AI-2-mediated bacterial communication. The transcriptome analysis confirmed the expression of all predicted ORFs for strain ATCC 33277, including 854 hypothetical proteins, and allowed the identification of hitherto unknown transcriptional units. Twelve non-coding RNAs were identified, including 11 small RNAs and one cobalamin riboswitch. Fifty-seven genes were differentially regulated in the LuxS mutant. Addition of exogenous synthetic 4,5-dihydroxy-2,3-pentanedione (DPD, AI-2 precursor) to the ΔluxS mutant culture complemented expression of a subset of genes, indicating that LuxS is involved in both AI-2 signaling and non-signaling dependent systems in P. gingivalis. This work provides an important dataset for future study of P. gingivalis pathophysiology and further defines the LuxS regulon in this oral pathogen. PMID:22919670

  16. Deep Sequencing and Ecological Characterization of Gut Microbial Communities of Diverse Bumble Bee Species

    PubMed Central

    Lim, Haw Chuan; Chu, Chia-Ching; Seufferheld, Manfredo J.; Cameron, Sydney A.

    2015-01-01

    Gut bacterial communities of bumble bees are correlated with defense against pathogens. Further understanding this host-microbe association is vitally important as bumble bees are currently experiencing global population declines, potentially due in part to emergent diseases. In this study, we used pyrosequencing and community fingerprinting (ARISA) to characterize the gut microbial communities of nine bumble bee species from across the Bombus phylogeny. Overall, we delimited 74 bacterial taxa (operational taxonomic units or OTUs) belonging to Betaproteobacteria, Gammaproteobacteria, Bacilli, Actinobacteria, Flavobacteria and Alphaproteobacteria. Each bacterial community was taxonomically simple, containing an average of 1.9 common (relative abundance per sample > 5%) bacterial OTUs. The most abundant and prevalent (occurring in 92% of the samples) bacterial OTU, based on 16S rRNA sequences, closely matched that of the previously described Betaproteobacteria species Snodgrassella alvi. Bacteria that were first described in bee-related external environments dominated a number of gut bacterial communities, suggesting that they are not strictly dependent on the internal gut environment. The ARISA data showed a correlation between bacterial community structures and the geographic locations where the bees were sampled, suggesting that at least a subset of the bacterial species may be transmitted environmentally. Using light and fluorescent microscopy, we demonstrated that the gut bacteria form a biofilm on the internal epithelial surface of the ileum, corroborating results obtained from Apis mellifera. PMID:25768110

  17. Revealing stable processing products from ribosome-associated small RNAs by deep-sequencing data analysis

    PubMed Central

    Zywicki, Marek; Bakowska-Zywicka, Kamilla; Polacek, Norbert

    2012-01-01

    The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products and annotation of known transcripts based on multiple sources of information. To disclose the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect and confirm by independent experimental methods multiple novel stable RNA molecules differentially processed from well known ncRNAs, such as rRNAs, tRNAs or snoRNAs, in a stress-dependent manner. PMID:22266655

  18. A deep sequencing tool for partitioning clearance rates following antimalarial treatment in polyclonal infections

    PubMed Central

    Mideo, Nicole; Bailey, Jeffrey A.; Hathaway, Nicholas J.; Ngasala, Billy; Saunders, David L.; Lon, Chanthap; Kharabora, Oksana; Jamnik, Andrew; Balasubramanian, Sujata; Björkman, Anders; Mårtensson, Andreas; Meshnick, Steven R.; Read, Andrew F.; Juliano, Jonathan J.

    2016-01-01

    Background and objectives: Current tools struggle to detect drug-resistant malaria parasites when infections contain multiple parasite clones, which is the norm in high transmission settings in Africa. Our aim was to develop and apply an approach for detecting resistance that overcomes the challenges of polyclonal infections without requiring a genetic marker for resistance. Methodology: Clinical samples from patients treated with artemisinin combination therapy were collected from Tanzania and Cambodia. By deeply sequencing a hypervariable locus, we quantified the relative abundance of parasite subpopulations (defined by haplotypes of that locus) within infections and revealed evolutionary dynamics during treatment. Slow clearance is a phenotypic, clinical marker of artemisinin resistance; we analyzed variation in clearance rates within infections by fitting parasite clearance curves to subpopulation data. Results: In Tanzania, we found substantial variation in clearance rates within individual patients. Some parasite subpopulations cleared as slowly as resistant parasites observed in Cambodia. We evaluated possible explanations for these data, including resistance to drugs. Assuming slow clearance was a stable phenotype of subpopulations, simulations predicted that modest increases in their frequency could substantially increase time to cure. Conclusions and implications: By characterizing parasite subpopulations within patients, our method can detect rare, slow clearing parasites in vivo whose phenotypic effects would otherwise be masked. Since our approach can be applied to polyclonal infections even when the genetics underlying resistance are unknown, it could aid in monitoring the emergence of artemisinin resistance. Our application to Tanzanian samples uncovers rare subpopulations with worrying phenotypes for closer examination. PMID:26817485

  19. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma

    PubMed Central

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing microbiome profiles of the tongue coat may provide useful insights and potential diagnostic markers for LC patients. Here, we investigate for the first time the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After strict inclusion and exclusion criteria were applied, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of the tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao 1 indexes. Principal component analysis showed that the tongue coat microbiome significantly distinguished LC patients from healthy subjects. Tongue coat microbial profiles represented 38 operational taxonomic units assigned to 23 different genera, distinguishing LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) reveals significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs show microbial gene functions related to the categories of nickel/iron_transport, amino_acid_transport, energy-producing system and metabolism that differ between LC patients and healthy subjects. These findings identify, for the first time, microbiota dysbiosis of the tongue coat in LC patients and may provide a novel, non-invasive potential diagnostic biomarker of LC. PMID:27605161

  20. Emergence of telaprevir-resistant variants detected by ultra-deep sequencing after triple therapy in patients infected with HCV genotype 1.

    PubMed

    Akuta, Norio; Suzuki, Fumitaka; Seko, Yuya; Kawamura, Yusuke; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Hara, Tasuku; Kobayashi, Mariko; Saitoh, Satoshi; Arase, Yasuji; Ikeda, Kenji; Kumada, Hiromitsu

    2013-06-01

    Using ultra-deep sequencing technology, the present study was designed to investigate whether the emergence of telaprevir-resistant variants (amino acid substitutions at positions aa36, aa54, aa155, aa156, and aa170 in the HCV NS3 region) after commencement of triple therapy with telaprevir/peginterferon (PEG-IFN)/ribavirin could be predicted at baseline in previous non-responders to dual therapy. Fourteen patients infected with HCV genotype 1 who did not respond to previous PEG-IFN/ribavirin received a 24-week regimen of triple therapy and were evaluated for the appearance of telaprevir-resistant variants (amino acid substitutions of more than 0.2% among the total coverage) by ultra-deep sequencing. The sustained virological response rate was 28.6% (4 of 14 patients), which was significantly higher in patients with Arg70 (substitution at core aa70) and partial response (type of previous response to PEG-IFN/ribavirin) than in other patients. Telaprevir-resistant variants at baseline were detected in 7.1% (1 of 14 patients) by direct sequencing and in 21.4% (3 of 14 patients) by ultra-deep sequencing. The appearance of telaprevir-resistant variants was examined by ultra-deep sequencing in the 10 patients who did not achieve a sustained virological response. De novo variants emerged at re-elevation of viral load, regardless of variant frequencies at baseline (one patient with very high frequency variants [T54S: 99.9%], two patients with very low frequency variants [V36A: 0.2%; and V170A: 0.4%], and seven patients with undetectable variants). It is concluded that it is difficult to predict at baseline the emergence of telaprevir-resistant variants after commencement of triple therapy in prior non-responders of HCV genotype 1, even with the use of ultra-deep sequencing. PMID:23588728

  1. Microbial Diversity of the Brine-Seawater Interface of the Kebrit Deep, Red Sea, Studied via 16S rRNA Gene Sequences and Cultivation Methods

    PubMed Central

    Eder, Wolfgang; Jahnke, Linda L.; Schmidt, Mark; Huber, Robert

    2001-01-01

    The brine-seawater interface of the Kebrit Deep, northern Red Sea, was investigated for the presence of microorganisms using phylogenetic analysis combined with cultivation methods. Under strictly anaerobic culture conditions, novel halophiles were isolated. The new rod-shaped isolates belong to the halophilic genus Halanaerobium and are the first representatives of the genus obtained from deep-sea, anaerobic brine pools. Within the genus Halanaerobium, they represent new species which grow chemoorganotrophically at NaCl concentrations ranging from 5 to 34%. The cellular fatty acid compositions are consistent with those of other Halanaerobium representatives, showing unusually large amounts of Δ7 and Δ11 16:1 fatty acids. Phylogenetic analysis of the brine-seawater interface sample revealed the presence of various bacterial 16S rRNA gene sequences dominated by cultivated members of the bacterial domain, with the majority affiliated with the genus Halanaerobium. The new Halanaerobium 16S rRNA clone sequences showed the highest similarity (99.9%) to the sequence of isolate KT-8-13 from the Kebrit Deep brine. In this initial survey, our polyphasic approach demonstrates that novel halophiles thrive in the anaerobic, deep-sea brine pool of the Kebrit Deep, Red Sea. They may contribute significantly to the anaerobic degradation of organic matter enriched at the brine-seawater interface. PMID:11425725

  2. Genetics and Prognostication in Splenic Marginal Zone Lymphoma: Revelations from Deep Sequencing

    PubMed Central

    Gibson, Jane; Wang, Jun; Walewska, Renata; Parker, Helen; Parker, Anton; Davis, Zadie; Gardiner, Anne; McIver-Brown, Neil; Kalpadakis, Christina; Xochelli, Aliki; Anagnostopoulos, Achilles; Fazi, Claudia; de Castro, David Gonzalez; Dearden, Claire; Pratt, Guy; Rosenquist, Richard; Ashton-Key, Margaret; Forconi, Francesco; Collins, Andrew; Ghia, Paolo; Matutes, Estella; Pangalis, Gerassimos; Stamatopoulos, Kostas; Oscier, David; Strefford, Jonathan C

    2015-01-01

    Purpose Mounting evidence supports the clinical significance of gene mutations and immunogenetic features in common mature B-cell malignancies. Experimental Design We undertook a detailed characterization of the genetic background of splenic marginal zone lymphoma (SMZL), using targeted re-sequencing and explored potential clinical implications in a multinational cohort of 175 SMZL patients. Results We identified recurrent mutations in TP53 (16%), KLF2 (12%), NOTCH2 (10%), TNFAIP3 (7%), MLL2 (11%), MYD88 (7%) and ARID1A (6%), all genes known to be targeted by somatic mutation in SMZL. KLF2 mutations were early, clonal events, enriched in patients with del(7q) and IGHV1-2*04 B-cell receptor immunoglobulins, and were associated with a short median time-to-first-treatment (0.12 vs. 1.11 yrs; P=0.01). In multivariate analysis mutations in NOTCH2 (HR 2.12, 95%CI 1.02-4.4, P=0.044) and 100% germline IGHV gene identity (HR 2.19, 95%CI 1.05-4.55, P=0.036) were independent markers of short time-to-first-treatment, while TP53 mutations were an independent marker of short overall survival (HR 2.36, 95% CI 1.08-5.2, P=0.03). Conclusion We identify key associations between gene mutations and clinical outcome, demonstrating for the first time that NOTCH2 and TP53 gene mutations are independent markers of reduced treatment-free and overall survival, respectively. PMID:25779943

  3. Deep RNA Sequencing Reveals Novel Cardiac Transcriptomic Signatures for Physiological and Pathological Hypertrophy

    PubMed Central

    Kim, Taeyong; Kim, Do Han

    2012-01-01

    Although both physiological hypertrophy (PHH) and pathological hypertrophy (PAH) of the heart have similar morphological appearances, only PAH leads to fatal heart failure. In the present study, we used RNA sequencing (RNA-Seq) to determine the transcriptomic signatures for both PHH and PAH. Approximately 13–20 million reads were obtained for both models, among which PAH showed more differentially expressed genes (DEGs) (2,041) than PHH (245). The expression of 417 genes was barely detectable in the normal heart but was suddenly activated in PAH. Among them, Foxm1 and Plk1 are of particular interest, since Ingenuity Pathway Analysis (IPA) using DEGs and upstream motif analysis showed that they are essential hub proteins that regulate the expression of downstream proteins associated with PAH. Meanwhile, 52 genes related to collagen, chemokines, and actin showed opposite expression patterns between PHH and PAH. MAZ-binding motifs were enriched in the upstream region of the participating genes. Alternative splicing (AS) of exon variants was also examined using RNA-Seq data for PAH and PHH. We found 317 and 196 exon inclusions and exon exclusions, respectively, for PAH, and 242 and 172 exon inclusions and exclusions, respectively for PHH. The AS pattern was mostly related to gains or losses of domains, changes in activity, and localization of the encoded proteins. The splicing variants of 8 genes (i.e., Fhl1, Rcan1, Ndrg2, Synpo, Ttll1, Cxxc5, Egfl7, and Tmpo) were experimentally confirmed. Multilateral pathway analysis showed that the patterns of quantitative (DEG) and qualitative (AS) changes differ depending on the type of pathway in PAH and PHH. One of the most significant changes in PHH is the severe downregulation of autoimmune pathways accompanied by significant AS. These findings revealed the unique transcriptomic signatures of PAH and PHH and also provided a more comprehensive understanding at both the quantitative and qualitative levels. PMID:22523601

  4. Makah Formation; a deep-marginal-basin sequence of late Eocene and Oligocene age in the northwestern Olympic Peninsula, Washington

    USGS Publications Warehouse

    Snavely, P. D.; Niem, A.R.; MacLeod, N.S.; Pearl, J.E.; Rau, W.W.

    1980-01-01

    The Makah Formation of the Twin River Group crops out in a northwest-trending linear belt in the northwesternmost part of the Olympic Peninsula, Wash. This marine sequence consists of 2800 meters of predominantly thin-bedded siltstone and sandstone that encloses six distinctive newly named members--four thick-bedded amalgamated turbidite sandstone members, an olistostromal shallow-water marine sandstone and conglomerate member, and a thin-bedded water-laid tuff member. A local unconformity of submarine origin occurs within the lower part of the Makah Formation except in the central part of the study area, where it forms the contact between the older Hoko River Formation and the Makah. Foraminiferal faunas indicate that the Makah Formation ranges in age from late Eocene (late Narizian) to late Oligocene (Zemorrian) and was deposited in a predominantly lower to middle bathyal environment. The Makah Formation is part of a deep-marginal-basin facies that crops out in the western part of the Olympic Peninsula, in southwesternmost Washington and coastal embayments in northwestern Oregon, and along the central part of the coast of western Vancouver Island. On the basis of limited subsurface data from exploratory wells, correlative deep-marginal-basin deposits underlie the inner continental shelf of Oregon and the continental shelf (Tofino basin) along the southwestern side of Vancouver Island. Directional structures in the Makah Formation indicate that the predominantly lithic arkosic sandstone that forms the turbidite packets was derived from the northwest. A possible source of the clastic material is the dioritic, granitic, and volcanic terranes in the vicinity of the Hesquiat Peninsula and Barkley Sound on the west coast of Vancouver Island. Vertical and lateral variations of turbidite facies suggest that the four packets of sandstone were formed as depositional lobes on an outer submarine fan. The thin-bedded strata between the turbidite packets have characteristics of

  5. Anoxic Events, Productivity Rhythms, and the Orbital Signature in a Mid-Cretaceous Deep-Sea Sequence from Central Italy

    NASA Astrophysics Data System (ADS)

    Herbert, Timothy D.; Stallard, R. F.; Fischer, Alfred G.

    1986-12-01

    Albian pelagic sediments from the Umbrian Apennines exhibit rhythmic bedding in outcrop. High temporal resolution (~4-kyr spacing) sampling of carbonate, Si, and Al along a 1.6-m.y. interval of core demonstrates that changes in accumulation rate of calcareous and siliceous microorganisms were controlled by insolation cycles. Optical densitometry logging correlates variations of geochemical parameters to episodes of deep-sea anoxia, indicated by numerous black, laminated zones ("black shales"). Harmonic analysis of data shows oscillations at three frequencies, which correspond closely in estimated duration and ratio to the modern eccentricity and precessional orbital cycles. As in Pleistocene marine sequences, the eccentricity (approximately 100 kyr) term dominates the sedimentary variance in the Umbrian cycles. Anoxic episodes occur in phase with a limestone-marl repetition and reflect minima of the precessional (21 kyr) cycle. The covariance of carbonate and silica, with estimates of foraminiferal abundances, suggests that carbonate cycles are the result of changes in surface productivity of calcareous organisms rather than an effect of dissolution or diagenesis. Recognition of orbital cyclicity provides a tool for making quantitative estimates of these changes. Productivity during anoxic events was uniformly low (0.4-0.5 g/cm²/kyr carbonate, 0-0.05 g/cm²/kyr silica), but it rose to high levels during oxygenated periods (1-2.5 g/cm²/kyr carbonate, 0.1-0.3 g/cm²/kyr silica). Vertical overturning and organic carbon flux decreased during anoxic episodes, but decreased oxygenation of the deep water led to enhanced preservation of the organic matter delivered. The degree of paleoceanographic variability indicated by this study requires that accepted notions of climate stability and oceanic quiescence during nonglacial times be reassessed.

  6. Isolation and Characterization of Microsatellite DNA Markers in the Deep-Sea Amphipod Paralicella tenuipes by Illumina MiSeq Sequencing.

    PubMed

    Ritchie, Heather; Jamieson, Alan J; Piertney, Stuart B

    2016-07-01

    Here, we describe the development of 16 polymorphic microsatellite markers using an Illumina MiSeq sequencing approach in the deep-sea amphipod Paralicella tenuipes. A total of 25 577 844 DNA sequences were filtered for microsatellite motifs, of which 197 873 sequences were identified. From these sequences, 64 had sufficient flanking regions for primer design and 16 of these loci were polymorphic. Between 5 and 30 alleles were detected per locus, with an average of 13.63 alleles per locus, across a total of 120 individuals from 5 separate deep-sea trenches in the Pacific Ocean. For the 16 loci, observed and expected heterozygosity values ranged from 0.116 to 0.414 and 0.422 to 0.820, respectively, with one locus displaying significant deviation from Hardy-Weinberg equilibrium. The microsatellite loci that have been isolated and described here are the first molecular markers developed for deep-sea amphipods and will be invaluable for elucidating the genetic population structure and the extent of connectivity between deep ocean trenches. PMID:27012615
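
    The observed and expected heterozygosity values reported in this abstract are per-locus summaries of diploid genotypes. The following is a minimal sketch of the usual definitions, assuming one locus with genotypes given as allele pairs; it is not the authors' analysis, the function name and example genotypes are hypothetical, and the expected value omits any small-sample correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one per diploid individual.
    Observed = fraction of individuals carrying two different alleles.
    Expected = 1 - sum(p_i^2) over allele frequencies p_i, i.e. the
    heterozygosity predicted under Hardy-Weinberg equilibrium
    (uncorrected for sample size; an assumption of this sketch).
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


if __name__ == "__main__":
    # Hypothetical microsatellite genotypes (allele sizes in bp) at one locus.
    locus = [(152, 152), (152, 156), (156, 156), (152, 156)]
    print(heterozygosity(locus))
```

    An observed value well below the expected value, as in the ranges quoted above, indicates fewer heterozygotes than Hardy-Weinberg proportions would predict.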

  7. Draft Genome Sequences of Two Thiomicrospira Strains Isolated from the Brine-Seawater Interface of Kebrit Deep in the Red Sea

    PubMed Central

    Zhang, Guishan; Fauzi Haroon, Mohamed; Zhang, Ruifu; Hikmawan, Tyas

    2016-01-01

    Two Thiomicrospira strains, WB1 and XS5, were isolated from the Kebrit Deep brine-seawater interface in the Red Sea, Saudi Arabia. Here, we present the draft genome sequences of these gammaproteobacteria, which both produce sulfuric acid from thiosulfate in culture. PMID:26966216

  8. Draft Genome Sequences of Two Thiomicrospira Strains Isolated from the Brine-Seawater Interface of Kebrit Deep in the Red Sea.

    PubMed

    Zhang, Guishan; Fauzi Haroon, Mohamed; Zhang, Ruifu; Hikmawan, Tyas; Stingl, Ulrich

    2016-01-01

    Two Thiomicrospira strains, WB1 and XS5, were isolated from the Kebrit Deep brine-seawater interface in the Red Sea, Saudi Arabia. Here, we present the draft genome sequences of these gammaproteobacteria, which both produce sulfuric acid from thiosulfate in culture. PMID:26966216

  9. Draft Genome Sequence of Marinomonas sp. Strain D104, a Polycyclic Aromatic Hydrocarbon-Degrading Bacterium from the Deep-Sea Sediment of the Arctic Ocean

    PubMed Central

    Dong, Chunming; Bai, Xiuhua; Lai, Qiliang; Xie, Yanrong; Chen, Xin

    2014-01-01

    Marinomonas sp. strain D104 was isolated from a polycyclic aromatic hydrocarbon-degrading consortium enriched from deep-sea sediment from the Arctic Ocean. The draft genome sequence of D104 (approximately 3.83 Mbp) contains 62 contigs and 3,576 protein-encoding genes, with a G+C content of 44.8%. PMID:24459272

  10. Draft Genome Sequence of Pseudoalteromonas sp. Strain XI10 Isolated from the Brine-Seawater Interface of Erba Deep in the Red Sea

    PubMed Central

    Zhang, Guishan; Fauzi Haroon, Mohamed; Zhang, Ruifu; Hikmawan, Tyas

    2016-01-01

    Pseudoalteromonas sp. strain XI10 was isolated from the brine-seawater interface of Erba Deep in the Red Sea, Saudi Arabia. Here, we present the draft genome sequence of strain XI10, a gammaproteobacterium that synthesizes polysaccharides for biofilm formation when grown in liquid culture. PMID:26966209

  11. Comparison of liver microRNA transcriptomes of Tibetan and Yorkshire pigs by deep sequencing.

    PubMed

    Li, Yanyue; Li, Xiaocheng; Sun, Wen-Kui; Cheng, Chi; Chen, Yi-Hui; Zeng, Kai; Chen, Xiaohui; Gu, Yiren; Gao, Rong; Liu, Rui; Lv, Xuebin

    2016-02-15

    MicroRNAs (miRNAs) play an important role in the modulation of various metabolic processes in the liver, yet little is known about the liver microRNAome (miRNAome) of the Tibetan pig. Here we used the Yorkshire pig as a control to analyze the Tibetan pig-specific liver miRNAome, and for preliminary investigation of differentially expressed miRNAs participating in metabolism. A comprehensive analysis of Tibetan and Yorkshire pig liver miRNAomes by small RNA sequencing identified 362 unique miRNAs. Among these, 304 were co-expressed in both libraries, and 10 and 48 miRNAs were specifically expressed. Differential expression analysis of miRNAs, miRNA target prediction and KEGG analysis revealed that differentially expressed miRNAs were associated mainly with the metabolism of glucose, lipid and protein. Six differentially expressed miRNAs (miR-34a, miR-326, miR-1, miR-335, miR-185 and miR-378) participating in the metabolism of glucose and lipid were identified. Additionally, qPCR results revealed that a lower expression of miR-34a in Tibetan pig liver may promote gluconeogenesis by increasing the expression of Sirtuin type 1 (Sirt1); a lower expression of miR-1 in Tibetan pig liver may promote the synthesis and accumulation of lipid by increasing the expression of Liver X receptor α (LXRα); and a lower expression of miR-185 in Tibetan pig liver may promote the uptake of cholesterol from blood and secretion of bile by increasing the expression of the scavenger receptor class B type I (SR-BI). Our results provide new information and understanding of porcine miRNA profiles, which may help explain the regulatory mechanisms of miRNAs in the metabolic functions of Tibetan pig liver, and provide new biomarkers to assist in the development of Tibetan pig breeding characteristics. PMID:26656174

  12. HIV Drug Resistance Mutations (DRMs) Detected by Deep Sequencing in Virologic Failure Subjects on Therapy from Hunan Province, China

    PubMed Central

    He, Jianmei; Zheng, Jun; Chiarella, Jennifer; Kozal, Michael J.

    2016-01-01

    Objective To determine the prevalence of HIV drug resistance mutations (DRMs) at low and high levels in ART-experienced patients experiencing virologic failure (VF). Methods 29 subjects from 18 counties in Hunan Province that experienced VF were evaluated for the prevalence of DRMs (Stanford DRMs with an algorithm value ≥15, including low-, intermediate- and high-level resistance) by both Sanger sequencing (SS) and deep sequencing (DS) to 1% frequency levels. Results DS was performed on samples from 29 ART-experienced subjects; the median viral load was 4.95×10^4 c/ml; 82.76% were subtype CRF01_AE. 58 DRMs were detected by DS. 18 DRMs were detected by SS. Of the 58 mutations detected by DS, 40 were at levels <20% frequency (26 NNRTI, 12 NRTI and 2 PI) and the majority of these, 95.00% (38/40), were not detected by standard genotyping. Of these 40 low-level DRMs, 16 (40%) were detected at frequency levels of 1–4% and 24 (60%) at levels of 5–19%. SS detected 15 of 17 (88.24%) DRMs at levels ≥ 20% that were detected by DS. The only variable associated with the detection of DRMs by DS was ART adherence (missed doses in the prior 7 days); all patients that reported missing a dose in the last 7 days had DRMs detected by DS. Conclusions DS of VF samples from treatment-experienced subjects infected with primarily AE subtype frequently identified Stanford HIVdb NRTI and NNRTI resistance mutations with an algorithm value ≥15. Low frequency level resistant variants detected by DS were frequently missed by standard genotyping in VF specimens from antiretroviral-experienced subjects. PMID:26895182

  13. Deep sequencing reveals important roles of microRNAs in response to drought and salinity stress in cotton.

    PubMed

    Xie, Fuliang; Wang, Qinglian; Sun, Runrun; Zhang, Baohong

    2015-02-01

    Drought and salinity are two major environmental factors adversely affecting plant growth and productivity. However, the regulatory mechanism is unknown. In this study, the potential roles of small regulatory microRNAs (miRNAs) in cotton response to those stresses were investigated. Using next-generation deep sequencing, a total of 337 miRNAs with precursors were identified, comprising 289 known miRNAs and 48 novel miRNAs. Of these miRNAs, 155 miRNAs were expressed differentially. Target prediction, Gene Ontology (GO)-based functional classification, and Kyoto Encyclopedia of Genes and Genomes (KEGG)-based functional enrichment show that these miRNAs might play roles in response to salinity and drought stresses through targeting a series of stress-related genes. Degradome sequencing analysis showed that at least 55 predicted target genes were further validated to be regulated by 60 miRNAs. CitationRank-based literature mining was employed to determine the importance of genes related to drought and salinity stress. The NAC, MYB, and MAPK families were ranked top under the context of drought and salinity, indicating their important roles for the plant to combat drought and salinity stress. According to target prediction, a series of cotton miRNAs are associated with these top-ranked genes, including miR164, miR172, miR396, miR1520, miR6158, ghr-n24, ghr-n56, and ghr-n59. Interestingly, 163 cotton miRNAs were also identified to target 210 genes that are important in fibre development. These results will contribute to cotton stress-resistant breeding as well as understanding fibre development. PMID:25371507

  14. Deep Sequencing of the Scutellaria baicalensis Georgi Transcriptome Reveals Flavonoid Biosynthetic Profiling and Organ-Specific Gene Expression

    PubMed Central

    Liu, Jinxin; Hou, Jingyi; Jiang, Chao; Li, Geng; Lu, Heng; Meng, Fanyun; Shi, Linchun

    2015-01-01

    Scutellaria baicalensis Georgi has long been used in traditional medicine to treat a wide variety of diseases and has been listed in the Chinese Pharmacopeia, the Japanese Pharmacopeia, the Korean Pharmacopoeia and the European Pharmacopoeia. Flavonoids, especially wogonin, wogonoside, baicalin, and baicalein, are its main functional ingredients with various pharmacological activities. Although pharmacological studies for these flavonoid components have been well conducted, the molecular mechanism of their biosynthesis remains unclear in S. baicalensis. In this study, Illumina/Solexa deep sequencing generated more than 91 million paired-end reads and 49,507 unigenes from S. baicalensis roots, stems, leaves and flowers. More than 70% of unigenes were annotated in at least one of the five public databases and 13,627 unigenes were assigned to 3,810 KEGG genes involved in 579 different pathways. 54 unigenes that encode 12 key enzymes involved in the pathway of flavonoid biosynthesis were discovered. One baicalinase and three baicalein 7-O-glucuronosyltransferase genes potentially involved in the transformation between baicalin/wogonoside and baicalein/wogonin were identified. Four candidate 6-hydroxylase genes for the formation of baicalin/baicalein and one candidate 8-O-methyltransferase gene for the biosynthesis of wogonoside/wogonin were also recognized. Our results further support the conclusion that, in S. baicalensis, 3,5,7-trihydroxyflavone was the precursor of the four above compounds. Then, the differential expression models and simple sequence repeats associated with these genes were carefully analyzed. All of these results not only enrich the gene resource but also benefit research into the molecular genetics and functional genomics in S. baicalensis. PMID:26317778

  15. Deep sequencing reveals important roles of microRNAs in response to drought and salinity stress in cotton

    PubMed Central

    Xie, Fuliang; Wang, Qinglian; Sun, Runrun; Zhang, Baohong

    2015-01-01

    Drought and salinity are two major environmental factors adversely affecting plant growth and productivity. However, the regulatory mechanism is unknown. In this study, the potential roles of small regulatory microRNAs (miRNAs) in cotton response to those stresses were investigated. Using next-generation deep sequencing, a total of 337 miRNAs with precursors were identified, comprising 289 known miRNAs and 48 novel miRNAs. Of these miRNAs, 155 miRNAs were expressed differentially. Target prediction, Gene Ontology (GO)-based functional classification, and Kyoto Encyclopedia of Genes and Genomes (KEGG)-based functional enrichment show that these miRNAs might play roles in response to salinity and drought stresses through targeting a series of stress-related genes. Degradome sequencing analysis showed that at least 55 predicted target genes were further validated to be regulated by 60 miRNAs. CitationRank-based literature mining was employed to determine the importance of genes related to drought and salinity stress. The NAC, MYB, and MAPK families were ranked top under the context of drought and salinity, indicating their important roles for the plant to combat drought and salinity stress. According to target prediction, a series of cotton miRNAs are associated with these top-ranked genes, including miR164, miR172, miR396, miR1520, miR6158, ghr-n24, ghr-n56, and ghr-n59. Interestingly, 163 cotton miRNAs were also identified to target 210 genes that are important in fibre development. These results will contribute to cotton stress-resistant breeding as well as understanding fibre development. PMID:25371507

  16. Refining transcriptional programs in kidney development by integration of deep RNA-sequencing and array-based spatial profiling

    PubMed Central

    2011-01-01

    Background The developing mouse kidney is currently the best-characterized model of organogenesis at a transcriptional level. Detailed spatial maps have been generated for gene expression profiling combined with systematic in situ screening. These studies, however, fall short of capturing the transcriptional complexity arising from each locus due to the limited scope of microarray-based technology, which is largely based on "gene-centric" models. Results To address this, the polyadenylated RNA and microRNA transcriptomes of the 15.5 dpc mouse kidney were profiled using strand-specific RNA-sequencing (RNA-Seq) to a depth sufficient to complement spatial maps from pre-existing microarray datasets. The transcriptional complexity of RNAs arising from mouse RefSeq loci was catalogued, including 3568 alternatively spliced transcripts and 532 uncharacterized alternate 3' UTRs. Antisense expression for 60% of RefSeq genes was also detected, including uncharacterized non-coding transcripts overlapping kidney progenitor markers, Six2 and Sall1, which were validated by section in situ hybridization. Analysis of genes known to be involved in kidney development, particularly during mesenchymal-to-epithelial transition, showed an enrichment of non-coding antisense transcripts extended along protein-coding RNAs. Conclusion The resulting resource further refines the transcriptomic cartography of kidney organogenesis by integrating deep RNA sequencing data with locus-based information from previously published expression atlases. The added resolution of RNA-Seq has provided the basis for a transition from classical gene-centric models of kidney development towards more accurate and detailed "transcript-centric" representations, which highlights the extent of transcriptional complexity of genes that direct complex development events. PMID:21888672

  17. Identification of MicroRNAs and transcript targets in Camelina sativa by deep sequencing and computational methods

    DOE PAGESBeta

    Poudel, Saroj; Aryal, Niranjan; Lu, Chaofu; Wang, Tai

    2015-03-31

    Camelina sativa is an annual oilseed crop that is under intensive development for renewable resources of biofuels and industrial oils. MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play key roles in diverse plant biological processes. Here, we conducted deep sequencing on small RNA libraries prepared from camelina leaves, flower buds and two stages of developing seeds corresponding to initial and peak storage products accumulation. Computational analyses identified 207 known miRNAs belonging to 63 families, as well as 5 novel miRNAs. These miRNAs, especially members of the miRNA families, varied greatly in different tissues and developmental stages. The predicted miRNA target genes are involved in a broad range of physiological functions including lipid metabolism. This report is the first step toward elucidating roles of miRNAs in C. sativa and will provide additional tools to improve this oilseed crop for biofuels and biomaterials.

  18. Longitudinal copy number, whole exome and targeted deep sequencing of 'good risk' IGHV-mutated CLL patients with progressive disease

    PubMed Central

    Rose-Zerilli, M J J; Gibson, J; Wang, J; Tapper, W; Davis, Z; Parker, H; Larrayoz, M; McCarthy, H; Walewska, R; Forster, J; Gardiner, A; Steele, A J; Chelala, C; Ennis, S; Collins, A; Oakes, C C; Oscier, D G; Strefford, J C

    2016-01-01

    The biological features of IGHV-M chronic lymphocytic leukemia responsible for disease progression are still poorly understood. We undertook a longitudinal study close to diagnosis, pre-treatment and post relapse in 13 patients presenting with cMBL or Stage A disease and good-risk biomarkers (IGHV-M genes, no del(17p) or del(11q) and low CD38 expression) who nevertheless developed progressive disease, of whom 10 have required therapy. Using cytogenetics, fluorescence in situ hybridisation, genome-wide DNA methylation and copy number analysis together with whole exome, targeted deep- and Sanger sequencing at diagnosis, we identified mutations in established chronic lymphocytic leukemia driver genes in nine patients (69%), non-coding mutations (PAX5 enhancer region) in three patients and genomic complexity in two patients. Branching evolutionary trajectories predominated (n=9/13), revealing intra-tumoural epi- and genetic heterogeneity and sub-clonal competition before therapy. Of the patients subsequently requiring treatment, two had sub-clonal TP53 mutations that would not be detected by standard methodologies, three qualified for the very-low-risk category defined by integrated mutational and cytogenetic analysis and yet had established or putative driver mutations and one patient developed progressive, therapy-refractory disease associated with the emergence of an IGHV-U clone. These data suggest that extended genomic and immunogenetic screening may have clinical utility in patients with apparent good-risk disease. PMID:26847028

  19. Identification of microRNAs by small RNA deep sequencing for synthetic microRNA mimics to control Spodoptera exigua.

    PubMed

    Zhang, Yu Liang; Huang, Qi Xing; Yin, Guo Hua; Lee, Samantha; Jia, Rui Zong; Liu, Zhi Xin; Yu, Nai Tong; Pennerman, Kayla K; Chen, Xin; Guo, An Ping

    2015-02-25

    Beet armyworm, Spodoptera exigua, is a major pest of cotton around the world. With the increase of resistance to Bacillus thuringiensis (Bt) toxin in transgenic cotton plants, there is a need to develop an alternative control approach that can be used in combination with Bt transgenic crops as part of resistance management strategies. MicroRNAs (miRNAs), a non-coding small RNA family (18-25 nt), play crucial roles in various biological processes and over-expression of miRNAs has been shown to interfere with the normal development of insects. In this study, we identified 127 conserved miRNAs in S. exigua by using small RNA deep sequencing technology. From this, we tested the effects of 11 miRNAs on larval development. We found three miRNAs, Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9, to be differentially expressed during larval stages of S. exigua. Oral feeding experiments using synthetic miRNA mimics of Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9 resulted in suppressed growth of S. exigua and mortality. Over-expression of Sex-miR-4924 caused a significant reduction in the expression level of chitinase 1 and caused abortive molting in the insects. Therefore, we demonstrated a novel approach of using miRNA mimics to control S. exigua development. PMID:25528266

  20. Longitudinal copy number, whole exome and targeted deep sequencing of 'good risk' IGHV-mutated CLL patients with progressive disease.

    PubMed

    Rose-Zerilli, M J J; Gibson, J; Wang, J; Tapper, W; Davis, Z; Parker, H; Larrayoz, M; McCarthy, H; Walewska, R; Forster, J; Gardiner, A; Steele, A J; Chelala, C; Ennis, S; Collins, A; Oakes, C C; Oscier, D G; Strefford, J C

    2016-06-01

    The biological features of IGHV-M chronic lymphocytic leukemia responsible for disease progression are still poorly understood. We undertook a longitudinal study close to diagnosis, pre-treatment and post relapse in 13 patients presenting with cMBL or Stage A disease and good-risk biomarkers (IGHV-M genes, no del(17p) or del(11q) and low CD38 expression) who nevertheless developed progressive disease, of whom 10 have required therapy. Using cytogenetics, fluorescence in situ hybridisation, genome-wide DNA methylation and copy number analysis together with whole exome, targeted deep- and Sanger sequencing at diagnosis, we identified mutations in established chronic lymphocytic leukemia driver genes in nine patients (69%), non-coding mutations (PAX5 enhancer region) in three patients and genomic complexity in two patients. Branching evolutionary trajectories predominated (n=9/13), revealing intra-tumoural epi- and genetic heterogeneity and sub-clonal competition before therapy. Of the patients subsequently requiring treatment, two had sub-clonal TP53 mutations that would not be detected by standard methodologies, three qualified for the very-low-risk category defined by integrated mutational and cytogenetic analysis and yet had established or putative driver mutations and one patient developed progressive, therapy-refractory disease associated with the emergence of an IGHV-U clone. These data suggest that extended genomic and immunogenetic screening may have clinical utility in patients with apparent good-risk disease. PMID:26847028

  1. Deep Sequencing of Pyrethroid-Resistant Bed Bugs Reveals Multiple Mechanisms of Resistance within a Single Population

    PubMed Central

    Adelman, Zach N.; Kilcullen, Kathleen A.; Koganemaru, Reina; Anderson, Michelle A. E.; Anderson, Troy D.; Miller, Dini M.

    2011-01-01

    A frightening resurgence of bed bug infestations has occurred over the last 10 years in the U.S. and current chemical methods have been inadequate for controlling this pest due to widespread insecticide resistance. Little is known about the mechanisms of resistance present in U.S. bed bug populations, making it extremely difficult to develop intelligent strategies for their control. We have identified bed bugs collected in Richmond, VA which exhibit both kdr-type (L925I) and metabolic resistance to pyrethroid insecticides. Using LD50 bioassays, we determined that resistance ratios for Richmond strain bed bugs were ∼5200-fold to the insecticide deltamethrin. To identify metabolic genes potentially involved in the detoxification of pyrethroids, we performed deep-sequencing of the adult bed bug transcriptome, obtaining more than 2.5 million reads on the 454 titanium platform. Following assembly, analysis of newly identified gene transcripts in both Harlan (susceptible) and Richmond (resistant) bed bugs revealed several candidate cytochrome P450 and carboxylesterase genes which were significantly over-expressed in the resistant strain, consistent with the idea of increased metabolic resistance. These data will accelerate efforts to understand the biochemical basis for insecticide resistance in bed bugs, and provide molecular markers to assist in the surveillance of metabolic resistance. PMID:22039447

  2. Deep sequencing of the T-cell receptor repertoire in CD8+ T-large granular lymphocyte leukemia identifies signature landscapes

    PubMed Central

    Clemente, Michael J.; Przychodzen, Bartlomiej; Jerez, Andres; Dienes, Brittney E.; Afable, Manuel G.; Husseinzadeh, Holleh; Rajala, Hanna L. M.; Wlodarski, Marcin W.; Mustjoki, Satu

    2013-01-01

    New massively parallel sequencing technology enables, through deep sequencing of rearranged T-cell receptor (TCR) Vβ complementarity-determining region 3 (CDR3) regions, a previously inaccessible level of TCR repertoire analysis. The CDR3 repertoire diversity reflects clonal composition, the potential antigenic recognition spectrum, and the quantity of available T-cell responses. In this context, T-large granular lymphocyte (T-LGL) leukemia is a chronic clonal lymphoproliferation of cytotoxic T cells often associated with autoimmune diseases and various cytopenias. Using CD8+ T-LGL leukemia as a model disease, we set out to evaluate and compare the TCR deep-sequencing spectra of both patients and healthy controls to better understand how TCR deep sequencing could be used in the diagnosis and monitoring of not only T-LGL leukemia but also reactive processes such as autoimmune disease and infection. Our data demonstrate, with high resolution, significantly decreased diversity of the T-cell repertoire in CD8+ T-LGL leukemia and suggest that many T-LGL clonotypes may be private to the disease and may not be present in the general public, even at the basal level. PMID:24149287

  3. Identification of miRNAs associated with sexual maturity in chicken ovary by Illumina small RNA deep sequencing

    PubMed Central

    2013-01-01

    Background MicroRNAs have been suggested to play important roles in the regulation of gene expression in various biological processes. To investigate the function of miRNAs in chicken ovarian development and folliculogenesis, two small RNA libraries constructed from sexually mature (162-day old) and immature (42-day old) ovary tissues of Single Comb White Leghorn chicken were sequenced using Illumina small RNA deep sequencing. Results In the present study, 14,545,100 and 14,774,864 clean reads were obtained from sexually mature (162-d) and sexually immature (42-d) ovaries, respectively. In total, 202 known miRNAs were identified, and 93 of them were found to be significantly differentially expressed: 42 miRNAs were up-regulated and 51 miRNAs were down-regulated in the mature ovary compared to the immature ovary. Among the up-regulated miRNAs, gga-miR-1a has the largest fold-change (6.405-fold), while gga-miR-375 has the largest fold-change (11.345-fold) among the down-regulated miRNAs. The three most abundant miRNAs in the chicken ovary are gga-miR-10a, gga-let-7 and gga-miR-21. Five differentially expressed miRNAs (gga-miR-1a, 21, 26a, 137 and 375) were validated by real-time quantitative RT-PCR (qRT-PCR). Furthermore, the expression patterns of the five miRNAs were analyzed in different developmental stages of chicken ovary and follicles of various sizes. Conclusion The present study provides the first miRNA profile in sexually immature and mature chicken ovaries. Some miRNAs such as gga-miR-1a and gga-miR-21 are expressed differentially in immature and mature chicken ovaries as well as among different sized follicles, suggesting an important role in the follicular growth or ovulation mechanism in the chicken. PMID:23705682

  4. Combining resources to obtain a comprehensive survey of the bovine embryo transcriptome through deep sequencing and microarrays.

    PubMed

    Robert, Claude; Nieminen, Julie; Dufort, Isabelle; Gagné, Dominic; Grant, Jason R; Cagnone, Gaël; Plourde, Dany; Nivet, Anne-Laure; Fournier, Éric; Paquet, Éric; Blazejczyk, Michal; Rigault, Philippe; Juge, Nicolas; Sirard, Marc-André

    2011-09-01

    While most assisted reproductive technologies (ART) are considered routine for the reproduction of species of economic importance, such as the bovine, the impact of these manipulations on the developing embryo remains largely unknown. In an effort to obtain a comprehensive survey of the bovine embryo transcriptome and how it is modified by ART, resources were combined to design an embryo-specific microarray. Close to one million high-quality reads were produced from subtracted bovine embryo libraries using Roche 454 Titanium deep sequencing technology, which enabled the creation of an augmented bovine genome catalog. This catalog was enriched with bovine embryo transcripts, and included newly discovered indel type and 3'UTR variants. Using this augmented bovine genome catalog, the EmbryoGENE Bovine Microarray was designed and is composed of a total of 42,242 probes, including 21,139 known reference genes; 9,322 probes for novel transcribed regions (NTRs); 3,677 alternatively spliced exons; 3,353 3'-tiling probes; and 3,723 controls. A suite of bioinformatics tools was also developed to facilitate microarray data analysis and database creation; it includes a quality control module, a Laboratory Information Management System (LIMS) and microarray analysis software. Results obtained during this study have already led to the identification of differentially expressed blastocyst targets, NTRs, splice variants of the indel type, and 3'UTR variants. We were able to confirm microarray results by real-time PCR, indicating that the EmbryoGENE bovine microarray has the power to detect physiologically relevant changes in gene expression. PMID:21812063

  5. Detection of low-prevalence somatic TSC2 mutations in sporadic pulmonary lymphangioleiomyomatosis tissues by deep sequencing.

    PubMed

    Fujita, Atsushi; Ando, Katsutoshi; Kobayashi, Etsuko; Mitani, Keiko; Okudera, Koji; Nakashima, Mitsuko; Miyatake, Satoko; Tsurusaki, Yoshinori; Saitsu, Hirotomo; Seyama, Kuniaki; Miyake, Noriko; Matsumoto, Naomichi

    2016-01-01

    Lymphangioleiomyomatosis (LAM) (MIM #606690) is a rare lung disorder leading to respiratory failure associated with progressive cystic destruction due to the proliferation and infiltration of abnormal smooth muscle-like cells (LAM cells). LAM can occur alone (sporadic LAM, S-LAM) or combined with tuberous sclerosis complex (TSC-LAM). TSC is caused by a germline heterozygous mutation in either TSC1 or TSC2, and TSC-LAM is thought to occur as a result of a somatic mutation (second hit) in addition to a germline mutation in TSC1 or TSC2 (first hit). S-LAM is also thought to occur under the two-hit model involving a somatic mutation and/or loss of heterozygosity in TSC2. To identify TSC1 or TSC2 changes in S-LAM patients, the two genes were analyzed by deep next-generation sequencing (NGS) using genomic DNA from blood leukocytes (n = 9), LAM tissue from lung (n = 7), LAM cultured cells (n = 4), or LAM cell clusters (n = 1). We identified nine somatic mutations in six of nine S-LAM patients (67 %) with mutant allele frequencies of 1.7-46.2 %. Three of these six patients (50 %) showed two different TSC2 mutations with allele frequencies of 1.7-28.7 %. Furthermore, at least five mutations with low prevalence (<20 % of allele frequency) were confirmed by droplet digital PCR. As LAM tissues are likely to be composed of heterogeneous cell populations, mutant allele frequencies can be low. Our results confirm the consistent finding of TSC2 mutations in LAM samples, and highlight the benefit of laser capture microdissection and in-depth allele analyses for detection, such as NGS. PMID:26563443
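
    The mutant allele frequencies quoted in the abstract above are, in the usual sense, the share of reads at a site that carry the variant. The following is a minimal sketch of that arithmetic, assuming per-site read counts are already available; the function names and the example counts are hypothetical and are not taken from the study. Only the 20 % low-prevalence cutoff comes from the abstract.

```python
LOW_PREVALENCE_CUTOFF = 20.0  # percent; the "<20 %" threshold quoted in the abstract


def mutant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Return the mutant allele frequency as a percentage of all reads at a site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return 100.0 * alt_reads / total_reads


def is_low_prevalence(alt_reads: int, total_reads: int,
                      cutoff: float = LOW_PREVALENCE_CUTOFF) -> bool:
    """True when the mutant allele frequency falls below the cutoff (in percent)."""
    return mutant_allele_frequency(alt_reads, total_reads) < cutoff


# Hypothetical read counts for illustration only (variant reads, total reads).
example_sites = {"site_A": (17, 1000), "site_B": (462, 1000)}
for name, (alt, total) in example_sites.items():
    freq = mutant_allele_frequency(alt, total)
    print(name, round(freq, 1), "low prevalence" if is_low_prevalence(alt, total) else "")
```

    At a low frequency such as 1.7 %, only a few reads in every hundred carry the variant, which is why deep coverage and an orthogonal confirmation such as droplet digital PCR matter for calls of this kind.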

  6. BlockClust: efficient clustering and classification of non-coding RNAs from short read RNA-seq profiles

    PubMed Central

    Videm, Pavankumar; Rose, Dominic; Costa, Fabrizio; Backofen, Rolf

    2014-01-01

    Summary: Non-coding RNAs (ncRNAs) play a vital role in many cellular processes such as RNA splicing, translation, and gene regulation. However, the vast majority of ncRNAs still have no functional annotation. One prominent approach for putative function assignment is clustering of transcripts according to sequence and secondary structure. However, sequence information is changed by post-transcriptional modifications, and secondary structure is only a proxy for the true 3D conformation of the RNA polymer. A different type of information that does not suffer from these issues, and that can be used for the detection of RNA classes, is the pattern of processing and its traces in small RNA-seq read data. Here we introduce BlockClust, an efficient approach to detect transcripts with similar processing patterns. We propose a novel way to encode expression profiles in compact discrete structures, which can then be processed using fast graph-kernel techniques. We perform both unsupervised clustering and develop family-specific discriminative models; finally we show how the proposed approach is scalable, accurate and robust across different organisms, tissues and cell lines. Availability: The whole BlockClust galaxy workflow including all tool dependencies is available at http://toolshed.g2.bx.psu.edu/view/rnateam/blockclust_workflow. Contact: backofen@informatik.uni-freiburg.de; costa@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931994

  7. Ultra-deep T cell receptor sequencing reveals the complexity and intratumour heterogeneity of T cell clones in renal cell carcinomas.

    PubMed

    Gerlinger, Marco; Quezada, Sergio A; Peggs, Karl S; Furness, Andrew J S; Fisher, Rosalie; Marafioti, Teresa; Shende, Vishvesh H; McGranahan, Nicholas; Rowan, Andrew J; Hazell, Steven; Hamm, David; Robins, Harlan S; Pickering, Lisa; Gore, Martin; Nicol, David L; Larkin, James; Swanton, Charles

    2013-12-01

    The recognition of cancer cells by T cells can impact upon prognosis and be exploited for immunotherapeutic approaches. This recognition depends on the specific interaction between antigens displayed on the surface of cancer cells and the T cell receptor (TCR), which is generated by somatic rearrangements of TCR α- and β-chains (TCRb). Our aim was to assess whether ultra-deep sequencing of the rearranged TCRb in DNA extracted from unfractionated clear cell renal cell carcinoma (ccRCC) samples can provide insights into the clonality and heterogeneity of intratumoural T cells in ccRCCs, a tumour type that can display extensive genetic intratumour heterogeneity (ITH). For this purpose, DNA was extracted from two to four tumour regions from each of four primary ccRCCs and was analysed by ultra-deep TCR sequencing. In parallel, tumour infiltration by CD4, CD8 and Foxp3 regulatory T cells was evaluated by immunohistochemistry and correlated with TCR-sequencing data. A polyclonal T cell repertoire with 367-16 289 (median 2394) unique TCRb sequences was identified per tumour region. The frequencies of the 100 most abundant T cell clones/tumour were poorly correlated between most regions (Pearson correlation coefficient, -0.218 to 0.465). 3-93% of these T cell clones were not detectable across all regions. Thus, the clonal composition of T cell populations can be heterogeneous across different regions of the same ccRCC. T cell ITH was higher in tumours pretreated with an mTOR inhibitor, which could suggest that therapy can influence adaptive tumour immunity. These data show that ultra-deep TCR-sequencing technology can be applied directly to DNA extracted from unfractionated tumour samples, allowing novel insights into the clonality of T cell populations in cancers. These were polyclonal and displayed ITH in ccRCC. TCRb sequencing may shed light on mechanisms of cancer immunity and the efficacy of immunotherapy approaches. PMID:24122851
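
    The between-region comparison described above rests on correlating clone frequencies from two samples of the same tumour. The sketch below shows one way such a comparison could be computed, assuming each region is represented as a mapping from clone identifier to read count; the clone identifiers, the counts and the helper names are hypothetical, and this is not the authors' pipeline.

```python
import math


def clone_frequencies(counts: dict, clones: list) -> list:
    """Convert raw clone read counts into frequencies, in the order given by `clones`.

    Clones absent from a region are assigned a frequency of 0.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("region has no reads")
    return [counts.get(clone, 0) / total for clone in clones]


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical clone counts for two regions of one tumour (illustration only).
region_1 = {"clone_a": 50, "clone_b": 30, "clone_c": 20}
region_2 = {"clone_a": 10, "clone_c": 60, "clone_d": 30}

# Compare over the union of clones so that region-specific clones count as zeros.
all_clones = sorted(set(region_1) | set(region_2))
r = pearson(clone_frequencies(region_1, all_clones),
            clone_frequencies(region_2, all_clones))
print(f"Pearson r between regions: {r:.3f}")
```

    Using the union of clones is a design choice of this sketch: it lets clones seen in only one region pull the correlation down, which is the kind of spatial heterogeneity the abstract reports.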

  8. Complete genome sequence of the hyperthermophilic archaeon Pyrococcus sp. strain ST04, isolated from a deep-sea hydrothermal sulfide chimney on the Juan de Fuca Ridge.

    PubMed

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol; Park, Cheon-Seok

    2012-08-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential. PMID:22843576

  9. Complete Genome Sequence of the Hyperthermophilic Archaeon Pyrococcus sp. Strain ST04, Isolated from a Deep-Sea Hydrothermal Sulfide Chimney on the Juan de Fuca Ridge

    PubMed Central

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F.; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol

    2012-01-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential. PMID:22843576

  10. Comparative clinical sample preparation of DNA and RNA viral nucleic acids for a commercial deep sequencing system (Illumina MiSeq®).

    PubMed

    Ullmann, Leila Sabrina; de Camargo Tozato, Claudia; Malossi, Camila Dantas; da Cruz, Tais Fukuta; Cavalcante, Raíssa Vasconcelos; Kurissio, Jacqueline Kazue; Cagnini, Didier Quevedo; Rodrigues, Marianna Vaz; Biondo, Alexander Welker; Araujo, João Pessoa

    2015-08-01

    Sequence-independent methods for viral discovery have been widely used for whole genome sequencing of viruses. Different protocols for viral enrichment, library preparation and sequencing have become increasingly available and at lower costs. However, no study to date has focused on optimization of viral sample preparation for commercial deep sequencing. Accordingly, the aim of the present study was to evaluate an in-house enzymatic protocol for double-stranded DNA (dsDNA) synthesis and also compare the use of a commercially available kit protocol (Nextera XT, Illumina Inc, San Diego, CA, USA) and its combination with a library quantitation kit (Kapa, Kapa Biosystems, Wilmington, MA, USA) for deep sequencing (Illumina MiSeq). Two RNA viruses (canine distemper virus and dengue virus) and one ssDNA virus (porcine circovirus type 2) were tested with the optimized protocols. The tested method for dsDNA synthesis has shown satisfactory results and may be used in a laboratory setting, particularly when enzymes are already available. Library preparation combining commercial kits (Nextera XT and Kapa) has yielded more reads and genome coverage, probably due to a lack of small fragment recovery at the normalization step of Nextera XT. In addition, libraries may be diluted or concentrated to increase genome coverage with Kapa quantitation. PMID:25901649

  11. Replication fitness of multiple nonnucleoside reverse transcriptase-resistant HIV-1 variants in the presence of etravirine measured by 454 deep sequencing.

    PubMed

    Brumme, Chanson J; Huber, Kelly D; Dong, Winnie; Poon, Art F Y; Harrigan, P Richard; Sluis-Cremer, Nicolas

    2013-08-01

    We applied an efficient method to characterize the relative fitness levels of multiple nonnucleoside reverse transcriptase (NNRTI)-resistant HIV-1 variants by simultaneous competitive culture and 454 deep sequencing. Using this method, we show that the Y181V mutation in the HIV-1 reverse transcriptase in particular confers a clear selective advantage to the virus over 14 other NNRTI resistance mutations in the presence of etravirine in vitro. PMID:23720723

  12. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825
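
    The G+C content reported above is the percentage of G and C bases among all called bases of the assembled genome. A minimal sketch of that calculation follows, assuming the sequence is available as a plain string; the function name and the toy sequence are hypothetical and unrelated to the JL2886 assembly. Ambiguous bases such as N are ignored here, which is an assumption of this sketch rather than a stated convention of the study.

```python
def gc_content(sequence: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at  # ambiguous symbols such as N are left out of the denominator
    if called == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / called


# Toy sequence for illustration only.
toy_sequence = "ggccAATTnn"
print(f"G+C content: {gc_content(toy_sequence):.2f}%")
```

    For a multi-contig or multi-replicon genome, the counts would be summed across all sequences before dividing, rather than averaging the per-sequence percentages.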

  13. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea

    PubMed Central

    Fu, Yingnan; Wang, Rui

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  14. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data

    PubMed Central

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-01

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been well functionally characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform, to decode evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker to identify 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific, 5193 rodent-specific lncRNAs, and 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed lncFunction web-server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions. PMID:26590255

  15. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data.

    PubMed

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-01

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been well functionally characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform, to decode evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker to identify 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific, 5193 rodent-specific lncRNAs, and 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed lncFunction web-server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions. PMID:26590255

  16. Deep sequencing analysis of the heterogeneity of seed and commercial lots of the bacillus Calmette-Guérin (BCG) tuberculosis vaccine substrain Tokyo-172.

    PubMed

    Wada, Takayuki; Maruyama, Fumito; Iwamoto, Tomotada; Maeda, Shinji; Yamamoto, Taro; Nakagawa, Ichiro; Yamamoto, Saburo; Ohara, Naoya

    2015-01-01

    BCG, the only vaccine available to prevent tuberculosis, was established in the early 20th century by prolonged passaging of a virulent clinical strain of Mycobacterium bovis. BCG Tokyo-172, originally distributed within Japan in 1924, is one of the currently used reference substrains for the vaccine. Recently, this substrain was reported to contain two spontaneously arising, heterogeneous subpopulations (Types I and II). The proportions of the subpopulations changed over time in both distributed seed lots and commercial lots. To maintain the homogeneity of live vaccines, such variations and subpopulational mutations in lots should be restrained and monitored. We incorporated deep sequencing techniques to validate such heterogeneity in lots of the BCG Tokyo-172 substrain without cloning. By bioinformatics analysis, we not only detected the two subpopulations but also detected two intrinsic variations within these populations. The intrinsic variants could be isolated from respective lots as colonies cultured on plate media, suggesting analyses incorporating deep sequencing techniques are powerful, valid tools to detect mutations in live bacterial vaccine lots. Our data showed that spontaneous mutations in BCG vaccines could be easily monitored by deep sequencing without direct isolation of variants, revealing the complex heterogeneity of BCG Tokyo-172 and its daughter lots currently in use. PMID:26635118

  17. Ultra-Deep Bisulfite Sequencing to Detect Specific DNA Methylation Patterns of Minor Cell Types in Heterogeneous Cell Populations: An Example of the Pituitary Tissue

    PubMed Central

    Atozi, Takanori; Matsumoto, Shoma; Hanazono, Yutaka; Nagashima, Hiroshi; Ohgane, Jun

    2016-01-01

    DNA methylation is an epigenetic modification important for cell fate determination and cell type-specific gene expression. Transcriptional regulatory regions of the mammalian genome contain a large number of tissue/cell type-dependent differentially methylated regions (T-DMRs) with DNA methylation patterns crucial for transcription of the corresponding genes. In general, tissues consist of multiple cell types in various proportions, making it difficult to detect T-DMRs of minor cell types in tissues. The present study attempts to detect T-DMRs of minor cell types in tissues by ultra-deep bisulfite sequencing of cell type-restricted genes and to estimate the proportions of minor cell types based on DNA methylation patterns of sequenced reads. For this purpose, we focused on transcriptionally active hypomethylated alleles (Hypo-alleles), which can be recognized by the high ratio of unmethylated CpGs in each sequenced read (allele). The pituitary gland contains multiple cell types including five hormone-expressing cell types and stem/progenitor cells, each of which is a minor cell type in the pituitary tissue. By ultra-deep sequencing of more than 100 reads for detection of Hypo-alleles in pituitary cell type-specific genes, we identified T-DMRs specific to hormone-expressing cells and stem/progenitor cells and used them to estimate the proportions of each cell type based on the Hypo-allele ratio in pituitary tissue. Therefore, introduction of the novel Hypo-allele concept enabled us to detect T-DMRs of minor cell types with estimation of their proportions in the tissue by ultra-deep bisulfite sequencing. PMID:26752725
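
    The Hypo-allele ratio described in this abstract can be illustrated with a minimal sketch. The read encoding (one 'U' or 'M' character per CpG site), the function names, and the 0.8 cutoff for a "high ratio of unmethylated CpGs" are assumptions made for illustration only; the abstract does not state the authors' actual threshold or data format.

```python
from typing import List


def unmethylated_fraction(read: str) -> float:
    """Fraction of CpG sites in one read that are called unmethylated.

    Assumed encoding: one character per CpG site, 'U' = unmethylated,
    'M' = methylated.
    """
    if not read:
        raise ValueError("read has no CpG calls")
    return read.count("U") / len(read)


def hypo_allele_ratio(reads: List[str], threshold: float = 0.8) -> float:
    """Share of reads (alleles) whose unmethylated fraction meets the cutoff.

    The 0.8 threshold is a hypothetical value, not taken from the study.
    """
    if not reads:
        raise ValueError("no reads supplied")
    hypo = sum(1 for r in reads if unmethylated_fraction(r) >= threshold)
    return hypo / len(reads)


# Ten toy reads covering five CpG sites each; two of them are mostly unmethylated.
example_reads = ["UUUUU", "UUUUM", "MMUMM"] + ["MMMMM"] * 7
example_ratio = hypo_allele_ratio(example_reads)
```

    Under these assumptions the ratio is simply the count of hypomethylated reads divided by the total read count, which is why read depth per locus limits how small a cell-type proportion can be resolved.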

  18. Analysis of the cystic fibrosis lung microbiota via serial Illumina sequencing of bacterial 16S rRNA hypervariable regions.

    PubMed

    Maughan, Heather; Wang, Pauline W; Diaz Caballero, Julio; Fung, Pauline; Gong, Yunchen; Donaldson, Sylva L; Yuan, Lijie; Keshavjee, Shaf; Zhang, Yu; Yau, Yvonne C W; Waters, Valerie J; Tullis, D Elizabeth; Hwang, David M; Guttman, David S

    2012-01-01

    The characterization of bacterial communities using DNA sequencing has revolutionized our ability to study microbes in nature and discover the ways in which microbial communities affect ecosystem functioning and human health. Here we describe Serial Illumina Sequencing (SI-Seq): a method for deep sequencing of the bacterial 16S rRNA gene using next-generation sequencing technology. SI-Seq serially sequences portions of the V5, V6 and V7 hypervariable regions from barcoded 16S rRNA amplicons using an Illumina short-read genome analyzer. SI-Seq obtains taxonomic resolution similar to 454 pyrosequencing for a fraction of the cost, and can produce hundreds of thousands of reads per sample even with very high multiplexing. We validated SI-Seq using single species and mock community controls, and via a comparison to cystic fibrosis lung microbiota sequenced using 454 FLX Titanium. Our control runs show that SI-Seq has a dynamic range of at least five orders of magnitude, can classify >96% of sequences to the genus level, and performs just as well as 454 and paired-end Illumina methods in estimation of standard microbial ecology diversity measurements. We illustrate the utility of SI-Seq in a pilot sample of central airway secretion samples from cystic fibrosis patients. PMID:23056217

  19. Transposon Mutagenesis Paired with Deep Sequencing of Caulobacter crescentus under Uranium Stress Reveals Genes Essential for Detoxification and Stress Tolerance

    PubMed Central

    Yung, Mimi C.; Park, Dan M.; Overton, K. Wesley; Blow, Matthew J.; Hoover, Cindi A.; Smit, John; Murray, Sean R.; Ricci, Dante P.; Christen, Beat; Bowman, Grant R.

    2015-01-01

    ABSTRACT The ubiquitous aquatic bacterium Caulobacter crescentus is highly resistant to uranium (U) and facilitates U biomineralization and thus holds promise as an agent of U bioremediation. To gain an understanding of how C. crescentus tolerates U, we employed transposon (Tn) mutagenesis paired with deep sequencing (Tn-seq) in a global screen for genomic elements required for U resistance. Of the 3,879 annotated genes in the C. crescentus genome, 37 were found to be specifically associated with fitness under U stress, 15 of which were subsequently tested through mutational analysis. Systematic deletion analysis revealed that mutants lacking outer membrane transporters (rsaFa and rsaFb), a stress-responsive transcription factor (cztR), or a ppGpp synthetase/hydrolase (spoT) exhibited a significantly lower survival rate under U stress. RsaFa and RsaFb, which are homologues of TolC in Escherichia coli, have previously been shown to mediate S-layer export. Transcriptional analysis revealed upregulation of rsaFa and rsaFb by 4- and 10-fold, respectively, in the presence of U. We additionally show that rsaFa mutants accumulated higher levels of U than the wild type, with no significant increase in oxidative stress levels. Our results suggest a function for RsaFa and RsaFb in U efflux and/or maintenance of membrane integrity during U stress. In addition, we present data implicating CztR and SpoT in resistance to U stress. Together, our findings reveal novel gene targets that are key to understanding the molecular mechanisms of U resistance in C. crescentus. IMPORTANCE Caulobacter crescentus is an aerobic bacterium that is highly resistant to uranium (U) and has great potential to be used in U bioremediation, but its mechanisms of U resistance are poorly understood. We conducted a Tn-seq screen to identify genes specifically required for U resistance in C. crescentus. The genes that we identified have previously remained elusive using other omics approaches and thus

  20. Identification and characterization of microRNAs by deep-sequencing in Hyalomma anatolicum anatolicum (Acari: Ixodidae) ticks.

    PubMed

    Luo, Jin; Liu, Guang-Yuan; Chen, Ze; Ren, Qiao-Yun; Yin, Hong; Luo, Jian-Xun; Wang, Hui

    2015-06-15

    Hyalomma anatolicum anatolicum (H.a. anatolicum) (Acari: Ixodidae) ticks are globally distributed ectoparasites with veterinary and medical importance. These ticks not only weaken animals by sucking their blood but also transmit different species of parasitic protozoans. Multiple factors influence these parasitic infections including miRNAs, which are non-coding, small regulatory RNA molecules essential for the complex life cycle of parasites. To identify and characterize miRNAs in H.a. anatolicum, we developed an integrative approach combining deep sequencing, bioinformatics and real-time PCR analysis. Here we report the use of this approach to identify miRNA expression, family distribution, and nucleotide characteristics, and to discover novel miRNAs in H.a. anatolicum. The result showed that miR-1-3p, miR-275-3p, and miR-92a were expressed abundantly. There was a strong bias on miRNA, family members, and nucleotide compositions at certain positions in H.a. anatolicum miRNA. Uracil was the dominant nucleotide, particularly at positions 1, 6, 16, and 18, which were located approximately at the beginning, middle, and end of conserved miRNAs. Analysis of the conserved miRNAs indicated that miRNAs in H.a. anatolicum were concentrated along three diverse phylogenetic branches of bilaterians, insects and coelomates. Two possible roles for the use of miRNA in H.a. anatolicum could be presumed based on its parasitic life cycle: to maintain a large category of miRNA families of different animals, and/or to preserve stringent conserved seed regions with active changes in other places of miRNAs mainly in the middle and the end regions. These might help the parasite to undergo its complex life style in different hosts and adapt more readily to the host changes. The present study represents the first large scale characterization of H.a. anatolicum miRNAs, which could further the understanding of the complex biology of this zoonotic parasite, as well as initiate miRNA studies

  1. Small RNA Deep Sequencing Reveals Role for Arabidopsis thaliana RNA-Dependent RNA Polymerases in Viral siRNA Biogenesis

    PubMed Central

    Qi, Xiaopeng; Bao, Forrest Sheng; Xie, Zhixin

    2009-01-01

    RNA silencing functions as an important antiviral defense mechanism in a broad range of eukaryotes. In plants, biogenesis of several classes of endogenous small interfering RNAs (siRNAs) requires RNA-dependent RNA Polymerase (RDR) activities. Members of the RDR family proteins, including RDR1 and RDR6, have also been implicated in antiviral defense, although a direct role for RDRs in viral siRNA biogenesis has yet to be demonstrated. Using a crucifer-infecting strain of Tobacco Mosaic Virus (TMV-Cg) and Arabidopsis thaliana as a model system, we analyzed the viral small RNA profile in wild-type plants as well as rdr mutants by applying small RNA deep sequencing technology. Over 100,000 TMV-Cg-specific small RNA reads, mostly of 21- (78.4%) and 22-nucleotide (12.9%) in size and originating predominately (79.9%) from the genomic sense RNA strand, were captured at an early infection stage, yielding the first high-resolution small RNA map for a plant virus. The TMV-Cg genome harbored multiple, highly reproducible small RNA-generating hot spots that corresponded to regions with no apparent local hairpin-forming capacity. Significantly, both the rdr1 and rdr6 mutants exhibited globally reduced levels of viral small RNA production as well as reduced strand bias in viral small RNA population, revealing an important role for these host RDRs in viral siRNA biogenesis. In addition, an informatics analysis showed that a large set of host genes could be potentially targeted by TMV-Cg-derived siRNAs for posttranscriptional silencing. Two of such predicted host targets, which encode a cleavage and polyadenylation specificity factor (CPSF30) and an unknown protein similar to translocon-associated protein alpha (TRAP α), respectively, yielded a positive result in cleavage validation by 5′RACE assays. Our data raised the interesting possibility for viral siRNA-mediated virus-host interactions that may contribute to viral pathogenicity and host specificity. PMID:19308254
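
    The size-class and strand-bias percentages reported in this abstract are simple proportions over mapped reads. A minimal sketch of that bookkeeping follows; the (length, strand) tuple representation and the function names are assumptions for illustration, and the toy reads are not data from the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed representation: each virus-mapped small RNA read is (length_nt, strand),
# where strand is '+' for the genomic sense strand and '-' for antisense.
Read = Tuple[int, str]


def size_fractions(reads: List[Read]) -> Dict[int, float]:
    """Fraction of reads falling in each length class."""
    if not reads:
        raise ValueError("no reads supplied")
    counts = Counter(length for length, _ in reads)
    return {length: n / len(reads) for length, n in counts.items()}


def sense_fraction(reads: List[Read]) -> float:
    """Fraction of reads derived from the sense ('+') strand."""
    if not reads:
        raise ValueError("no reads supplied")
    return sum(1 for _, strand in reads if strand == "+") / len(reads)


# Toy input: six 21-nt reads and two 22-nt reads, mostly sense-strand.
toy_reads = [(21, "+")] * 5 + [(21, "-")] + [(22, "+")] + [(22, "-")]
toy_sizes = size_fractions(toy_reads)
toy_sense = sense_fraction(toy_reads)
```

    Comparing these two summaries between wild-type and mutant libraries is one plausible way to express the "reduced strand bias" mentioned above, although the abstract does not specify how the authors quantified it.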

  2. Deep Sequencing Analysis of miRNA Expression in Breast Muscle of Fast-Growing and Slow-Growing Broilers

    PubMed Central

    Ouyang, Hongjia; He, Xiaomei; Li, Guihuan; Xu, Haiping; Jia, Xinzheng; Nie, Qinghua; Zhang, Xiquan

    2015-01-01

    Growth performance is an important economic trait in chicken. MicroRNAs (miRNAs) have been shown to play important roles in various biological processes, but their functions in chicken growth are not yet clear. To investigate the function of miRNAs in chicken growth, breast muscle tissues of the two-tail samples (highest and lowest body weight) from Recessive White Rock (WRR) and Xinghua Chickens (XH) were subjected to high-throughput small RNA deep sequencing. In this study, a total of 921 miRNAs were identified, including 733 known mature miRNAs and 188 novel miRNAs. There were 200, 279, 257 and 297 differentially expressed miRNAs in the comparisons of WRRh vs. WRRl, WRRh vs. XHh, WRRl vs. XHl, and XHh vs. XHl group, respectively. A total of 22 highly differentially expressed miRNAs (fold change > 2 or < 0.5; p-value < 0.05; q-value < 0.01), which also have abundant expression (read counts > 1000) were found in our comparisons. As far as two analyses (WRRh vs. WRRl, and XHh vs. XHl) are concerned, we found 80 common differentially expressed miRNAs, while 110 miRNAs were found in WRRh vs. XHh and WRRl vs. XHl. Furthermore, 26 common miRNAs were identified among all four comparisons. Four differentially expressed miRNAs (miR-223, miR-16, miR-205a and miR-222b-5p) were validated by quantitative real-time RT-PCR (qRT-PCR). Regulatory networks of interactions among miRNAs and their targets were constructed using integrative miRNA target-prediction and network-analysis. Growth hormone receptor (GHR) was confirmed as a target of miR-146b-3p by dual-luciferase assay and qPCR, indicating that miR-34c, miR-223, miR-146b-3p, miR-21 and miR-205a are key growth-related target genes in the network. These miRNAs are proposed as candidate miRNAs for future studies concerning miRNA-target function on regulation of chicken growth. PMID:26193261
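
    The selection criteria quoted in this abstract (fold change > 2 or < 0.5; p-value < 0.05; q-value < 0.01; read counts > 1000) translate directly into a filter. The sketch below assumes a simple record type and hypothetical miRNA names and values; how the authors computed fold change or which sample's read count they thresholded is not stated, so a single read_count field is used as a placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MirnaResult:
    name: str           # hypothetical identifier
    fold_change: float  # group A expression divided by group B expression
    p_value: float
    q_value: float
    read_count: int     # assumed: a single abundance figure per miRNA


def is_highly_differential(r: MirnaResult) -> bool:
    """Apply the thresholds quoted in the abstract to one result."""
    changed = r.fold_change > 2 or r.fold_change < 0.5
    significant = r.p_value < 0.05 and r.q_value < 0.01
    abundant = r.read_count > 1000
    return changed and significant and abundant


def select_candidates(results: Iterable[MirnaResult]) -> List[str]:
    """Names of miRNAs passing all three criteria, in input order."""
    return [r.name for r in results if is_highly_differential(r)]


# Invented example records, for illustration only.
toy_results = [
    MirnaResult("mir-A", 3.1, 0.001, 0.005, 5200),   # passes every filter
    MirnaResult("mir-B", 0.4, 0.010, 0.008, 800),    # too few reads
    MirnaResult("mir-C", 1.5, 0.001, 0.001, 9000),   # fold change too small
    MirnaResult("mir-D", 0.3, 0.020, 0.009, 1500),   # passes (down-regulated)
]
toy_candidates = select_candidates(toy_results)
```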

  3. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  4. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    SciTech Connect

    Copeland, A; Gu, Wei; Yasawong, Montri; Lapidus, Alla L.; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian; Sikorski, Johannes; Goker, Markus; Detter, J. Chris; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum Thermus-Deinococcus to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  5. Construction of a rationally designed antibody platform for sequencing-assisted selection.

    PubMed

    Larman, H Benjamin; Xu, George Jing; Pavlova, Natalya N; Elledge, Stephen J

    2012-11-01

    Antibody discovery platforms have become an important source of both therapeutic biomolecules and research reagents. Massively parallel DNA sequencing can be used to assist antibody selection by comprehensively monitoring libraries during selection, thus greatly expanding the power of these systems. We have therefore constructed a rationally designed, fully defined single-chain variable fragment (scFv) library and analysis platform optimized for analysis with short-read deep sequencing. Sequence-defined oligonucleotide libraries encoding three complementarity-determining regions (L3 from the light chain, H2 and H3 from the heavy chain) were synthesized on a programmable microarray and combinatorially cloned into a single scFv framework for molecular display. Our unique complementarity-determining region sequence design optimizes for protein binding by utilizing a hidden Markov model that was trained on all antibody-antigen cocrystal structures in the Protein Data Bank. The resultant ~10^12-member library was produced in ribosome-display format, and comprehensively analyzed over four rounds of antigen selections by multiplex paired-end Illumina sequencing. The hidden Markov model scFv library generated multiple binders against an emerging cancer antigen and is the basis for a next-generation antibody production platform. PMID:23064642

  6. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    PubMed Central

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  7. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    PubMed

    Lin, Ching-Ping; Ko, Chia-Yun; Kuo, Ching-I; Liu, Mao-Sen; Schafleitner, Roland; Chen, Long-Fang Oliver

    2015-01-01

    We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R² = 0.9911). In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No editing site that swapped U to C was found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum), two monocots (Oryza sativa and Zea mays), two gymnosperms (Pinus taeda and Ginkgo biloba) and one moss (Physcomitrella patens). Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at single sequence repeat regions with poly-A runs. It showed that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants. PMID:26076132

  8. Deep-Sequencing Method for Quantifying Background Abundances of Symbiodinium Types: Exploring the Rare Symbiodinium Biosphere in Reef-Building Corals

    PubMed Central

    Quigley, Kate M.; Davies, Sarah W.; Kenkel, Carly D.; Willis, Bette L.; Matz, Mikhail V.; Bay, Line K.

    2014-01-01

    The capacity of reef-building corals to associate with environmentally-appropriate types of endosymbionts from the dinoflagellate genus Symbiodinium contributes significantly to their success at local scales. Additionally, some corals are able to acclimatize to environmental perturbations by shuffling the relative proportions of different Symbiodinium types hosted. Understanding the dynamics of these symbioses requires a sensitive and quantitative method of Symbiodinium genotyping. Electrophoresis methods, still widely utilized for this purpose, are predominantly qualitative and cannot guarantee detection of a background type below 10% of the total Symbiodinium population. Here, the relative abundances of four Symbiodinium types (A13, C1, C3, and D1) in mixed samples of known composition were quantified using deep sequencing of the internal transcribed spacer of the ribosomal RNA gene (ITS-2) by means of Next Generation Sequencing (NGS) using Roche 454. In samples dominated by each of the four Symbiodinium types tested, background levels of the other three types were detected when present at 5%, 1%, and 0.1% levels, and their relative abundances were quantified with high (A13, C1, D1) to variable (C3) accuracy. The potential of this deep sequencing method for resolving fine-scale genetic diversity within a symbiont type was further demonstrated in a natural symbiosis using ITS-1, and uncovered reef-specific differences in the composition of Symbiodinium microadriaticum in two species of acroporid corals (Acropora digitifera and A. hyacinthus) from Palau. The ability of deep sequencing of the ITS locus (1 and 2) to detect and quantify low-abundant Symbiodinium types, as well as finer-scale diversity below the type level, will enable more robust quantification of local genetic diversity in Symbiodinium populations. This method will help to elucidate the role that background types have in maximizing coral fitness across diverse environments and in response to

  9. Cross-Species, Amplifiable Microsatellite Markers for Neoverrucid Barnacles from Deep-Sea Hydrothermal Vents Developed Using Next-Generation Sequencing

    PubMed Central

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Watanabe, Hiromi; Inagaki, Fumio; Satoh, Nori; Mitarai, Satoshi

    2014-01-01

    Barnacles of the genus Neoverruca are abundant near deep-sea hydrothermal vents of the northwestern Pacific Ocean, and are useful for understanding processes of population formation and maintenance of deep-sea vent faunas. Using next-generation sequencing, we isolated 12 polymorphic microsatellite loci from Neoverruca sp., collected in the Okinawa Trough. These microsatellite loci revealed 2–19 alleles per locus. The expected and observed heterozygosities ranged from 0.286 to 1.000 and 0.349 to 0.935, respectively. Cross-species amplification showed that 9 of the 12 loci were successfully amplified for Neoverruca brachylepadoformis in the Mariana Trough. A pairwise FST value calculated using nine loci showed significant genetic differentiation between the two species. Consequently, the microsatellite markers we developed will be useful for further population genetic studies to elucidate genetic diversity, differentiation, classification, and evolutionary processes in the genus Neoverruca. PMID:25196437
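
    Observed and expected heterozygosity, as reported per locus in this abstract, can be computed from diploid genotypes as sketched below. The genotype representation (a pair of allele sizes per individual) and the toy values are assumptions; the expected value uses the plain 1 - sum(p_i^2) form without a small-sample correction, which may differ from the estimator the authors used.

```python
from collections import Counter
from typing import List, Tuple

# Assumed representation: one diploid genotype per individual at a single
# microsatellite locus, given as a pair of allele sizes.
Genotype = Tuple[int, int]


def observed_heterozygosity(genotypes: List[Genotype]) -> float:
    """Fraction of individuals carrying two different alleles."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes: List[Genotype]) -> float:
    """1 - sum of squared allele frequencies (no small-sample correction)."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Toy locus with three alleles (150, 154, 158) scored in four individuals.
toy_genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
toy_ho = observed_heterozygosity(toy_genotypes)
toy_he = expected_heterozygosity(toy_genotypes)
```

    In the toy locus, allele 150 has frequency 0.5 and the other two 0.25 each, so the expected value is 1 - (0.25 + 0.0625 + 0.0625) = 0.625, while three of four individuals are heterozygous.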

  10. Permanent draft genome sequence of Bacillus flexus strain T6186-2, a multidrug-resistant bacterium isolated from a deep-subsurface oil reservoir.

    PubMed

    Zhang, Fan; Jiang, Xiawei; Chai, Lujun; She, Yuehui; Yu, Gaoming; Shu, Fuchang; Wang, Zhengliang; Su, Sanbao; Wenqiong, Wu; Tingsheng, Xiang; Zhang, Zhongzhi; Hou, Dujie; Zheng, Beiwen

    2014-12-01

    Previous studies suggest that antibiotic resistance genes have an ancient origin, which is not always linked to the use of antibiotics but can be enhanced by human activities. Bacillus flexus strain T6186-2 was isolated from the formation water sample of a deep-subsurface oil reservoir. Interestingly, antimicrobial susceptibility testing showed that this strain is susceptible to kanamycin but resistant to ampicillin, erythromycin, gentamicin, vancomycin, fosfomycin, fosmidomycin, tetracycline and teicoplanin. To expand our knowledge of the origins of antibiotic resistance genes (ARGs) in this relatively pristine environment, we sequenced the genome of B. flexus strain T6186-2 as a permanent draft. It provides evidence for the existence of a reservoir of ARGs in nature among microbial populations from deep-subsurface oil reservoirs. PMID:25301038

  11. Ultra-deep sequencing analysis of the hepatitis A virus 5'-untranslated region among cases of the same outbreak from a single source.

    PubMed

    Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2014-01-01

    Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved HAV IRES sequence even if we performed analysis of UDPSs. UDPS analysis of HAV 5'UTR gave us no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287

  12. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

    PubMed Central

    Pell, Jason; Hintze, Arend; Canino-Koning, Rosangela; Howe, Adina; Tiedje, James M.; Brown, C. Titus

    2012-01-01

    Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly. PMID:22847406
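
    The idea of storing a de Bruijn graph implicitly in a Bloom filter can be sketched as follows. This is an illustrative toy, not the authors' implementation: the class and function names are hypothetical, the hash construction (salted BLAKE2b) is a swapped-in choice, reverse complements are ignored, and the filter is sized at 4 bits per distinct k-mer only to echo the figure quoted in the abstract.

```python
import hashlib
from typing import Iterator, List


class KmerBloomFilter:
    """Probabilistic k-mer set: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int = 4) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str) -> Iterator[int]:
        data = kmer.encode("ascii")
        for seed in range(self.num_hashes):
            # A different salt per seed gives independent-looking hash functions.
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def successors(bloom: KmerBloomFilter, kmer: str) -> List[str]:
    """Graph edges are implicit: try each one-base extension and query the filter."""
    return [kmer[1:] + base for base in "ACGT" if (kmer[1:] + base) in bloom]


# Toy reads; k = 4 is far smaller than values used in practice.
toy_reads = ["ACGTAC", "GTACGG"]
K = 4
distinct = {km for read in toy_reads for km in kmers(read, K)}
bloom = KmerBloomFilter(num_bits=4 * len(distinct))  # roughly 4 bits per k-mer
for km in distinct:
    bloom.add(km)
```

    Because membership queries can return false positives, `successors` may report edges that are not in the real graph; that is the sense in which the graph is stored "inexactly". Every inserted k-mer is always found, so true connectivity is never lost.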

  13. Ultra-deep T cell receptor sequencing reveals the complexity and intratumour heterogeneity of T cell clones in renal cell carcinomas

    PubMed Central

    Gerlinger, Marco; Quezada, Sergio A; Peggs, Karl S; Furness, Andrew JS; Fisher, Rosalie; Marafioti, Teresa; Shende, Vishvesh H; McGranahan, Nicholas; Rowan, Andrew J; Hazell, Steven; Hamm, David; Robins, Harlan S; Pickering, Lisa; Gore, Martin; Nicol, David L; Larkin, James; Swanton, Charles

    2013-01-01

    The recognition of cancer cells by T cells can impact upon prognosis and be exploited for immunotherapeutic approaches. This recognition depends on the specific interaction between antigens displayed on the surface of cancer cells and the T cell receptor (TCR), which is generated by somatic rearrangements of TCR α- and β-chains (TCRb). Our aim was to assess whether ultra-deep sequencing of the rearranged TCRb in DNA extracted from unfractionated clear cell renal cell carcinoma (ccRCC) samples can provide insights into the clonality and heterogeneity of intratumoural T cells in ccRCCs, a tumour type that can display extensive genetic intratumour heterogeneity (ITH). For this purpose, DNA was extracted from two to four tumour regions from each of four primary ccRCCs and was analysed by ultra-deep TCR sequencing. In parallel, tumour infiltration by CD4, CD8 and Foxp3 regulatory T cells was evaluated by immunohistochemistry and correlated with TCR-sequencing data. A polyclonal T cell repertoire with 367–16 289 (median 2394) unique TCRb sequences was identified per tumour region. The frequencies of the 100 most abundant T cell clones/tumour were poorly correlated between most regions (Pearson correlation coefficient, –0.218 to 0.465). 3–93% of these T cell clones were not detectable across all regions. Thus, the clonal composition of T cell populations can be heterogeneous across different regions of the same ccRCC. T cell ITH was higher in tumours pretreated with an mTOR inhibitor, which could suggest that therapy can influence adaptive tumour immunity. These data show that ultra-deep TCR-sequencing technology can be applied directly to DNA extracted from unfractionated tumour samples, allowing novel insights into the clonality of T cell populations in cancers. These were polyclonal and displayed ITH in ccRCC. TCRb sequencing may shed light on mechanisms of cancer immunity and the efficacy of immunotherapy approaches.

  14. Implications of spatial and temporal development of the aftershock sequence for the Mw 8.3 June 9, 1994 Deep Bolivian Earthquake

    NASA Astrophysics Data System (ADS)

    Myers, Stephen C.; Wallace, Terry C.; Beck, Susan L.; Silver, Paul G.; Zandt, George; Vandecar, John; Minaya, Estela

    On June 9, 1994 the Mw 8.3 Bolivia earthquake (636 km depth) occurred in a region which had not experienced significant, deep seismicity for at least 30 years. The mainshock and aftershocks were recorded in Bolivia on the BANJO and SEDA broadband seismic arrays and on the San Calixto Network. We used the joint hypocenter determination method to determine the relative location of the aftershocks. We have identified no foreshocks and 89 aftershocks (m > 2.2) for the 20-day period following the mainshock. The frequency of aftershock occurrence decreased rapidly, with only one or two aftershocks per day occurring after day two. The temporal decay of aftershock activity is similar to shallow aftershock sequences, but the number of aftershocks is two orders of magnitude less. Additionally, an mb ∼6, apparently triggered earthquake occurred just 10 minutes after the mainshock about 330 km east-southeast of the mainshock at a depth of 671 km. The aftershock sequence occurred north and east of the mainshock and extends to a depth of 665 km. The aftershocks define a slab striking N68°W and dipping 45°NE. The strike, dip, and location of the aftershock zone are consistent with this seismicity being confined within the downward extension of the subducted Nazca plate. The location and orientation of the aftershock sequence indicate that the subducted Nazca plate bends between the NNW striking zone of deep seismicity in western Brazil and the N-S striking zone of seismicity in central Bolivia. A tear in the deep slab is not necessitated by the data. A subset of the aftershock hypocenters cluster along a subhorizontal plane near the depth of the mainshock, favoring a horizontal fault plane. The horizontal dimensions of the mainshock [Beck et al., this issue; Silver et al., 1995] and slab defined by the aftershocks are approximately equal, indicating that the mainshock ruptured through the slab.

  15. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir.

    PubMed

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar

    2015-01-01

    Strain GS3372 is the first strain of Aeribacillus pallidus with a publicly available genome. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  16. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    PubMed Central

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.

    2015-01-01

    Strain GS3372 is the first strain of Aeribacillus pallidus with a publicly available genome. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  17. Draft genome sequence of Thermococcus sp. EP1, a novel hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise.

    PubMed

    Zhou, Meixian; Liu, Qing; Xie, Yunbiao; Dong, Binbin; Chen, Xiaoyao

    2016-04-01

    Thermococcus sp. strain EP1 is a novel anaerobic hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise. It grows optimally at 80 °C and can produce industrial enzymes at high temperature. We report here the draft genome of EP1, which contains 1,819,157 bp with a G+C content of 39.3%. The sequence will provide the genetic basis for better understanding of adaptation to hydrothermal environment and the development of novel thermostable enzymes for industrial application. PMID:26672397

  18. Detection of Short-Range DNA Interactions in Mammalian Cells Using High-Resolution Circular Chromosome Conformation Capture Coupled to Deep Sequencing.

    PubMed

    Millau, Jean-François; Gaudreau, Luc

    2015-01-01

    DNA interactions shape the genome to physically and functionally connect regulatory elements to their target genes. Studying these interactions is crucial to understanding the molecular mechanisms that regulate gene expression. In this chapter, we present a protocol for high-resolution circular chromosome conformation capture coupled to deep sequencing. This methodology makes it possible to investigate short-range DNA interactions (<100 kbp) and to obtain high-resolution DNA interaction maps of loci. It is a powerful tool to explore how regulatory elements and genes are connected together. PMID:26404155

  19. Genome Sequence of the Psychrophilic Bacterium Tenacibaculum ovolyticum Strain da5A-8 Isolated from Deep Seawater

    PubMed Central

    Zhai, Zhenyu; Komatsu, Ayumi; Shibayama, Keigo

    2016-01-01

    Some bacterial species of the genus Tenacibaculum, including Tenacibaculum ovolyticum, have been known as fish pathogens in the sea. So far, the only published genome sequence for this genus is for Tenacibaculum dicentrarchi, which could also be a fish pathogen. Strain da5A-8, showing 100% identity to the 16S rRNA gene sequence of T. ovolyticum DSM 18103T, was isolated from seawater at a depth of 344 m in Kochi, Japan, and grew optimally at 10 to 20°C. The genome sequence of strain da5A-8 revealed the possible virulence genes commonly observed in the genus Tenacibaculum. PMID:27365358

  20. Genome Sequence of the Psychrophilic Bacterium Tenacibaculum ovolyticum Strain da5A-8 Isolated from Deep Seawater.

    PubMed

    Teramoto, Maki; Zhai, Zhenyu; Komatsu, Ayumi; Shibayama, Keigo; Suzuki, Masato

    2016-01-01

    Some bacterial species of the genus Tenacibaculum, including Tenacibaculum ovolyticum, have been known as fish pathogens in the sea. So far, the only published genome sequence for this genus is for Tenacibaculum dicentrarchi, which could also be a fish pathogen. Strain da5A-8, showing 100% identity to the 16S rRNA gene sequence of T. ovolyticum DSM 18103(T), was isolated from seawater at a depth of 344 m in Kochi, Japan, and grew optimally at 10 to 20°C. The genome sequence of strain da5A-8 revealed the possible virulence genes commonly observed in the genus Tenacibaculum. PMID:27365358

  1. Rapid genome mapping in nano channel array for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences...

  2. Transcriptome Analysis of the Mud Crab (Scylla paramamosain) by 454 Deep Sequencing: Assembly, Annotation, and Marker Discovery

    PubMed Central

    Ma, Hongyu; Ma, Chunyan; Li, Shujuan; Jiang, Wei; Li, Xincang; Liu, Yuexing; Ma, Lingbo

    2014-01-01

    In this study, we reported the characterization of the first transcriptome of the mud crab (Scylla paramamosain). Pooled cDNAs of four tissue types from twelve wild individuals were sequenced using the Roche 454 FLX platform. Analysis performed included de novo assembly of transcriptome sequences, functional annotation, and molecular marker discovery. A total of 1,314,101 high quality reads with an average length of 411 bp were generated by 454 sequencing on a mixed cDNA library. De novo assembly of these 1,314,101 reads produced 76,778 contigs (consisting of 818,154 reads) with 5.4-fold average sequencing coverage. The remaining 495,947 reads were singletons. A total of 78,268 unigenes were identified based on sequence similarity with known proteins (E≤0.00001) in UniProt and non-redundant protein databases. Meanwhile, 44,433 sequences were identified (E≤0.00001) using a BLASTN search against the NCBI nucleotide database. Gene Ontology (GO) analysis indicated that biosynthetic process, cell part, and ion binding were the most abundant terms in biological process, cellular component, and molecular function categories, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that 4,878 unigenes were distributed in 281 different pathways. In addition, 19,011 microsatellites and 37,063 potential single nucleotide polymorphisms were detected from the transcriptome of S. paramamosain. Finally, thirty polymorphic microsatellite markers were developed and used to assess genetic diversity of a wild population of S. paramamosain. So far, existing sequence resources for S. paramamosain are extremely limited. The present study provides a characterization of transcriptome from multiple tissues and individuals, as well as an assessment of genetic diversity of a wild population. These sequence resources will facilitate the investigation of population genetic diversity, the development of genetic maps, and the conduct of molecular marker

  3. Draft genome sequence of Caminibacter mediatlanticus strain TB-2, an epsilonproteobacterium isolated from a deep-sea hydrothermal vent.

    PubMed

    Giovannelli, Donato; Ferriera, Steven; Johnson, Justin; Kravitz, Saul; Pérez-Rodríguez, Ileana; Ricci, Jessica; O'Brien, Charles; Voordeckers, James W; Bini, Elisabetta; Vetriani, Costantino

    2011-10-15

    Caminibacter mediatlanticus strain TB-2(T) [1], is a thermophilic, anaerobic, chemolithoautotrophic bacterium, isolated from the walls of an active deep-sea hydrothermal vent chimney on the Mid-Atlantic Ridge and the type strain of the species. C. mediatlanticus is a Gram-negative member of the Epsilonproteobacteria (order Nautiliales) that grows chemolithoautotrophically with H(2) as the energy source and CO(2) as the carbon source. Nitrate or sulfur is used as the terminal electron acceptor, with resulting production of ammonium and hydrogen sulfide, respectively. In view of the widespread distribution, importance and physiological characteristics of thermophilic Epsilonproteobacteria in deep-sea geothermal environments, it is likely that these organisms provide a relevant contribution to both primary productivity and the biogeochemical cycling of carbon, nitrogen and sulfur at hydrothermal vents. Here we report the main features of the genome of C. mediatlanticus strain TB-2(T). PMID:22180817

  4. Network properties derived from deep sequencing of human B-cell receptor repertoires delineate B-cell populations

    PubMed Central

    Bashford-Rogers, Rachael J.M.; Palser, Anne L.; Huntly, Brian J.; Rance, Richard; Vassiliou, George S.; Follows, George A.; Kellam, Paul

    2013-01-01

    The adaptive immune response selectively expands B- and T-cell clones following antigen recognition by B- and T-cell receptors (BCR and TCR), respectively. Next-generation sequencing is a powerful tool for dissecting the BCR and TCR populations at high resolution, but robust computational analyses are required to interpret such sequencing. Here, we develop a novel computational approach for BCR repertoire analysis using established next-generation sequencing methods coupled with network construction and population analysis. BCR sequences organize into networks based on sequence diversity, with differences in network connectivity clearly distinguishing between diverse repertoires of healthy individuals and clonally expanded repertoires from individuals with chronic lymphocytic leukemia (CLL) and other clonal blood disorders. Network population measures defined by the Gini Index and cluster sizes quantify the BCR clonality status and are robust to sampling and sequencing depths. BCR network analysis therefore allows the direct and quantifiable comparison of BCR repertoires between samples and intra-individual population changes between temporal or spatially separated samples and over the course of therapy. PMID:23742949
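
    To make the clonality measure in this abstract concrete, here is a minimal sketch of a Gini index computed over cluster sizes. It uses the standard rank-based Gini formula. Whether the authors apply exactly this form, or apply it to vertices as well as clusters, is not stated in the abstract, so treat the function and the toy size lists as assumptions.

```python
def gini_index(cluster_sizes):
    """Gini index of a list of non-negative sizes.

    0 means all clusters are the same size (diverse repertoire);
    values near 1 mean a few clusters dominate (clonal expansion).
    """
    sizes = sorted(cluster_sizes)
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        raise ValueError("need at least one cluster with a non-zero size")
    # Rank-weighted sum over the ascending sizes (ranks start at 1).
    weighted = sum(rank * size for rank, size in enumerate(sizes, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Toy repertoires (made up for this sketch).
even_repertoire = [20, 25, 25, 30]   # sizes spread evenly across clusters
expanded_repertoire = [1, 1, 1, 97]  # one dominant cluster

print(gini_index(even_repertoire))
print(gini_index(expanded_repertoire))
```

    A summary statistic of this kind depends on the shape of the size distribution rather than on the absolute number of reads, which is consistent with the abstract's remark that the measures are robust to sampling and sequencing depth.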

  5. Identification of microRNA-like RNAs from Curvularia lunata associated with maize leaf spot by bioinformation analysis and deep sequencing.

    PubMed

    Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei

    2016-04-01

    Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs were 1012 microRNA-like RNAs (milRNAs), which are similar to other known microRNAs, and 48 potential novel milRNAs without homologs in other organisms, identified using the miRBase© database. We used quantitative PCR to analyze the expression of four of these milRNAs from C. lunata at different developmental stages. The analysis revealed several changes associated with germinating conidia and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. A total of 8334 target mRNAs for the 1012 milRNAs that were identified, and 256 target mRNAs for the 48 novel milRNAs, were predicted by computational analysis. These target mRNAs of milRNAs were also analyzed by gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata's milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanism. PMID:26481645

  6. Complete mitochondrial genome DNA sequence for two ophiuroids and a holothuroid: the utility of protein gene sequence and gene maps in the analyses of deep deuterostome phylogeny.

    PubMed

    Scouras, Andrea; Beckenbach, Karen; Arndt, Allan; Smith, Michael J

    2004-04-01

    The complete mitochondrial genome sequences have been determined for the holothuroid Cucumaria miniata and two ophiuroid species Ophiopholis aculeata and Ophiura lütkeni. In addition, the nucleotide sequence of the mitochondrial protein-coding genes for the asteroid Pisaster ochraceus has been completed. Maximum-likelihood and LogDet distance analyses of concatenated protein-coding sequences produced a series of trees that did not conclusively support generally accepted models of echinoderm phylogeny. The ophiuroid data consistently demonstrated accelerated nucleotide divergence rates and lack of stationarity. This confounds the phylogenetic analyses. Molecular investigations using individual protein-coding gene alignments demonstrated that the cytochrome b gene exhibits the least deviation in rate and stationarity and generated some trees consistent with proposed echinoderm phylogenies. Phylogenies based on echinoderm mitochondrial gene rearrangements also proved problematic because of extensive variation in gene order between and within classes. A comparison of the two distinctive ophiuroid mitochondrial gene orders supports the hypothesis that O. lütkeni has a more derived mitochondrial gene order versus O. aculeata. The variation in the echinoderm mitochondrial gene maps reinforces the limitations of the application of mitochondrial gene rearrangements as a global phylogenetic tool. PMID:15019608

  7. A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India

    PubMed Central

    Rao, Pavitra N.; Uplekar, Swapna; Kayal, Sriti; Mallick, Prashant K.; Bandyopadhyay, Nabamita; Kale, Sonal; Singh, Om P.; Mohanty, Akshaya; Mohanty, Sanjib; Wassmer, Samuel C.

    2016-01-01

    A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy. PMID:27008882
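
    The abstract states that SNP positions carrying multiple allelic variants were used to flag mixed-genotype (multiclonal) samples. A minimal sketch of that kind of check is given below. The thresholds `min_fraction` and `min_depth`, the input layout, and the placeholder gene names are hypothetical choices for illustration and are not taken from the paper.

```python
def multiallelic_positions(allele_counts, min_fraction=0.05, min_depth=50):
    """Return positions where more than one allele is well supported.

    allele_counts maps a position key to a dict of base -> read count.
    min_fraction and min_depth are illustrative thresholds (assumptions).
    """
    flagged = []
    for position, counts in sorted(allele_counts.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this position
        supported = [base for base, n in counts.items() if n / depth >= min_fraction]
        if len(supported) > 1:
            flagged.append(position)
    return flagged


# Toy per-sample pileup summary (made up; gene names are placeholders).
sample = {
    ("gene1", 100): {"A": 480, "T": 20},  # minor allele at 4%, below threshold
    ("gene1", 200): {"C": 300, "T": 200},  # two well-supported alleles
    ("gene2", 50): {"G": 10, "A": 10},     # depth too low, skipped
}

mixed = multiallelic_positions(sample)
print(mixed, "-> possible multiclonal infection" if mixed else "-> single genotype")
```

    In practice the minor-allele threshold would need to sit above the platform's error rate, otherwise sequencing errors would be mistaken for a second clone.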

  8. Discovery and profiling of novel and conserved microRNAs during flower development in Carya cathayensis via deep sequencing.

    PubMed

    Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song

    2012-08-01

    Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction. PMID:22481137

  9. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% were detected out of the 37 input Operational Taxonomic Units (OTUs), whether the reference barcode library was used or not, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  10. A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India.

    PubMed

    Rao, Pavitra N; Uplekar, Swapna; Kayal, Sriti; Mallick, Prashant K; Bandyopadhyay, Nabamita; Kale, Sonal; Singh, Om P; Mohanty, Akshaya; Mohanty, Sanjib; Wassmer, Samuel C; Carlton, Jane M

    2016-06-01

    A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy. PMID:27008882

  11. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues.

    PubMed

    Krimmel, Jeffrey D; Schmitt, Michael W; Harrell, Maria I; Agnew, Kathy J; Kennedy, Scott R; Emond, Mary J; Loeb, Lawrence A; Swisher, Elizabeth M; Risques, Rosa Ana

    2016-05-24

    Current sequencing methods are error-prone, which precludes the identification of low frequency mutations for early cancer detection. Duplex sequencing is a sequencing technology that decreases errors by scoring mutations present only in both strands of DNA. Our aim was to determine whether duplex sequencing could detect extremely rare cancer cells present in peritoneal fluid from women with high-grade serous ovarian carcinomas (HGSOCs). These aggressive cancers are typically diagnosed at a late stage and are characterized by TP53 mutations and peritoneal dissemination. We used duplex sequencing to analyze TP53 mutations in 17 peritoneal fluid samples from women with HGSOC and 20 from women without cancer. The tumor TP53 mutation was detected in 94% (16/17) of peritoneal fluid samples from women with HGSOC (frequency as low as 1 mutant per 24,736 normal genomes). Additionally, we detected extremely low frequency TP53 mutations (median mutant fraction 1/13,139) in peritoneal fluid from nearly all patients with and without cancer (35/37). These mutations were mostly deleterious, clustered in hotspots, increased with age, and were more abundant in women with cancer than in controls. The total burden of TP53 mutations in peritoneal fluid distinguished cancers from controls with 82% sensitivity (14/17) and 90% specificity (18/20). Age-associated, low frequency TP53 mutations were also found in 100% of peripheral blood samples from 15 women with and without ovarian cancer (none with hematologic disorder). Our results demonstrate the ability of duplex sequencing to detect rare cancer cells and provide evidence of widespread, low frequency, age-associated somatic TP53 mutation in noncancerous tissue. PMID:27152024
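
    The percentages quoted in this abstract follow directly from the reported counts. The short sketch below recomputes them. The function names are ours, and the only inputs are the fractions stated in the abstract.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of cancer cases correctly classified as cancer."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of controls correctly classified as controls."""
    return true_negatives / (true_negatives + false_positives)


# Counts as reported: 14 of 17 cancers and 18 of 20 controls classified correctly.
sens = sensitivity(true_positives=14, false_negatives=17 - 14)
spec = specificity(true_negatives=18, false_positives=20 - 18)

# Detection rate of the tumor TP53 mutation in peritoneal fluid: 16 of 17.
detection_rate = 16 / 17

print(f"sensitivity    {sens:.1%}")
print(f"specificity    {spec:.1%}")
print(f"detection rate {detection_rate:.1%}")
```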

  12. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular

  13. Low-pass shotgun sequencing of the barley genome facilitates rapid identification of genes, conserved non-coding sequences and novel repeats

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: Barley has one of the largest and most complex genomes of all economically important food crops. The rise of new short read sequencing technologies such as Illumina/Solexa permits such large genomes to be effectively sampled at relatively low costs. An MDR (Mathematically Defined Repeat)...

  14. Draft Genome Sequence of the Deep-Sea Basidiomycetous Yeast Cryptococcus sp. Strain Mo29 Reveals Its Biotechnological Potential.

    PubMed

    Rédou, Vanessa; Kumar, Abhishek; Hainaut, Matthieu; Henrissat, Bernard; Record, Eric; Barbier, Georges; Burgaud, Gaëtan

    2016-01-01

    Cryptococcus sp. strain Mo29 was isolated from the Rainbow hydrothermal site on the Mid-Atlantic Ridge. Here, we present the draft genome sequence of this basidiomycetous yeast strain, which has highlighted its biotechnological potential as revealed by the presence of genes involved in the synthesis of secondary metabolites and biotechnologically important enzymes. PMID:27389259

  15. Draft Genome Sequence of the Deep-Sea Basidiomycetous Yeast Cryptococcus sp. Strain Mo29 Reveals Its Biotechnological Potential

    PubMed Central

    Rédou, Vanessa; Kumar, Abhishek; Hainaut, Matthieu; Henrissat, Bernard; Record, Eric; Barbier, Georges

    2016-01-01

    Cryptococcus sp. strain Mo29 was isolated from the Rainbow hydrothermal site on the Mid-Atlantic Ridge. Here, we present the draft genome sequence of this basidiomycetous yeast strain, which has highlighted its biotechnological potential as revealed by the presence of genes involved in the synthesis of secondary metabolites and biotechnologically important enzymes. PMID:27389259

  16. The utility of diversity profiling using Illumina 18S rRNA gene amplicon deep sequencing to detect and discriminate Toxoplasma gondii among the cyst-forming coccidia.

    PubMed

    Cooper, Madalyn K; Phalen, David N; Donahoe, Shannon L; Rose, Karrie; Šlapeta, Jan

    2016-01-30

    Next-generation sequencing (NGS) has the capacity to screen a single DNA sample and detect pathogen DNA from thousands of host DNA sequence reads, making it a versatile and informative tool for investigation of pathogens in diseased animals. The technique is effective and labor saving in the initial identification of pathogens, and will complement conventional diagnostic tests to associate the candidate pathogen with a disease process. In this report, we investigated the utility of the diversity profiling NGS approach using Illumina small subunit ribosomal RNA (18S rRNA) gene amplicon deep sequencing to detect Toxoplasma gondii in previously confirmed cases of toxoplasmosis. We then tested the diagnostic approach with species-specific PCR genotyping, histopathology and immunohistochemistry of toxoplasmosis in a Risso's dolphin (Grampus griseus) to systematically characterise the disease and associate causality. We show that the Euk7A/Euk570R primer set targeting the V1-V3 hypervariable region of the 18S rRNA gene can be used as a species-specific assay for cyst-forming coccidia and discriminate T. gondii. Overall, the approach is cost-effective and improves diagnostic decision support by narrowing the differential diagnosis list with more certainty than was previously possible. Furthermore, it supplements the limitations of cryptic protozoan morphology and surpasses the need for species-specific PCR primer combinations. PMID:26801593

  17. Deep Sequencing and Phylogenetic Analysis of Variants Resistant to Interferon-Based Protease Inhibitor Therapy in Chronic Hepatitis Induced by Genotype 1b Hepatitis C Virus

    PubMed Central

    Sato, Mitsuaki; Komatsu, Nobutoshi; Tatsumi, Akihisa; Miura, Mika; Muraoka, Masaru; Suzuki, Yuichiro; Amemiya, Fumitake; Takano, Shinichi; Fukasawa, Mitsuharu; Nakayama, Yasuhiro; Yamaguchi, Tatsuya; Uetake, Tomoyoshi; Inoue, Taisuke; Sato, Tadashi; Sakamoto, Minoru; Yamashita, Atsuya; Moriishi, Kohji; Enomoto, Nobuyuki

    2015-01-01

    ABSTRACT Because of recent advances in deep sequencing technology, detailed analysis of hepatitis C virus (HCV) quasispecies and their dynamic changes in response to direct antiviral agents (DAAs) became possible, although the role of quasispecies is not fully understood. In this study, to clarify the evolution of viral quasispecies and the origin of drug-resistant mutations induced by interferon (IFN)-based protease inhibitor therapy, the nonstructural-3 (NS3) region of genotype 1b HCV in 34 chronic hepatitis patients treated with telaprevir (TVR)/pegylated interferon (PEG-IFN)/ribavirin (RBV) was subjected to a deep sequencing study coupled with phylogenetic analysis. Twenty-six patients (76.5%) achieved a sustained viral response (SVR), while 8 patients did not (non-SVR; 23.5%). When the complexity of the quasispecies was expressed as the mutation frequency or Shannon entropy value, a significant decrease in the IFNL3 (rs8099917) TT group and a marginal decrease in the SVR group were found soon (12 h) after the introduction of treatment, whereas there was no decrease in the non-SVR group and no significant decrease in mutation frequency in the IFNL3 TG/GG group. In the analysis of viral quasispecies composition in non-SVR patients, major populations greatly changed, accompanied by the appearance of resistance, and the compositions were unlikely to return to the pretreatment composition even after the end of therapy. Clinically TVR-resistant variants were observed in 5 non-SVR patients (5/8, 62.5%), all of which were suspected to have acquired resistance by mutations through phylogenetic analysis. In conclusion, results of the study have important implications for treatment response and outcome in interferon-based protease inhibitor therapy. IMPORTANCE In the host, hepatitis C virus (HCV) consists of a variety of populations (quasispecies), and it is supposed that dynamic changes in quasispecies are closely related to pathogenesis, although this is poorly
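
    The abstract above expresses quasispecies complexity as a Shannon entropy value but does not give the formula. A minimal sketch follows, assuming the common un-normalized form H = -sum(p_i * ln p_i) over the relative frequencies of distinct variants in a sample. The function name and the read counts are hypothetical, and the study may have used a normalized variant.

```python
import math

def shannon_entropy(variant_counts):
    """Shannon entropy (natural log) of a set of variant read counts.

    variant_counts: number of reads supporting each distinct variant
    (hypothetical input; zero counts are ignored).
    """
    total = sum(variant_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    entropy = 0.0
    for count in variant_counts:
        if count > 0:
            p = count / total          # relative frequency of this variant
            entropy -= p * math.log(p)
    return entropy
```

    Under this definition a sample dominated by a single variant scores 0, and a sample split evenly between two variants scores ln 2, so a fall in the value after treatment starts corresponds to a loss of diversity.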

  18. Variable sequence of events during the past seven terminations in two deep-sea cores from the Southern Ocean

    NASA Astrophysics Data System (ADS)

    Schneider Mor, Aya; Yam, Ruth; Bianchi, Cristina; Kunz-Pirrung, Martina; Gersonde, Rainer; Shemesh, Aldo

    2012-03-01

    The relationships among internally consistent records of summer sea-surface temperature (SSST), winter sea ice (WSI), and diatomaceous stable isotopes were studied across seven terminations over the last 660 ka in sedimentary cores from ODP sites 1093 and 1094. The sequence of events at both sites indicates that SSST and WSI changes led the carbon and nitrogen isotopic changes in three Terminations (TI, TII and TVI) and followed them in the other four Terminations (TIII, TIV, TV and TVII). In both TIII and TIV, the leads and lags between the proxies were related to weak glacial mode, while in TV and TVII they were due to the influence of the mid-Pleistocene transition. We show that the sequence of events is not unique and does not follow the same pattern across terminations, implying that the processes that initiated climate change in the Southern Ocean have varied through time.

  19. Independent studies using deep sequencing resolve the same set of core bacterial species dominating gut communities of honey bees.

    PubMed

    Sabree, Zakee L; Hansen, Allison K; Moran, Nancy A

    2012-01-01

    Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1-V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear. PMID:22829932

  20. Contrasted seismogenic and rheological behaviours from shallow and deep earthquake sequences in the North Tanzanian Divergence, East Africa

    NASA Astrophysics Data System (ADS)

    Albaric, J.; Perrot, J.; Déverchère, J.; Deschamps, A.; Le Gall, B.; Ferdinand, R. W.; Petit, C.; Tiberi, C.; Sue, C.; Songo, M.

    2010-12-01

    We report preliminary results of a seismological experiment, SEISMO-TANZ' 07, which consisted of the deployment of a local network (35 stations) in the East African Rift System (EARS), North Tanzania, for 6 months in 2007. We compare two earthquake sequences (Gelai and Manyara) occurring, respectively, in the southern end of the Kenya rift and in the North Tanzanian Divergence (NTD). Although only ~150 km apart, the two sequences have different triggering mechanisms. Neither sequence shows a typical swarm or mainshock-aftershock pattern. They highlight the change in the magmatic/tectonic nature of the rift where the eastern branch of the EARS enters the Tanzanian craton. The similar shape and long axis of the elongate sequences emphasize the preferred locus of active strain release along NE-SW discontinuities, which probably root at depth into steep Proterozoic shear zones. At Gelai, the deformation is dominated by aseismic processes involving slow slip on a normal fault and dyke intrusion within the upper crust (Calais et al., 2008). The spatial and temporal earthquake distribution indicates a possible correlation between the Gelai crisis and the eruption of the nearby Oldoinyo Lengai volcano. At Manyara, the sequence is more unusual, revealing long-lasting, deeply rooted seismic activity (~20-35 km depth) possibly related to stress loading transmitted laterally. The yield strength envelope modelled from the depth frequency distribution of earthquakes in the NTD is consistent with the presence of a mafic lower crust and further supports the strength increase of the rifted crust from south Kenya to the NTD.

  1. Independent Studies Using Deep Sequencing Resolve the Same Set of Core Bacterial Species Dominating Gut Communities of Honey Bees

    PubMed Central

    Sabree, Zakee L.; Hansen, Allison K.; Moran, Nancy A.

    2012-01-01

    Starting in 2003, numerous studies using culture-independent methodologies to characterize the gut microbiota of honey bees have retrieved a consistent and distinctive set of eight bacterial species, based on near identity of the 16S rRNA gene sequences. A recent study [Mattila HR, Rios D, Walker-Sperling VE, Roeselers G, Newton ILG (2012) Characterization of the active microbiotas associated with honey bees reveals healthier and broader communities when colonies are genetically diverse. PLoS ONE 7(3): e32962], using pyrosequencing of the V1–V2 hypervariable region of the 16S rRNA gene, reported finding entirely novel bacterial species in honey bee guts, and used taxonomic assignments from these reads to predict metabolic activities based on known metabolisms of cultivable species. To better understand this discrepancy, we analyzed the Mattila et al. pyrotag dataset. In contrast to the conclusions of Mattila et al., we found that the large majority of pyrotag sequences belonged to clusters for which representative sequences were identical to sequences from previously identified core species of the bee microbiota. On average, they represent 95% of the bacteria in each worker bee in the Mattila et al. dataset, a slightly lower value than that found in other studies. Some colonies contain small proportions of other bacteria, mostly species of Enterobacteriaceae. Reanalysis of the Mattila et al. dataset also did not support a relationship between abundances of Bifidobacterium and of putative pathogens or a significant difference in gut communities between colonies from queens that were singly or multiply mated. Additionally, consistent with previous studies, the dataset supports the occurrence of considerable strain variation within core species, even within single colonies. The roles of these bacteria within bees, or the implications of the strain variation, are not yet clear. PMID:22829932

  2. The Venom Gland Transcriptome of Latrodectus tredecimguttatus Revealed by Deep Sequencing and cDNA Library Analysis

    PubMed Central

    He, Quanze; Duan, Zhigui; Yu, Ying; Liu, Zhen; Liu, Zhonghua; Liang, Songping

    2013-01-01

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 families new to this spider, among which α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that function in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and of the coordination mechanism for toxin production in terms of protein expression quantity. PMID:24312294

  3. Exploration for deep gas in the Devonian Chaco Basin of Southern Bolivia: Sequence stratigraphy, predictions, and well results

    SciTech Connect

    Williams, K.E.; Radovich, B.J.; Brett, J.W.

    1995-12-31

    In mid-1991, a team was assembled in Texaco's Frontier Exploration Department (FED) to define the hydrocarbon potential of the Chaco Basin of Southern Bolivia. The Miraflores No. 1 was drilled in the fall of 1992, for stratigraphic objectives. The well confirmed the predicted stratigraphic trap in the Mid-Devonian, with gas discovered in two highstand and transgressive sands. They are low contrast and low resistivity sands that are found in a deep basin 'tight gas' setting. Testing of the gas sands was complicated by drilling fluid interactions at the well bore. Subsequent analysis indicated that the existing porosity and permeability were reduced, such that a realistic test of reservoir capabilities was prevented.

  4. Genome Re-Sequencing of Semi-Wild Soybean Reveals a Complex Soja Population Structure and Deep Introgression

    PubMed Central

    Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19–0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure. PMID:25265539
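
    The "3-fold genome coverage" quoted above follows the usual depth relation, coverage = (number of reads x read length) / genome size. A minimal sketch of that arithmetic is given below. The function names are hypothetical, and the read length and genome size in the usage note are illustrative assumptions rather than values reported in the abstract.

```python
import math

def fold_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)
```

    For example, assuming a genome of about 1.1 Gb and 100 bp reads, reads_needed(3, 100, 1_100_000_000) gives 33,000,000 reads for 3-fold coverage.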

  5. Deep Sequencing of Mixed Total DNA without Barcodes Allows Efficient Assembly of Highly Plastic Ascidian Mitochondrial Genomes

    PubMed Central

    Rubinstein, Nimrod D.; Feldstein, Tamar; Shenkar, Noa; Botero-Castro, Fidel; Griggio, Francesca; Mastrototaro, Francesco; Delsuc, Frédéric; Douzery, Emmanuel J.P.; Gissi, Carmela; Huchon, Dorothée

    2013-01-01

    Ascidians or sea squirts form a diverse group within chordates, which includes a few thousand members of marine sessile filter-feeding animals. Their mitochondrial genomes are characterized by particularly high evolutionary rates and rampant gene rearrangements. This extreme variability complicates standard polymerase chain reaction (PCR) based techniques for molecular characterization studies, and consequently only a few complete ascidian mitochondrial genome sequences are available. Using the standard PCR and Sanger sequencing approach, we produced the mitochondrial genome of Ascidiella aspersa only after a great effort. In contrast, we produced five additional mitogenomes (Botrylloides aff. leachii, Halocynthia spinosa, Polycarpa mytiligera, Pyura gangelion, and Rhodosoma turcicum) with a novel strategy, consisting of sequencing the pooled total DNA samples of these five species using one Illumina HiSeq 2000 flow cell lane. Each mitogenome was efficiently assembled in a single contig using de novo transcriptome assembly, as de novo genome assembly generally performed poorly for this task. Each of the six new mitogenomes presents a different and novel gene order, showing that no syntenic block has been conserved at the ordinal level (in Stolidobranchia and in Phlebobranchia). Phylogenetic analyses support the paraphyly of both Ascidiacea and Phlebobranchia, with Thaliacea nested inside Phlebobranchia, although the deepest nodes of the Phlebobranchia–Thaliacea clade are not well resolved. The strategy described here thus provides a cost-effective approach to obtain complete mitogenomes characterized by a highly plastic gene order and a fast nucleotide/amino acid substitution rate. PMID:23709623

  6. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA

    PubMed Central

    Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods—the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method—together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. PMID:27437397
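
    The standard Z-score idea named in the abstract above can be sketched as follows: the fraction of reads mapped to a chromosome in the test sample is compared with the mean and standard deviation of the same fraction in a set of reference (presumed euploid) samples. This is an illustrative Python sketch of the general technique, not the DASAF package's R interface. All function names are hypothetical, and the cutoff of 3 is a commonly used convention that is assumed here, not taken from the abstract.

```python
from statistics import mean, stdev

def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads that fall on one chromosome.

    read_counts: dict mapping chromosome name -> mapped read count.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return read_counts[chrom] / total

def standard_z_score(test_fraction, reference_fractions):
    """Z-score of the test fraction against reference samples."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)   # sample standard deviation
    return (test_fraction - mu) / sigma

def is_flagged(z_score, cutoff=3.0):
    """Assumed decision rule: flag the chromosome when z exceeds the cutoff."""
    return z_score > cutoff
```

    A test sample whose chromosome fraction sits well above the reference distribution produces a large positive score and is flagged. The GC-corrected and internal-reference variants named in the abstract adjust the fractions before this step and are not sketched here.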

  7. Shallow and deep earthquake sequences captured in the North Tanzanian Divergence, East Africa: Inferences on seismogenic processes and rheology

    NASA Astrophysics Data System (ADS)

    Albaric, J.; Perrot, J.; Déverchère, J.; Deschamps, A.; Ferdinand, R. W.; Le Gall, B.

    2009-12-01

    Using a temporary local seismic network of 35 stations deployed in North Tanzania (SEISMOTANZ'07 experiment) for 6 months in 2007, we captured two earthquake sequences (Gelai and Manyara) occurring respectively in the southern end of the Kenya rift and in the North Tanzanian Divergence (NTD). Neither sequence shows a typical swarm or mainshock-aftershock pattern. Although the sequences are only ~150 km apart, their triggering mechanisms appear to be different. They highlight a major change in the magmatic/tectonic nature of the rift where the eastern branch of the East African Rift enters the Tanzanian craton. Both have a similar shape and long axis, emphasizing the preferred locus of active strain release along NE-SW discontinuities, which probably root at depth into steep Proterozoic shear zones. At Gelai, the deformation is dominated by aseismic processes involving slow slip on a normal fault and dyke intrusion within the upper crust, and an interaction with the eruption of the nearby Oldoinyo Lengai volcano. At Manyara, the sequence reveals long-lasting, deeply rooted seismic activity (~20-35 km depth), possibly indicative of stress loading transmitted laterally. Focal solutions demonstrate a mixture of normal and strike-slip faulting on sub-vertical inherited structures striking N60°E. The yield stress envelope modelled from the depth frequency distribution of earthquakes in Manyara is consistent with the presence of a mafic lower crust and further supports the strength increase of the rifted crust from south Kenya to the NTD.

  8. Identification of novel microRNAs in primates by using the synteny information and small RNA deep sequencing data.

    PubMed

    Yuan, Zhidong; Liu, Hongde; Nie, Yumin; Ding, Suping; Yan, Mingli; Tan, Shuhua; Jin, Yuanchang; Sun, Xiao

    2013-01-01

    Current technologies that are used for genome-wide microRNA (miRNA) prediction are mainly based on the BLAST tool. They often produce a large number of false positives. Here, we describe an effective approach for identifying orthologous pre-miRNAs in several primates based on syntenic information. Some of them have been validated by small RNA high-throughput sequencing data. This approach uses the synteny information and experimentally validated miRNAs of human, and incorporates currently available algorithms and tools to identify the pre-miRNAs in five other primates. First, we identified 929 potential pre-miRNAs in the marmoset, in which miRNAs have not yet been reported. Then, we predicted the miRNAs in other primates, and we successfully re-identified most of the published miRNAs and found 721, 979, 650 and 639 new potential pre-miRNAs in chimpanzee, gorilla, orangutan and rhesus macaque, respectively. Furthermore, the miRNA transcriptomes in the four primates have been re-analyzed and some novel predicted miRNAs have been supported by the small RNA sequencing data. Finally, we analyzed the potential functions of those validated miRNAs and explored the regulatory elements and transcription factors of some validated miRNA genes of interest. The results show that our approach can effectively identify novel miRNAs and that some miRNAs supported by small RNA sequencing data may play roles in the nervous system. PMID:24135875

  9. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA.

    PubMed

    Liu, Baohong; Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue; Ding, Guohui

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods, namely the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method, together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. PMID:27437397

  10. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    PubMed

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure. PMID:25265539

  11. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures (ARMS)) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm - 500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales. PMID:27584940

  12. Deep Sequencing of the Trypanosoma cruzi GP63 Surface Proteases Reveals Diversity and Diversifying Selection among Chronic and Congenital Chagas Disease Patients

    PubMed Central

    Llewellyn, Martin S.; Messenger, Louisa A.; Luquetti, Alejandro O.; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B. N.; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A.

    2015-01-01

    Background Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. Methodology/Principal Findings A 450 bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target (ND5) was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43; Goias congenital, n = 2; Bolivia chronic, n = 27; Bolivia congenital, n = 20). Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Conclusions/Significance Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I

  13. Deep sequencing of dsRNAs recovered from mosaic-diseased pigeonpea reveals the presence of a novel emaravirus: pigeonpea sterility mosaic virus 2.

    PubMed

    Elbeaino, Toufic; Digiaro, Michele; Uppala, Mangala; Sudini, Harikishan

    2015-08-01

    Deep-sequencing analysis of double-stranded RNA extracted from a mosaic-diseased pigeonpea plant (Cajanus cajan L., family Fabaceae) revealed the complete sequence of six emaravirus-like negative-sense RNA segments of 7009, 2229, 1335, 1491, 1833 and 1194 nucleotides in size. In the order from RNA1 to RNA6, these genomic RNAs contained ORFs coding for the RNA-dependent RNA polymerase (RdRp, p1 of 266 kDa), the glycoprotein precursor (GP, p2 of 74.5 kDa), the nucleocapsid (NC, p3 of 34.9 kDa), and the putative movement protein (MP, p4 of 40.7 kDa), while p5 (55 kDa) and p6 (27 kDa) had unknown functions. All RNA segments showed distant relationships to viruses of the genus Emaravirus, and in particular to pigeonpea sterility mosaic virus (PPSMV), with which they shared nucleotide sequence identity ranging from 48.5 % (RNA3) to 62.5 % (RNA1). In phylogenetic trees constructed from the sequences of the proteins encoded by RNA1, RNA2 and RNA3 (p1, p2 and p3), this new viral entity showed a consistent grouping with fig mosaic virus (FMV) and rose rosette virus (RRV), which formed a cluster of their own, clearly distinct from PPSMV-1. In experimental greenhouse trials, this novel virus was successfully transmitted to pigeonpea and French bean seedlings by the eriophyid mite Aceria cajani. Preliminary surveys conducted in the Hyderabad region (India) showed that the virus in question is widespread in pigeonpea plants affected by sterility mosaic disease (86.4 %) but is absent in symptomless plants. Based on molecular, biological and epidemiological features, this novel virus is the second emaravirus infecting pigeonpea, for which the provisional name pigeonpea sterility mosaic virus 2 (PPSMV-2) is proposed. PMID:26060057

  14. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability

    PubMed Central

    2013-01-01

    Background Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. Results In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. Conclusions Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation. Reviewer This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole. PMID:23324115
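
    The CvP-bias described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used residue classes (charged: D, E, K, R; polar uncharged: N, Q, S, T) and a per-protein average for the proteome-wide mean; the abstract states neither detail, and the function names are hypothetical.

```python
# Assumption: charged = D, E, K, R and polar (uncharged) = N, Q, S, T.
# The abstract does not list the residue classes; these are the usual ones.
CHARGED = set("DEKR")
POLAR = set("NQST")


def cvp_bias(protein: str) -> float:
    """Return percent charged residues minus percent polar residues."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = sum(1 for aa in seq if aa in CHARGED)
    polar = sum(1 for aa in seq if aa in POLAR)
    return 100.0 * (charged - polar) / len(seq)


def mean_cvp_bias(proteins):
    """Average the per-protein CvP-bias over a set of sequences.

    Assumption: an unweighted per-protein mean; a residue-pooled
    value would be an equally plausible reading of the abstract.
    """
    values = [cvp_bias(p) for p in proteins]
    return sum(values) / len(values)
```

    Comparing mean_cvp_bias over an orthologous protein set from two species would reproduce the kind of comparison the abstract describes, under the assumptions above.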

  15. Microdiversity of Deep-Sea Bacillales Isolated from Tyrrhenian Sea Sediments as Revealed by ARISA, 16S rRNA Gene Sequencing and BOX-PCR Fingerprinting

    PubMed Central

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    With respect to their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales, isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea, was investigated using PCR fingerprinting and 16S rRNA sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis of isolates showed their affiliation to six different genera of low G+C% content Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes, with haplotype H8 being the most dominant, represented by 63 isolates. The application of BOX-PCR fingerprinting to the B. licheniformis sub-collection allowed their separation into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also exhibited distinct strain distribution between seamount and non-seamount stations and was shown to be highly prevalent in non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments. PMID:24005887

  16. Microdiversity of deep-sea Bacillales isolated from Tyrrhenian sea sediments as revealed by ARISA, 16S rRNA gene sequencing and BOX-PCR fingerprinting.

    PubMed

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    With respect to their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales, isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea, was investigated using PCR fingerprinting and 16S rRNA sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis of isolates showed their affiliation to six different genera of low G+C% content Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes, with haplotype H8 being the most dominant, represented by 63 isolates. The application of BOX-PCR fingerprinting to the B. licheniformis sub-collection allowed their separation into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also exhibited distinct strain distribution between seamount and non-seamount stations and was shown to be highly prevalent in non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments. PMID:24005887

  17. Complete genome sequence of 'Halanaeroarchaeum sulfurireducens' M27-SA2, a sulfur-reducing and acetate-oxidizing haloarchaeon from the deep-sea hypersaline anoxic lake Medee.

    PubMed

    Messina, Enzo; Sorokin, Dimitry Y; Kublanov, Ilya V; Toshchakov, Stepan; Lopatina, Anna; Arcadi, Erika; Smedile, Francesco; La Spada, Gina; La Cono, Violetta; Yakimov, Michail M

    2016-01-01

    Strain M27-SA2 was isolated from the deep-sea salt-saturated anoxic lake Medee, which represents one of the most hostile extreme environments on our planet. On the basis of physiological studies and phylogenetic positioning this extremely halophilic euryarchaeon belongs to a novel genus 'Halanaeroarchaeum' within the family Halobacteriaceae. All members of this genus cultivated so far are strict anaerobes using acetate as the sole carbon and energy source and elemental sulfur as electron acceptor. Here we report the complete genome sequence of the strain M27-SA2 which is composed of a 2,129,244-bp chromosome and a 124,256-bp plasmid. This is the second complete genome sequence within the genus Halanaeroarchaeum. We demonstrate that the genome of 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors complete metabolic pathways for acetate and sulfur catabolism and for de novo biosynthesis of 19 amino acids. The genomic analysis also reveals that 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors two prophage loci and one CRISPR locus, highly similar to those of the Kulunda Steppe (Altai, Russia) isolate 'H. sulfurireducens' HSR2(T). The discovery of a sulfur-respiring, acetate-utilizing haloarchaeon in deep-sea hypersaline anoxic lakes is of significance for understanding the biogeochemical functioning of these harsh ecosystems, which are incompatible with life for common organisms. Moreover, isolations of Halanaeroarchaeum members from geographically distant salt-saturated sites of different origin suggest a high degree of evolutionary success in their adaptation to this type of extreme biotope around the world. PMID:27182430

  18. Appearances Can Be Deceptive: Revealing a Hidden Viral Infection with Deep Sequencing in a Plant Quarantine Context

    PubMed Central

    Candresse, Thierry; Filloux, Denis; Muhire, Brejnev; Julian, Charlotte; Galzi, Serge; Fort, Guillaume; Bernardo, Pauline; Daugrois, Jean-Heindrich; Fernandez, Emmanuel; Martin, Darren P.; Varsani, Arvind; Roumagnac, Philippe

    2014-01-01

    Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. The safety of regulated plant material exchanges presently relies heavily on techniques such as PCR or nucleic acid hybridisation, which are only suited to the detection and characterisation of specific, well characterised pathogens. Here, we demonstrate the utility of sequence-independent next generation sequencing (NGS) of both virus-derived small interfering RNAs (siRNAs) and virion-associated nucleic acids (VANA) for the detailed identification and characterisation of viruses infecting two quarantined sugarcane plants. Both plants originated from Egypt and were known to be infected with Sugarcane streak Egypt Virus (SSEV; Genus Mastrevirus, Family Geminiviridae), but were revealed by the NGS approaches to also be infected by a second highly divergent mastrevirus, here named Sugarcane white streak Virus (SWSV). This novel virus had escaped detection by all routine quarantine detection assays and was found to also be present in sugarcane plants originating from Sudan. Complete SWSV genomes were cloned and sequenced from six plants and all were found to share >91% genome-wide identity. With the exception of two SWSV variants, which potentially express unusually large RepA proteins, the SWSV isolates display genome characteristics very typical to those of all other previously described mastreviruses. An analysis of virus-derived siRNAs for SWSV and SSEV showed them to be strongly influenced by secondary structures within both genomic single stranded DNA and mRNA transcripts. In addition, the distribution of siRNA size frequencies indicates that these mastreviruses are likely subject to both transcriptional and post-transcriptional gene silencing. Our study stresses the potential advantages of NGS-based virus metagenomic screening in a plant quarantine setting and indicates that such techniques could dramatically reduce the numbers of non

  19. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism

    PubMed Central

    Valencia, C. Alexander; Wang, Xinjian; Wang, Jin; Peters, Anna; Simmons, Julia R.; Moran, Molly C.; Mathur, Abhinav; Husami, Ammar; Qian, Yaping; Sheridan, Rachel; Bove, Kevin E.; Witte, David; Huang, Taosheng; Miethke, Alexander G.

    2016-01-01

    Background & Aims The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Methods Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. Results A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Conclusion Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF. PMID:27483465

  20. Deep sequencing of non-ribosomal peptide synthetases and polyketide synthases from the microbiomes of Australian marine sponges.

    PubMed

    Woodhouse, Jason N; Fan, Lu; Brown, Mark V; Thomas, Torsten; Neilan, Brett A

    2013-09-01

    The biosynthesis of non-ribosomal peptide and polyketide natural products is facilitated by multimodular enzymes that contain domains responsible for the sequential condensation of amino and carboxylic subunits. These conserved domains provide molecular targets for the discovery of natural products from microbial metagenomes. This study demonstrates the application of tag-encoded FLX amplicon pyrosequencing (TEFAP) targeting non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) genes as a method for determining the identity and diversity of natural product biosynthesis genes. To validate this approach, we assessed the diversity of NRPS and PKS genes within the microbiomes of six Australian marine sponge species using both TEFAP and metagenomic whole-genome shotgun sequencing approaches. The TEFAP approach identified 100 novel ketosynthase (KS) domain sequences and 400 novel condensation domain sequences within the microbiomes of the six sponges. The diversity of KS domains within the microbiome of a single sponge species Scopalina sp. exceeded that of any previously surveyed marine sponge. Furthermore, this study represented the first to target the condensation domain from NRPS biosynthesis and resulted in the identification of a novel condensation domain lineage. This study highlights the untapped potential of Australian marine sponges for the isolation of novel bioactive natural products. Furthermore, this study demonstrates that TEFAP approaches can be applied to functional genes, involved in natural product biosynthesis, as a tool to aid natural product discovery. It is envisaged that this approach will be used across multiple environments, offering an insight into the biological processes that influence the production of secondary metabolites. PMID:23598791

  1. Expression Profiling of Preadipocyte MicroRNAs by Deep Sequencing on Chicken Lines Divergently Selected for Abdominal Fatness

    PubMed Central

    Wang, Weishi; Du, Zhi-Qiang; Cheng, Bohan; Wang, Yuxiang; Yao, Jing; Li, Yumao; Cao, Zhiping; Luan, Peng; Wang, Ning; Li, Hui

    2015-01-01

    Through posttranscriptional gene regulation, microRNA (miRNA) is linked to a wide variety of biological processes, including adipogenesis and lipid metabolism. Although miRNAs in mammalian adipogenesis have been studied extensively, their study in chicken adipogenesis is still very limited. To find miRNAs potentially important for chicken preadipocyte development, we compared the preadipocyte miRNA expression profiles in two broiler lines divergently selected for abdominal fat content, by sequencing two small RNA libraries constructed for primary preadipocytes isolated from abdominal adipose tissues. After bioinformatics analyses, from chicken miRNAs deposited in miRBase 20.0, we identified 225 miRNAs to be expressed in preadipocytes, 185 in the lean line and 200 in the fat line (derived from 208 and 203 miRNA precursors, respectively), which corresponds to 114 miRNA families. The let-7 family miRNAs were the most abundant. Furthermore, we validated the sequencing results of 15 known miRNAs by qRT-PCR, and confirmed that the expression levels of most miRNAs correlated well with those obtained by Solexa sequencing. A total of 33 miRNAs were significantly differentially expressed between the two chicken lines (P<0.05). Gene ontology analysis revealed that they could target genes enriched in the regulation of gene transcription and chromatin function, response to insulin stimulation, and IGF-1 signaling pathways, which could have important roles in preadipocyte development. This study therefore provides valuable information and resources on miRNAs in chicken adipogenesis. Future functional investigations on these miRNAs could help explore related genes and molecular networks fundamental to preadipocyte development. PMID:25675096

  2. Role of IL-17 Pathways in Immune Privilege: A RNA Deep Sequencing Analysis of the Mice Testis Exposure to Fluoride.

    PubMed

    Huo, Meijun; Han, Haijun; Sun, Zilong; Lu, Zhaojing; Yao, Xinglei; Wang, Shaolin; Wang, Jundong

    2016-01-01

    We sequenced RNA transcripts from the testicles of healthy male mice, divided into a control group with distilled water and two experimental groups with 50 and 100 mg/l NaF in drinking water for 56 days. Bowtie/Tophat were used to align 50-bp paired-end reads into transcripts, Cufflinks to measure the relative abundance of each transcript and IPA to analyze RNA-Sequencing data. In the 100 mg/l NaF-treated group, four pathways related to IL-17, TGF-β and other cellular growth factor pathways were overexpressed. The mRNA expression of IL-17RA, IL-17RC, MAP2K1, MAP2K2, MAP2K3 and MAPKAPK2, monitored by qRT-PCR, increased remarkably in the 100 mg/l NaF group and coincided with the result of RNA-Sequencing. Fluoride exposure could disrupt spermatogenesis and testicles in male mice by influencing many signaling pathways and genes, which act on immune signal transduction and cellular metabolism. The high expression of the IL-17 signal pathway was a response to the invasion of the testicular immune system due to extracellular fluoride. PI3-kinase/AKT, MAPKs and cytokines of the TGF-β family contributed to controlling IL-17 pathway activation and maintaining immune privilege and spermatogenesis. These findings provide new ideas for further molecular research on the effects of fluorosis on reproduction and immune response mechanisms. PMID:27572304

  3. Mining tissue-specific contigs from peanut (Arachis hypogaea L.) for promoter cloning by deep transcriptome sequencing.

    PubMed

    Geng, Lili; Duan, Xiaohong; Liang, Chun; Shu, Changlong; Song, Fuping; Zhang, Jie

    2014-10-01

    Peanut (Arachis hypogaea L.), one of the most important oil legumes in the world, is heavily damaged by white grubs. Tissue-specific promoters are needed to incorporate insect resistance genes into peanut by genetic transformation to control the subterranean pests. Transcriptome sequencing is the most effective way to analyze differential gene expression in this non-model species and contribute to promoter cloning. The transcriptomes of the roots, seeds and leaves of peanut were sequenced using Illumina technology. A simple digital expression profile was established based on number of transcripts per million clean tags (TPM) from different tissues. Subsequently, 584 root-specific candidate transcript assembly contigs (TACs) and 316 seed-specific candidate TACs were identified. Among these candidate TACs, 55.3% were root-specific and 64.6% were seed-specific by semi-quantitative RT-PCR analysis. Moreover, the consistency of semi-quantitative RT-PCR with the simple digital expression profile was correlated with the length and TPM value of TACs. The results of gene ontology showed that some root-specific TACs are involved in stress resistance and respond to auxin stimulus, whereas, seed-specific candidate TACs are involved in embryo development, lipid storage and long-chain fatty acid biosynthesis. One root-specific promoter was cloned and characterized. We developed a high-yield screening system in peanut by establishing a simple digital expression profile based on Illumina sequencing. The feasible and rapid method presented by this study can be used for other non-model crops to explore tissue-specific or spatially specific promoters. PMID:25231965
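
    The digital expression profile in this abstract rests on transcripts per million clean tags (TPM). A minimal sketch of that normalization follows, together with a hypothetical tissue-specificity rule. The abstract does not state the cutoffs used to call a contig tissue-specific, so min_tpm and max_other are illustrative assumptions, as are the function names.

```python
def tags_per_million(counts):
    """Scale raw tag counts per contig to tags per million clean tags."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {contig: n * 1_000_000 / total for contig, n in counts.items()}


def tissue_specific(tpm_by_tissue, tissue, min_tpm=10.0, max_other=1.0):
    """Return contigs expressed in `tissue` and near-absent elsewhere.

    Assumption: the thresholds are hypothetical placeholders, not the
    criteria used in the study.
    """
    target = tpm_by_tissue[tissue]
    others = [tpm for name, tpm in tpm_by_tissue.items() if name != tissue]
    hits = []
    for contig, value in target.items():
        low_elsewhere = all(o.get(contig, 0.0) <= max_other for o in others)
        if value >= min_tpm and low_elsewhere:
            hits.append(contig)
    return sorted(hits)
```

    Candidates returned by such a filter would still need experimental confirmation, which the abstract reports was done by semi-quantitative RT-PCR.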

  4. Role of IL-17 Pathways in Immune Privilege: A RNA Deep Sequencing Analysis of the Mice Testis Exposure to Fluoride

    PubMed Central

    Huo, Meijun; Han, Haijun; Sun, Zilong; Lu, Zhaojing; Yao, Xinglei; Wang, Shaolin; Wang, Jundong

    2016-01-01

    We sequenced RNA transcripts from the testicles of healthy male mice, divided into a control group with distilled water and two experimental groups with 50 and 100 mg/l NaF in drinking water for 56 days. Bowtie/Tophat were used to align 50-bp paired-end reads into transcripts, Cufflinks to measure the relative abundance of each transcript and IPA to analyze RNA-Sequencing data. In the 100 mg/l NaF-treated group, four pathways related to IL-17, TGF-β and other cellular growth factor pathways were overexpressed. The mRNA expression of IL-17RA, IL-17RC, MAP2K1, MAP2K2, MAP2K3 and MAPKAPK2, monitored by qRT-PCR, increased remarkably in the 100 mg/l NaF group and coincided with the result of RNA-Sequencing. Fluoride exposure could disrupt spermatogenesis and testicles in male mice by influencing many signaling pathways and genes, which act on immune signal transduction and cellular metabolism. The high expression of the IL-17 signal pathway was a response to the invasion of the testicular immune system due to extracellular fluoride. PI3-kinase/AKT, MAPKs and cytokines of the TGF-β family contributed to controlling IL-17 pathway activation and maintaining immune privilege and spermatogenesis. These findings provide new ideas for further molecular research on the effects of fluorosis on reproduction and immune response mechanisms. PMID:27572304

  5. Quantitative and qualitative differences in celiac disease epitopes among durum wheat varieties identified through deep RNA-amplicon sequencing

    PubMed Central

    2013-01-01

    Background Wheat gluten is important for the industrial quality of bread wheat (Triticum aestivum L.) and durum wheat (T. turgidum L.). Gluten proteins are also the source of immunogenic peptides that can trigger a T cell reaction in celiac disease (CD) patients, leading to inflammatory responses in the small intestine. Various peptides with three major T cell epitopes involved in CD are derived from alpha-gliadin fraction of gluten. Alpha-gliadins are encoded by a large multigene family and amino acid variation in the CD epitopes is known to influence the immunogenicity of individual gene family members. Current commercial methods of gluten detection are unable to distinguish between immunogenic and non-immunogenic CD epitope variants and thus to accurately quantify the overall CD epitope load of a given wheat variety. Such quantification is indispensable for correct selection of wheat varieties with low potential to cause CD. Results A 454 RNA-amplicon sequencing method was developed for alpha-gliadin transcripts encompassing the three major CD epitopes and their variants. The method was used to screen developing grains on plants of 61 different durum wheat cultivars and accessions. A dedicated sequence analysis pipeline returned a total of 304 unique alpha-gliadin transcripts, corresponding to a total of 171 ‘unique deduced protein fragments’ of alpha-gliadins. The numbers of these fragments obtained in each plant were used to calculate quantitative and qualitative differences between the CD epitopes expressed in the endosperm of these wheat plants. A few plants showed a lower fraction of CD epitope-encoding alpha-gliadin transcripts, but none were free of CD epitopes. Conclusions The dedicated 454 RNA-amplicon sequencing method enables 1) the grouping of wheat plants according to the genetic variation in alpha-gliadin transcripts, and 2) the screening for plants which are potentially less CD-immunogenic. The resulting alpha-gliadin sequence database will

  6. Deep sequencing identifies deregulation of microRNAs involved with vincristine drug-resistance of colon cancer cells

    PubMed Central

    Dong, Wei-Hua; Li, Qin; Zhang, Xiao-Yan; Guo, Qing; Li, Huizheng; Wang, Tian-Yun

    2015-01-01

    Background: Vincristine (VCR) is a chemical that is widely used in tumor therapy. While long-term use can make tumor cells resistant to VCR, the underlying mechanisms of this resistance are still unclear. Objective: This study aimed at investigating the role of microRNA (miRNA) in colon cancer drug resistance. Methods: HCT-8 colon carcinoma cells were cultured and treated with different VCR concentrations to establish an HCT-8/VCR resistant cell line. Whole-genome screens, HiSeq 2500 sequencing, and bioinformatics methods were used to detect and analyze differences in miRNA expression between the drug-resistant HCT-8/VCR cells and non-resistant HCT-8 cells. Differential expression profiles of miRNAs were constructed based on the sequencing results. Results: The HCT-8/VCR resistant colon carcinoma cell line was established. With regard to the difference in drug resistance between HCT-8/VCR and HCT-8 cells, 24 miRNAs showed statistically significant differences in their expression (fold change > 4), of which 17 were up-regulated. Seven miRNAs were down-regulated. Conclusion: As abnormal expression of miRNAs was associated with VCR resistance of colon carcinoma cells, differences in miRNA expression may play a key role in VCR resistance of colon cancer cells. PMID:26617885
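
    The fold-change criterion in this abstract (fold change > 4) can be sketched as a simple filter over normalized miRNA abundances. This is only an illustration: the pseudocount, the symmetric treatment of down-regulation (ratio below 1/4), and the function name are assumptions, and the statistical test mentioned in the abstract is not reproduced.

```python
def classify_by_fold_change(resistant, parental, threshold=4.0, pseudocount=1.0):
    """Split miRNAs into up- and down-regulated lists by expression ratio.

    `resistant` and `parental` map miRNA names to normalized abundances.
    Assumption: a pseudocount avoids division by zero, and down-regulation
    is called when the ratio falls below 1 / threshold.
    """
    up, down = [], []
    for mirna in sorted(set(resistant) | set(parental)):
        ratio = (resistant.get(mirna, 0.0) + pseudocount) / (
            parental.get(mirna, 0.0) + pseudocount
        )
        if ratio > threshold:
            up.append(mirna)
        elif ratio < 1.0 / threshold:
            down.append(mirna)
    return up, down
```

    Applied to the two cell lines, the lengths of the two returned lists would correspond to the up- and down-regulated counts the abstract reports, provided the same normalization and cutoffs were used.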

  7. Identification and expression profiling of Vigna mungo microRNAs from leaf small RNA transcriptome by deep sequencing.

    PubMed

    Paul, Sujay; Kundu, Anirban; Pal, Amita

    2014-01-01

    MicroRNAs (miRNAs) represent a class of small non-coding RNA molecules that play a crucial role in post-transcriptional gene regulation. Several conserved and species-specific miRNAs have been characterized to date, predominantly from the plant species whose genome is well characterized. However, information on the variability of these regulatory RNAs in economically important but genetically less characterized crop species is limited. Vigna mungo is an important grain legume, which is grown primarily for its protein-rich edible seeds. miRNAs from this species have not been identified to date due to lack of genome sequence information. To identify miRNAs from V. mungo, a small RNA library was constructed from young leaves. High-throughput Illumina sequencing technology and bioinformatic analysis of the small RNA reads led to the identification of 66 miRNA loci represented by 45 conserved miRNAs belonging to 19 families and eight non-conserved miRNAs belonging to seven families. Besides, 13 novel miRNA candidates in V. mungo were also identified. Expression patterns of selected conserved, non-conserved, and novel miRNA candidates have been demonstrated in leaf, stem, and root tissues by quantitative polymerase chain reaction, and potential target genes were predicted for most of the conserved miRNAs. This information offers genomic resources for better understanding of miRNA mediated post-transcriptional gene regulation. PMID:24138283

  8. Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts.

    PubMed

    Lin, Chenwei; Mueller, Lukas A; Mc Carthy, James; Crouzillat, Dominique; Pétiard, Vincent; Tanksley, Steven D

    2005-12-01

    An EST database has been generated for coffee based on sequences from approximately 47,000 cDNA clones derived from five different stages/tissues, with a special focus on developing seeds. When computationally assembled, these sequences correspond to 13,175 unigenes, which were analyzed with respect to functional annotation, expression profile and evolution. Compared with Arabidopsis, the coffee unigenes encode a higher proportion of proteins related to protein modification/turnover and metabolism, an observation that may explain the high diversity of metabolites found in coffee and related species. Several gene families were found to be either expanded or unique to coffee when compared with Arabidopsis. A high proportion of these families encode proteins assigned to functions related to disease resistance. Such families may have expanded and evolved rapidly under the intense pathogen pressure experienced by a tropical, perennial species like coffee. Finally, the coffee gene repertoire was compared with that of Arabidopsis and Solanaceous species (e.g. tomato). Unlike Arabidopsis, tomato has a nearly perfect gene-for-gene match with coffee. These results are consistent with the facts that coffee and tomato have a similar genome size, chromosome karyotype (tomato, n=12; coffee n=11) and chromosome architecture. Moreover, both belong to the Asterid I clade of dicot plant families. Thus, the biology of coffee (family Rubiaceae) and tomato (family Solanaceae) may be united into one common network of shared discoveries, resources and information. PMID:16273343

  9. Ultra-deep profiling of alternatively spliced Drosophila Dscam isoforms by circularization-assisted multi-segment sequencing

    PubMed Central

    Sun, Wei; You, Xintian; Gogol-Döring, Andreas; He, Haihuai; Kise, Yoshiaki; Sohn, Madlen; Chen, Tao; Klebes, Ansgar; Schmucker, Dietmar; Chen, Wei

    2013-01-01

    The Drosophila melanogaster gene Dscam (Down syndrome cell adhesion molecule) can generate thousands of different ectodomains via mutually exclusive splicing of three large exon clusters. The isoform diversity plays a profound role in both neuronal wiring and pathogen recognition. However, the isoform expression pattern at the global level remained unexplored. Here, we developed a novel method that allows for direct quantification of the alternatively spliced exon combinations from hundreds of millions of Dscam transcripts in one sequencing run. With unprecedented sequencing depth, we detected a total of 18 496 isoforms, out of 19 008 theoretically possible combinations. Importantly, we demonstrated that alternative splicing between different clusters is independent. Moreover, the isoforms were expressed across a broad dynamic range, with significant bias in cell/tissue and developmental stage-specific patterns. Hitherto underappreciated, such bias can dramatically reduce the ability of neurons to display unique surface receptor codes. Therefore, the seemingly excessive diversity encoded in the Dscam locus might nevertheless be essential for a robust self and non-self discrimination in neurons. PMID:23792425
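
    The figure of 19 008 possible combinations in this abstract follows from multiplying the numbers of alternative exons in the three clusters. The sketch below works through that arithmetic and shows what independence between clusters implies for expected isoform frequencies. The cluster sizes (12, 48 and 33 variants for exons 4, 6 and 9) are not given in the abstract and are supplied here as background; the function names are hypothetical.

```python
from itertools import product

# Assumption: variant counts for the exon 4, 6 and 9 clusters are
# background knowledge, not stated in the abstract.
CLUSTER_SIZES = {"exon4": 12, "exon6": 48, "exon9": 33}


def n_combinations(sizes):
    """Number of exon combinations when one variant is chosen per cluster."""
    total = 1
    for size in sizes.values():
        total *= size
    return total


def expected_under_independence(marginals):
    """Expected joint frequency of each combination if clusters splice
    independently: the product of the per-cluster variant frequencies."""
    expected = {}
    for combo in product(*[list(m) for m in marginals]):
        p = 1.0
        for cluster, variant in zip(marginals, combo):
            p *= cluster[variant]
        expected[combo] = p
    return expected


possible = n_combinations(CLUSTER_SIZES)  # 12 * 48 * 33
coverage = 18496 / possible               # fraction of combinations detected
```

    Comparing observed isoform frequencies with the output of expected_under_independence is one simple way to examine the independence claim; the abstract does not say which test the authors used.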

  10. Deep sequencing shows multiple oligouridylations are required for 3' to 5' degradation of histone mRNAs on polyribosomes.

    PubMed

    Slevin, Michael K; Meaux, Stacie; Welch, Joshua D; Bigler, Rebecca; Miliani de Marval, Paula L; Su, Wei; Rhoads, Robert E; Prins, Jan F; Marzluff, William F

    2014-03-20

    Histone mRNAs are rapidly degraded when DNA replication is inhibited during S phase with degradation initiating with oligouridylation of the stem loop at the 3' end. We developed a customized RNA sequencing strategy to identify the 3' termini of degradation intermediates of histone mRNAs. Using this strategy, we identified two types of oligouridylated degradation intermediates: RNAs ending at different sites of the 3' side of the stem loop that resulted from initial degradation by 3'hExo and intermediates near the stop codon and within the coding region. Sequencing of polyribosomal histone mRNAs revealed that degradation initiates and proceeds 3' to 5' on translating mRNA and that many intermediates are capped. Knockdown of the exosome-associated exonuclease PM/Scl-100, but not the Dis3L2 exonuclease, slows histone mRNA degradation consistent with 3' to 5' degradation by the exosome containing PM/Scl-100. Knockdown of No-go decay factors also slowed histone mRNA degradation, suggesting a role in removing ribosomes from partially degraded mRNAs. PMID:24656133

  11. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

    PubMed Central

    Quang, Daniel; Xie, Xiaohui

    2016-01-01

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. PMID:27084946
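
    To illustrate how a convolution layer can score a regulatory motif, the sketch below one-hot encodes a DNA sequence and slides a single filter over it, followed by a rectifier and max pooling. It is pure Python for illustration only: the function names, the toy filter weights and the choice of rectifier and pooling are assumptions rather than details taken from the abstract or the DanQ source code, and the recurrent layer is not shown.

```python
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).
    Unknown bases such as N become all-zero rows."""
    return [[1.0 if base == a else 0.0 for a in ALPHABET]
            for base in seq.upper()]

def conv1d_relu(encoded, kernel, bias=0.0):
    """Valid 1-D convolution of one filter over the encoded sequence,
    followed by a rectifier (ReLU). kernel has shape (width, 4)."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        score = bias
        for k in range(width):
            for ch in range(4):
                score += kernel[k][ch] * encoded[start + k][ch]
        out.append(max(0.0, score))
    return out

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

# Toy filter that rewards the motif "TATA" (hypothetical weights).
tata_filter = [one_hot(base)[0] for base in "TATA"]
activations = conv1d_relu(one_hot("GGTATAGG"), tata_filter, bias=-3.0)
pooled = max_pool(activations, 2)
print(activations)  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(pooled)       # [0.0, 1.0, 0.0]
```

    In a hybrid model of the kind described, the pooled activations of many such filters would form the input sequence for the bi-directional recurrent layer.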

  12. Identification and Analysis of the Porcine MicroRNA in Porcine Cytomegalovirus-Infected Macrophages Using Deep Sequencing

    PubMed Central

    Liu, Xiao; Liao, Shan; Xu, Zhiwen; Zhu, Ling; Yang, Fan; Guo, Wanzhu

    2016-01-01

    Porcine cytomegalovirus (PCMV; genus Cytomegalovirus, subfamily Betaherpesvirinae, family Herpesviridae) is an immunosuppressive virus that mainly inhibits the immune function of T lymphocytes and macrophages, which has caused substantial damage in the farming industry. In this study, we obtained the miRNA expression profiles of PCMV-infected porcine macrophages via high-throughput sequencing. The comprehensive analysis of miRNA profiles showed that 239 miRNA database-annotated and 355 novel pig-encoded miRNAs were detected. Of these, 130 miRNAs showed significant differential expression between the PCMV-infected and uninfected porcine macrophages. The 10 differentially expressed pig-encoded miRNAs were further determined by stem-loop reverse-transcription polymerase chain reaction, and the results were consistent with the high-throughput sequencing. Gene Ontology analysis of the target genes of miRNAs in PCMV-infected porcine macrophages showed that the differentially expressed miRNAs are mainly involved in immune and metabolic processes. This is the first report of the miRNA transcriptome in porcine macrophages and an analysis of the miRNA regulatory mechanisms during PCMV infection. Further research into the regulatory mechanisms of miRNAs during immunosuppressive viral infections should contribute to the treatment and prevention of immunosuppressive viruses. PMID:26943793

  13. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    PubMed

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. PMID:27084946

  14. Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts

    PubMed Central

    Lin, Chenwei; Mueller, Lukas A.; Carthy, James Mc; Crouzillat, Dominique; Pétiard, Vincent

    2005-01-01

    An EST database has been generated for coffee based on sequences from approximately 47,000 cDNA clones derived from five different stages/tissues, with a special focus on developing seeds. When computationally assembled, these sequences correspond to 13,175 unigenes, which were analyzed with respect to functional annotation, expression profile and evolution. Compared with Arabidopsis, the coffee unigenes encode a higher proportion of proteins related to protein modification/turnover and metabolism, an observation that may explain the high diversity of metabolites found in coffee and related species. Several gene families were found to be either expanded or unique to coffee when compared with Arabidopsis. A high proportion of these families encode proteins assigned to functions related to disease resistance. Such families may have expanded and evolved rapidly under the intense pathogen pressure experienced by a tropical, perennial species like coffee. Finally, the coffee gene repertoire was compared with that of Arabidopsis and Solanaceous species (e.g. tomato). Unlike Arabidopsis, tomato has a nearly perfect gene-for-gene match with coffee. These results are consistent with the facts that coffee and tomato have a similar genome size, chromosome karyotype (tomato, n=12; coffee n=11) and chromosome architecture. Moreover, both belong to the Asterid I clade of dicot plant families. Thus, the biology of coffee (family Rubiaceae) and tomato (family Solanaceae) may be united into one common network of shared discoveries, resources and information. PMID:16273343

  15. Deep Sequencing of HIV-1 RNA and DNA in Newly Diagnosed Patients with Baseline Drug Resistance Showed No Indications for Hidden Resistance and Is Biased by Strong Interference of Hypermutation.

    PubMed

    Dauwe, Kenny; Staelens, Delfien; Vancoillie, Leen; Mortier, Virginie; Verhofstede, Chris

    2016-06-01

    Deep sequencing of plasma RNA or proviral DNA may be an interesting alternative to population sequencing for the detection of baseline transmitted HIV-1 drug resistance. Using a Roche 454 GS Junior HIV-1 prototype kit, we performed deep sequencing of the HIV-1 protease and reverse transcriptase genes on paired plasma and buffy coat samples from newly diagnosed HIV-1-positive individuals. Selection was based on the outcome of population sequencing and included 12 patients with either a revertant amino acid at codon 215 of the reverse transcriptase or a singleton resistance mutation, 4 patients with multiple resistance mutations, and 4 patients with wild-type virus. Deep sequencing of RNA and DNA detected 6 and 43 mutations, respectively, that were not identified by population sequencing. A subsequently performed hypermutation analysis, however, revealed hypermutation in 61.19% of 3,188 DNA reads with a resistance mutation. The removal of hypermutated reads dropped the number of additional mutations in DNA from 43 to 17. No hypermutation evidence was found in the RNA reads. Five of the 6 additional RNA mutations and all additional DNA mutations, after full exclusion of hypermutation bias, were observed in the 3 individuals with multiple resistance mutations detected by population sequencing. Despite focused selection of patients with T215 revertants or singleton mutations, deep sequencing failed to identify the resistant T215Y/F or M184V or any other resistance mutation, indicating that in most of these cases there is no hidden resistance and that the virus detected at diagnosis by population sequencing is the original infecting variant. PMID:27076656

  16. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island.

    PubMed

    Ashton, Philip M; Nair, Satheesh; Dallman, Tim; Rubino, Salvatore; Rabsch, Wolfgang; Mwaigwisya, Solomon; Wain, John; O'Grady, Justin

    2015-03-01

    Short-read, high-throughput sequencing technology cannot identify the chromosomal position of repetitive insertion sequences that typically flank horizontally acquired genes such as bacterial virulence genes and antibiotic resistance genes. The MinION nanopore sequencer can produce long sequencing reads on a device similar in size to a USB memory stick. Here we apply a MinION sequencer to resolve the structure and chromosomal insertion site of a composite antibiotic resistance island in Salmonella Typhi Haplotype 58. Nanopore sequencing data from a single 18-h run was used to create a scaffold for an assembly generated from short-read Illumina data. Our results demonstrate the potential of the MinION device in clinical laboratories to fully characterize the epidemic spread of bacterial pathogens. PMID:25485618

  17. Bacterial Communities Associated with Host-Adapted Populations of Pea Aphids Revealed by Deep Sequencing of 16S Ribosomal DNA

    PubMed Central

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways such as protection against natural enemies, heat tolerance, color change and reproduction alteration. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences allowed identifying 21 bacterial OTUs (Operational Taxonomic Unit). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept a few generations in controlled conditions, it may be that bacterial diversity was

  18. Bacterial communities associated with host-adapted populations of pea aphids revealed by deep sequencing of 16S ribosomal DNA.

    PubMed

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways such as protection against natural enemies, heat tolerance, color change and reproduction alteration. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences allowed identifying 21 bacterial OTUs (Operational Taxonomic Unit). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept a few generations in controlled conditions, it may be that bacterial diversity was

  19. Revisiting bovine pyometra--new insights into the disease using a culture-independent deep sequencing approach.

    PubMed

    Knudsen, Lif Rødtness Vesterby; Karstrup, Cecilia Christensen; Pedersen, Hanne Gervi; Agerholm, Jørgen Steen; Jensen, Tim Kåre; Klitgaard, Kirstine

    2015-02-25

    The bacteria present in the uterus during pyometra have previously been studied using bacteriological culturing. These studies identified Fusobacterium necrophorum and Trueperella pyogenes as the major contributors to the pathogenesis of pyometra. However, an increasing number of culture-independent studies have demonstrated that the bacterial diversity in most environments is underestimated in culture-based studies. Consequently, fastidious pyometra-associated pathogens may have been overlooked. Therefore, the primary purpose of this study was to investigate the diversity of bacteria in the uterus of cows with pyometra by using culture-independent 16S rRNA PCR combined with next generation sequencing. We investigated the microbial composition in the uterus of 21 cows with pyometra, which were obtained from a Danish slaughterhouse. Similar to the observations from the culture studies, Fusobacteriaceae, the family that F. necrophorum belongs to, was the operational taxonomic unit (OTU) observed in the largest quantities. By contrast, the Actinomycetaceae family, which includes T. pyogenes, constituted only 1% of the total number of reads. Thus we cannot confirm the previously reported role of species from this family in the pathogenesis of pyometra. Finally, we identified a large number of sequences representing three families of Gram-negative bacteria in the pyometra samples: Porphyromonadaceae, Mycoplasmataceae, and Pasteurellaceae. It is likely that these families comprise potential pathogenic species of a fastidious nature, which have been overlooked in previous studies. Our results increase the knowledge of the complexity of the pyometra microbiota and suggest that pathogens in addition to F. necrophorum may be involved in the pathogenesis of pyometra. PMID:25550285

  20. Deep sequencing reveals mutagenic effects of ribavirin during monotherapy of hepatitis C virus genotype 1-infected patients.

    PubMed

    Dietz, Julia; Schelhorn, Sven-Eric; Fitting, Daniel; Mihm, Ulrike; Susser, Simone; Welker, Martin-Walter; Füller, Caterina; Däumer, Martin; Teuber, Gerlinde; Wedemeyer, Heiner; Berg, Thomas; Lengauer, Thomas; Zeuzem, Stefan; Herrmann, Eva; Sarrazin, Christoph

    2013-06-01

    The preeminent mode of action of the broad-spectrum antiviral nucleoside ribavirin in the therapy of chronic hepatitis C is currently unresolved. Particularly under contest are possible mutagenic effects of ribavirin that may lead to viral extinction by lethal mutagenesis of the hepatitis C virus (HCV) genome. We applied ultradeep sequencing to determine ribavirin-induced sequence changes in the HCV coding region (nucleotides [nt] 330 to 9351) of patients treated with 6-week ribavirin monotherapy (n = 6) in comparison to placebo (n = 6). Baseline HCV RNA levels maximally declined on average by -0.8 or -0.1 log10 IU/ml in ribavirin- versus placebo-treated patients. No general increase in rates of nucleotide substitutions in ribavirin-treated patients was observed. However, more HCV genome positions with high G-to-A and C-to-U transition rates were detected between baseline and treatment week 6 in ribavirin-treated patients in comparison to placebo-treated patients (rate of 0.0041 transitions per base pair versus rate of 0.0022 transitions per base pair; P = 0.049). Similarly, the sensitive detection of low-frequency minority variants by statistical filtering indicated significantly more positions with G-to-A and C-to-U transitions in ribavirin-treated patients than in placebo-treated patients (rate of 0.0331 transitions versus rate of 0.0186 transitions per G/C-containing position at baseline; P = 0.018). In contrast, non-ribavirin-associated A-to-G and U-to-C transitions were not enriched in the ribavirin group (P = 0.152). We conclude that ribavirin exerts a mutagenic effect on the virus in patients with chronic hepatitis C by facilitating G-to-A and C-to-U nucleotide transitions. PMID:23536652

  1. Deep Sequencing Reveals Mutagenic Effects of Ribavirin during Monotherapy of Hepatitis C Virus Genotype 1-Infected Patients

    PubMed Central

    Dietz, Julia; Schelhorn, Sven-Eric; Fitting, Daniel; Mihm, Ulrike; Susser, Simone; Welker, Martin-Walter; Füller, Caterina; Däumer, Martin; Teuber, Gerlinde; Wedemeyer, Heiner; Berg, Thomas; Lengauer, Thomas; Zeuzem, Stefan; Herrmann, Eva

    2013-01-01

    The preeminent mode of action of the broad-spectrum antiviral nucleoside ribavirin in the therapy of chronic hepatitis C is currently unresolved. Particularly under contest are possible mutagenic effects of ribavirin that may lead to viral extinction by lethal mutagenesis of the hepatitis C virus (HCV) genome. We applied ultradeep sequencing to determine ribavirin-induced sequence changes in the HCV coding region (nucleotides [nt] 330 to 9351) of patients treated with 6-week ribavirin monotherapy (n = 6) in comparison to placebo (n = 6). Baseline HCV RNA levels maximally declined on average by −0.8 or −0.1 log10 IU/ml in ribavirin- versus placebo-treated patients. No general increase in rates of nucleotide substitutions in ribavirin-treated patients was observed. However, more HCV genome positions with high G-to-A and C-to-U transition rates were detected between baseline and treatment week 6 in ribavirin-treated patients in comparison to placebo-treated patients (rate of 0.0041 transitions per base pair versus rate of 0.0022 transitions per base pair; P = 0.049). Similarly, the sensitive detection of low-frequency minority variants by statistical filtering indicated significantly more positions with G-to-A and C-to-U transitions in ribavirin-treated patients than in placebo-treated patients (rate of 0.0331 transitions versus rate of 0.0186 transitions per G/C-containing position at baseline; P = 0.018). In contrast, non-ribavirin-associated A-to-G and U-to-C transitions were not enriched in the ribavirin group (P = 0.152). We conclude that ribavirin exerts a mutagenic effect on the virus in patients with chronic hepatitis C by facilitating G-to-A and C-to-U nucleotide transitions. PMID:23536652

  2. Implemented Lomb-Scargle periodogram: a valuable tool for improving cyclostratigraphic research on unevenly sampled deep-sea stratigraphic sequences

    NASA Astrophysics Data System (ADS)

    Pardo-Iguzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2011-12-01

    One important handicap when working with stratigraphic sequences is the discontinuous character of the sedimentary record, especially relevant in cyclostratigraphic analysis. Uneven palaeoclimatic/palaeoceanographic time series are common, their cyclostratigraphic analysis being comparatively difficult because most spectral methodologies are appropriate only when working with even sampling. As a means to solve this problem, a program for calculating the smoothed Lomb-Scargle periodogram and cross-periodogram, which additionally evaluates the statistical confidence of the estimated power spectrum through a Monte Carlo procedure (the permutation test), has been developed. The spectral analysis of a short uneven time series calls for assessment of the statistical significance of the spectral peaks, since a periodogram can always be calculated but the main challenge resides in identifying true spectral features. To demonstrate the effectiveness of this program, two case studies are presented: one deals with synthetic data and the other with palaeoceanographic/palaeoclimatic proxies. From a simulated time series of 500 data points, two uneven time series (with 100 and 25 data points) were generated by selecting data at random. Comparative analysis between the power spectra from the simulated series and from the two uneven time series demonstrates the usefulness of the smoothed Lomb-Scargle periodogram for uneven sequences, making it possible to distinguish between statistically significant and spurious spectral peaks. Fragmentary time series of Cd/Ca ratios and δ18O from core AII107-131 of SPECMAP were analysed as a real case study. The efficiency of the direct and cross Lomb-Scargle periodogram in recognizing Milankovitch and sub-Milankovitch signals related to palaeoclimatic/palaeoceanographic changes is demonstrated. As implemented, the Lomb-Scargle periodogram may be applied to any palaeoclimatic/palaeoceanographic proxies, including those usually recovered from contourites
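
    The approach described above can be sketched as follows: a classical Lomb-Scargle periodogram for unevenly sampled data, with a permutation test on the highest peak. This is a minimal illustration and not the authors' program: it omits the smoothing and the cross-periodogram, and the function names, frequency grid, noise level and synthetic 50-unit cycle are all assumptions chosen for the example.

```python
import math
import random

def lomb_scargle(times, values, freqs):
    """Classical (unsmoothed) Lomb-Scargle power at each positive frequency."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    resid = [v - mean for v in values]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        yc = sum(r * c for r, c in zip(resid, cos_t))
        ys = sum(r * s for r, s in zip(resid, sin_t))
        cc = sum(c * c for c in cos_t)
        ss = sum(s * s for s in sin_t)
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power

def permutation_pvalue(times, values, freqs, n_perm=99, seed=0):
    """Monte Carlo p-value of the highest peak: shuffle the values over the
    fixed sampling times and count how often the shuffled maximum reaches
    the observed maximum."""
    rng = random.Random(seed)
    observed = max(lomb_scargle(times, values, freqs))
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(lomb_scargle(times, shuffled, freqs)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic case: keep 100 of 500 evenly spaced samples, chosen at random.
rng = random.Random(1)
times = sorted(rng.sample(range(500), 100))
values = [math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.5) for t in times]
freqs = [k / 500 for k in range(1, 51)]  # up to 0.1 cycles per time unit

power = lomb_scargle(times, values, freqs)
peak_freq = freqs[power.index(max(power))]
p_value = permutation_pvalue(times, values, freqs)
print(peak_freq, p_value)
```

    Shuffling destroys any periodic structure while keeping the sampling times and the value distribution, so a peak that is rarely matched by the shuffled series is unlikely to be an artefact of the uneven sampling.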

  3. Small RNA deep sequencing identifies viral microRNAs during malignant catarrhal fever induced by alcelaphine herpesvirus 1.

    PubMed

    Sorel, Océane; Tuddenham, Lee; Myster, Françoise; Palmeira, Leonor; Kerkhofs, Pierre; Pfeffer, Sébastien; Vanderplasschen, Alain; Dewals, Benjamin G

    2015-11-01

    Alcelaphine herpesvirus 1 (AlHV-1) is a γ-herpesvirus (γ-HV) carried asymptomatically by wildebeest. Upon cross-species transmission, AlHV-1 induces a fatal lymphoproliferative disease named malignant catarrhal fever (MCF) in many ruminants, including cattle, and the rabbit model. Latency has been shown to be essential for MCF induction. However, the mechanisms causing the activation and proliferation of infected CD8+ T cells are unknown. Many γ-HVs express microRNAs (miRNAs). These small non-coding RNAs can regulate expression of host or viral target genes involved in various pathways and are thought to facilitate viral infection and/or mediate activation and proliferation of infected lymphocytes. The AlHV-1 genome has been predicted to encode a large number of miRNAs. However, their precise contribution in viral infection and pathogenesis in vivo remains unknown. Here, using cloning and sequencing of small RNAs we identified 36 potential miRNAs expressed in a lymphoblastoid cell line propagated from a calf infected with AlHV-1 and developing MCF. Among the sequenced candidate miRNAs, 32 were expressed on the reverse strand of the genome in two main clusters. The expression of these 32 viral miRNAs was further validated using Northern blot and quantitative reverse transcription PCR in lymphoid organs of MCF developing