Science.gov

Sample records for deep short-read sequencing

  1. Unlocking Short Read Sequencing for Metagenomics

    DOE PAGES

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
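
    Joining overlapping mates can be illustrated with a minimal sketch. This is not SHERA's algorithm. It assumes plain A/C/G/T/N strings and ignores quality scores, which a real merger would use to resolve mismatches. The parameters `min_overlap` and `max_mismatch_frac` are hypothetical. The sketch also assumes the insert is at least as long as each read, so there is no adapter read-through.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Join read 1 and read 2 into one composite read if their ends overlap.

    Read 2 is reverse-complemented so both reads are on the same strand.
    Overlap lengths are tried from longest to shortest, and the first one
    with an acceptable mismatch fraction is used. Returns None if no
    overlap of at least min_overlap bases qualifies.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        tail, head = r1[-olen:], r2rc[:olen]
        mismatches = sum(1 for x, y in zip(tail, head) if x != y)
        if mismatches <= max_mismatch_frac * olen:
            # Keep read 1's bases in the overlap and append read 2's remainder.
            return r1 + r2rc[olen:]
    return None
```

    For example, `merge_pair("GATTACA", "AGGTGTA", min_overlap=3)` finds a 4-base overlap and reconstructs the 10-base fragment `GATTACACCT`.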

  2. Software for pre-processing Illumina next-generation sequencing short read sequences

    PubMed Central

    2014-01-01

    Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7. Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences, with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference...
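
    A sliding-window quality trim is one common way to remove low-quality 3' bases. The sketch below is a generic illustration, not ngsShoRT's implementation. It assumes Phred+33 quality strings and uses hypothetical defaults of a 4-base window and a mean of Q20. The read is cut at the start of the first window whose mean quality falls below the threshold.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        min_mean_q: float = 20):
    """Cut the read at the first window whose mean quality is too low.

    Returns the (possibly shortened) sequence and quality strings.
    Reads shorter than the window are returned unchanged.
    """
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual
```

    With qualities `IIIIII####` (six Q40 bases followed by four Q2 bases), the first failing window starts at position 5, so the read is cut to 5 bases. Per-base or adapter trimming would need separate logic.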

  3. Development and transferability of black and red raspberry microsatellite markers from short-read sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...

  4. SRComp: short read sequence compression using burstsort and Elias omega coding.

    PubMed

    Selva, Jeremy John; Chen, Xin

    2013-01-01

    Next-generation sequencing (NGS) technologies permit the rapid production of vast amounts of data at low cost. Economical data storage and transmission hence becomes an increasingly important challenge for NGS experiments. In this paper, we introduce a new non-reference based read sequence compression tool called SRComp. It works by first employing a fast string-sorting algorithm called burstsort to sort read sequences in lexicographical order and then Elias omega-based integer coding to encode the sorted read sequences. SRComp has been benchmarked on four large NGS datasets, where experimental results show that it can run 5-35 times faster than current state-of-the-art read sequence compression tools such as BEETL and SCALCE, while retaining comparable compression efficiency for large collections of short read sequences. SRComp is a read sequence compression tool that is particularly valuable in certain applications where compression time is of major concern.
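
    The sketch below shows only the Elias omega integer code itself. The abstract does not say how SRComp turns burstsort-ordered reads into integers, so no such mapping is attempted here. Bit strings are used instead of packed bytes for readability. Each positive integer is written as its binary form, preceded recursively by the code for (its bit length minus one), and terminated by a single 0.

```python
def elias_omega_encode(n: int) -> str:
    """Elias omega code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias omega codes are defined for n >= 1")
    code = "0"                 # terminating zero
    while n > 1:
        b = bin(n)[2:]         # binary form of n, always starting with '1'
        code = b + code        # prepend this group
        n = len(b) - 1         # next, encode (bit length - 1)
    return code


def elias_omega_decode(bits: str) -> list[int]:
    """Decode a concatenation of Elias omega codes back into integers."""
    values = []
    i = 0
    while i < len(bits):
        n = 1
        while bits[i] == "1":           # a leading 1 starts another group
            group = bits[i:i + n + 1]   # that group is n + 1 bits long
            i += n + 1
            n = int(group, 2)
        i += 1                          # consume the terminating 0
        values.append(n)
    return values


def encode_all(nums) -> str:
    """Concatenate the codes of several integers into one bit stream."""
    return "".join(elias_omega_encode(x) for x in nums)
```

    Small values get short codes (1 → `0`, 2 → `100`, 4 → `101000`). This property is what makes such codes attractive when most encoded integers are small.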

  5. Identifying wrong assemblies in de novo short read primary sequence assembly contigs.

    PubMed

    Chawla, Vandna; Kumar, Rajnish; Shankar, Ravi

    2016-09-01

    With the advent of short-read-based genome sequencing approaches, a large number of organisms are being sequenced all over the world. Most of these assemblies are done using de novo short read assemblers and related approaches. However, the contigs produced this way are prone to wrong assembly. So far, there is a conspicuous dearth of reliable tools to identify mis-assembled contigs. Mis-assemblies could result from incorrectly deleted or wrongly arranged genomic sequences. In the present work, various factors related to sequence, sequencing and assembly have been assessed for their role in causing mis-assembly, using different genome sequencing data. Finally, several mis-assembly detection tools have been evaluated for their ability to detect wrongly assembled primary contigs, suggesting considerable scope for improvement in this area. The present work also proposes a simple, novel unsupervised learning-based approach to identify mis-assemblies in contigs, which was found to perform reasonably well compared with existing tools for reporting mis-assembled contigs. The proposed methodology may work as a complementary system to the existing tools to enhance their accuracy. PMID:27581937

  6. Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....

  7. Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing

    PubMed Central

    Stapleton, James A.; Kim, Jeongwoon; Hamilton, John P.; Wu, Ming; Irber, Luiz C.; Maddamsetti, Rohan; Briney, Bryan; Newton, Linsey; Burton, Dennis R.; Brown, C. Titus; Chan, Christina; Buell, C. Robin; Whitehead, Timothy A.

    2016-01-01

    Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise. PMID:26789840

  10. Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing.

    PubMed

    Reumers, Joke; De Rijk, Peter; Zhao, Hui; Liekens, Anthony; Smeets, Dominiek; Cleary, John; Van Loo, Peter; Van Den Bossche, Maarten; Catthoor, Kirsten; Sabbe, Bernard; Despierre, Evelyn; Vergote, Ignace; Hilbush, Brian; Lambrechts, Diether; Del-Favero, Jurgen

    2012-01-01

    Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs. PMID:22178994

  11. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
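
    The core idea of using EM to place multi-mapping reads can be sketched in simplified form. This is not AREM's model. It assumes every candidate alignment of a read is equally good, omits the null genomic background component, and treats each candidate region as a mixture component. Each component's weight is re-estimated from fractional read assignments.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Fractionally assign multi-mapping reads to candidate regions by EM.

    reads: dict mapping read id -> list of candidate region names.
    Simplification (assumption): all candidate alignments of a read are
    equally likely a priori, so only region weights drive the assignment.
    Returns (region weights, per-read responsibilities).
    """
    regions = sorted({r for cands in reads.values() for r in cands})
    weight = {r: 1.0 / len(regions) for r in regions}   # uniform start
    resp = {}
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion to weight.
        counts = defaultdict(float)
        for rid, cands in reads.items():
            total = sum(weight[c] for c in cands)
            resp[rid] = {c: weight[c] / total for c in cands}
            for c, share in resp[rid].items():
                counts[c] += share
        # M-step: new weight = expected fraction of all reads from each region.
        weight = {r: counts[r] / len(reads) for r in regions}
    return weight, resp
```

    Consider four reads unique to region A, one unique to region B, and one that maps to both. The weights converge to 0.8 and 0.2, and the ambiguous read is credited 80% to A instead of being discarded. This mirrors the motivation for using all reads rather than only uniquely mapped ones.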

  14. MOST: a modified MLST typing tool based on short read sequencing.

    PubMed

    Tewolde, Rediat; Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS-based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella Enteritidis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly-based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS-based methods, the mapping-based approach was the most sensitive. In addition, the mapping-based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly-based approaches. PMID:27602279

  15. Reference-based compression of short-read sequences using path encoding

    PubMed Central

    Kingsford, Carl; Patro, Rob

    2015-01-01

    Motivation: Storing, transmitting and archiving data produced by next-generation sequencing is a significant computational burden. New compression techniques tailored to short-read sequence data are needed. Results: We present here an approach to compression that reduces the difficulty of managing large-scale sequencing data. Our novel approach sits between pure reference-based compression and reference-free compression and combines much of the benefit of reference-based approaches with the flexibility of de novo encoding. Our method, called path encoding, draws a connection between storing paths in de Bruijn graphs and context-dependent arithmetic coding. Supporting this method is a system to compactly store sets of k-mers that is of independent interest. We are able to encode RNA-seq reads using 3–11% of the space of the sequence in raw FASTA files, which is on average more than 34% smaller than competing approaches. We also show that even if the reference is very poorly matched to the reads that are being encoded, good compression can still be achieved. Availability and implementation: Source code and binaries freely available for download at http://www.cs.cmu.edu/~ckingsf/software/pathenc/, implemented in Go and supported on Linux and Mac OS X. Contact: carlk@cs.cmu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25649622

  16. Characterization of a biogas-producing microbial community by short-read next generation DNA sequencing

    PubMed Central

    2012-01-01

    Background: Renewable energy production is currently a major issue worldwide. Biogas is a promising renewable energy carrier, as the technology of its production combines the elimination of organic waste with the formation of a versatile energy carrier, methane. Because of the complexity of the microbial communities and metabolic pathways involved, the microbiological process underlying biogas production is poorly understood. Metagenomic approaches are suitable means of addressing related questions. In the present work, a novel high-throughput technique was tested for its benefits in resolving the functional and taxonomical complexity of such microbial consortia. Results: It was demonstrated that the massively parallel SOLiD™ short-read DNA sequencing platform is capable of providing sufficient useful information to decipher the systematic and functional contexts within a biogas-producing community. Although this technology has not been employed to address such problems previously, the data obtained compare well with those from similar high-throughput approaches such as 454 pyrosequencing (GS FLX or Titanium). The predominant microbes contributing to the decomposition of organic matter include members of the Eubacteria, class Clostridia, order Clostridiales, family Clostridiaceae. Bacteria belonging to other systematic groups contribute to the diversity of the microbial consortium. Archaea comprise a remarkably small minority in this community, given their crucial role in biogas production. Among the Archaea, the predominant order is the Methanomicrobiales and the most abundant species is Methanoculleus marisnigri. The Methanomicrobiales are hydrogenotrophic methanogens. Besides corroborating earlier findings on the significance of the contribution of the Clostridia to organic substrate decomposition, the results demonstrate the importance of the metabolism of hydrogen within the biogas-producing microbial community. Conclusions: Both...

  17. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

    SciTech Connect

    Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong; Brewer, Tony; Soper, David; D'Jamoos, Mike; Collins, Kirby; Vacek, George

    2011-03-22

    Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field-programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read de novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph-based short-read de novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (each of the four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets of different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references, we will also present assembly quality comparisons between the two assemblers.
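
    The two-bit representation can be sketched as follows. This is a generic illustration, not Convey's graph constructor. It assumes sequences contain only A, C, G and T; an N would need separate handling. It also shows how k-mers, the basic units of a de Bruijn graph, can be extracted as integers with a rolling bit mask.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}   # 2 bits per base
DEC = "ACGT"


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (first base highest)."""
    value = 0
    for base in seq:
        value = (value << 2) | ENC[base]
    return value


def unpack(value: int, k: int) -> str:
    """Inverse of pack for a k-base sequence."""
    bases = []
    for _ in range(k):
        bases.append(DEC[value & 3])   # lowest 2 bits hold the last base
        value >>= 2
    return "".join(reversed(bases))


def rolling_kmers(seq: str, k: int) -> list[int]:
    """All k-mers of seq as packed integers, updated in O(1) per base."""
    mask = (1 << (2 * k)) - 1          # keep only the last k bases
    value = 0
    out = []
    for i, base in enumerate(seq):
        value = ((value << 2) | ENC[base]) & mask
        if i >= k - 1:
            out.append(value)
    return out
```

    For example, `pack("ACGT")` is `0b00011011`, which equals 27. Each k-mer fits in a single machine word for k ≤ 32 on 64-bit hardware. This compactness is what makes narrow, bit-level operations attractive.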

  18. Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species

    PubMed Central

    Judy, Caroline Duffie; Seeholzer, Glenn F.; Maley, James M.; Graves, Gary R.; Brumfield, Robb T.

    2015-01-01

    Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the resulting datasets are sensitive to both the similarity threshold used and to the variation naturally present in the organism under study. Thresholds that require high sequence similarity among reads for assembly (stringent thresholds) as well as highly variable species may result in datasets in which divergent alleles are lost or divided into separate loci (‘over-splitting’), whereas liberal thresholds increase the risk of paralogous loci being combined into a single locus (‘under-splitting’). Comparisons among datasets or species are therefore potentially biased if different similarity thresholds are applied or if the species differ in levels of within-lineage genetic variation. We examine the impact of a range of similarity thresholds on assembly of empirical short read datasets from populations of four different non-model bird lineages (species or species pairs) with different levels of genetic divergence. We find that, in all species, stringent similarity thresholds result in fewer alleles per locus than more liberal thresholds, which appears to be the result of high levels of over-splitting. The frequency of putative under-splitting, conversely, is low at all thresholds. Inferred genetic distances between individuals, gene tree depths, and estimates of the ancestral mutation-scaled effective population size (θ) differ depending upon the similarity threshold applied. Relative differences in inferences across species differ even when the same threshold is applied, but may be dramatically different when datasets assembled under different thresholds are compared. These...
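
    The over-splitting effect of a stringent threshold can be illustrated with a toy greedy clusterer. It assumes equal-length, ungapped reads, and `greedy_cluster` is a hypothetical helper; real pipelines use alignment-based clustering. The sketch therefore shows only the direction of the effect. Two alleles that differ at one site in ten (90% identity) are split into separate "loci" at a 95% threshold but merged at 85%.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example assumes equal-length, ungapped reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first cluster seed it matches at >= threshold.

    Returns a list of (seed, members) tuples; a sequence that matches no
    existing seed starts a new cluster (a new putative locus).
    """
    clusters = []
    for s in seqs:
        for seed, members in clusters:
            if identity(seed, s) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters
```

    Greedy, seed-based clustering is also order-dependent. This is one more reason why results from different thresholds or pipelines are hard to compare directly.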

  19. TGS-TB: Total Genotyping Solution for Mycobacterium tuberculosis Using Short-Read Whole-Genome Sequencing.

    PubMed

    Sekizuka, Tsuyoshi; Yamashita, Akifumi; Murase, Yoshiro; Iwamoto, Tomotada; Mitarai, Satoshi; Kato, Seiya; Kuroda, Makoto

    2015-01-01

    Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, leading to more effective epidemiological studies involving single nucleotide variations (SNVs) in core genomic sequences based on molecular evolution. We developed an all-in-one web-based tool for genotyping Mtb, referred to as the Total Genotyping Solution for TB (TGS-TB), to facilitate multiple genotyping platforms using NGS for spoligotyping and the detection of phylogenies with core genomic SNVs, IS6110 insertion sites, and 43 customized loci for variable number tandem repeat (VNTR) through a user-friendly, simple click interface. This methodology is implemented with a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Seven Mtb isolates (JP01 to JP07) in this study showing the same VNTR profile were accurately discriminated through median-joining network analysis using SNVs unique to those isolates. An additional IS6110 insertion was detected in one of those isolates as supportive genetic information in addition to core genomic SNVs. The results of in silico analyses using TGS-TB are consistent with those obtained using conventional molecular genotyping methods, suggesting that NGS short reads could provide multiple genotypes to discriminate multiple strains of Mtb, although longer NGS reads (≥ 300-mer) will be required for full genotyping on the TGS-TB web site. Most available short reads (~100-mer) can be utilized to discriminate the isolates based on the core genome phylogeny. TGS-TB provides a more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS strain typing offers a total genotyping solution for Mtb outbreak and surveillance. TGS-TB web site: https://gph.niid.go.jp/tgs-tb/. PMID:26565975

  21. Rapid Short-Read Sequencing and Aneuploidy Detection Using MinION Nanopore Technology

    PubMed Central

    Wei, Shan; Williams, Zev

    2016-01-01

    MinION is a memory stick–sized nanopore-based sequencer designed primarily for single-molecule sequencing of long DNA fragments (>6 kb). We developed a library preparation and data-analysis method to enable rapid real-time sequencing of short DNA fragments (<1 kb) that resulted in the sequencing of 500 reads in 3 min and 40,000–80,000 reads in 2–4 hr at a rate of 30 nt/sec. We then demonstrated the clinical applicability of this approach by performing successful aneuploidy detection in prenatal and miscarriage samples with sequencing in <4 hr. This method broadens the application of nanopore-based single-molecule sequencing and makes it a promising and versatile tool for rapid clinical and research applications. PMID:26500254

  2. Genome-wide detection of chromosomal rearrangements, indels, and mutations in circular chromosomes by short read sequencing

    PubMed Central

    Skovgaard, Ole; Bak, Mads; Løbner-Olesen, Anders; Tommerup, Niels

    2011-01-01

    Whole-genome sequencing (WGS) with new short-read sequencing technologies has recently been applied for genome-wide identification of mutations. Genomic rearrangements, however, have often remained undetected by WGS, and additional analyses are required to detect them. Here, we applied a combination of WGS and genome copy number analysis to identify mutations that suppress the growth deficiency imposed by excessive initiations from the Escherichia coli origin of replication, oriC. The E. coli chromosome, like the majority of bacterial chromosomes, is circular, and DNA replication is initiated by assembling two replication complexes at the origin, oriC. These complexes then replicate the chromosome bidirectionally toward the terminus, ter. In a population of growing cells, this results in a copy number gradient, so that origin-proximal sequences are more frequent than origin-distal sequences. Major rearrangements in the chromosome are therefore readily identified by changes in copy number, i.e., certain sequences become over- or under-represented. Of the eight mutations analyzed in detail here, six affected a single gene only, one was a large chromosomal inversion, and one was a large chromosomal duplication. The latter two mutations could not be detected by WGS alone, validating the present approach for identifying genomic rearrangements. We further suggest using copy number analysis in combination with WGS to validate newly assembled bacterial chromosomes. PMID:21555365
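
    Because origin-proximal sequences are over-represented in growing cells, log2 read depth should fall roughly linearly with distance from oriC on the circular chromosome, and a rearrangement shows up as a departure from that trend. The sketch below fits an ordinary least-squares line to log2 bin counts against circular distance from the origin and flags bins whose residual exceeds a threshold. The bin layout, the 0.5 threshold and the toy coverage are hypothetical, and this is a simplified illustration rather than the authors' copy number analysis. An ordinary least-squares fit can be pulled by a large rearrangement, so a robust fit would be preferable on real data.

```python
import math


def distance_from_origin(pos, origin, genome_len):
    """Shortest distance between pos and the origin on a circular chromosome."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def copy_number_outliers(bins, origin, genome_len, threshold=0.5):
    """bins: list of (position, read_count). Return positions whose log2 count
    deviates from the ori-to-ter trend by more than `threshold`."""
    xs = [distance_from_origin(p, origin, genome_len) for p, _ in bins]
    ys = [math.log2(c) for _, c in bins]
    slope, intercept = fit_line(xs, ys)
    return [p for (p, _), x, y in zip(bins, xs, ys)
            if abs(y - (slope * x + intercept)) > threshold]


# Toy genome: coverage halves every 250 bp away from the origin at position 0.
genome_len = 1000
bins = [(p, 2 ** (8 - distance_from_origin(p, 0, genome_len) / 250))
        for p in range(0, genome_len, 100)]
bins[3] = (300, bins[3][1] * 2)  # simulated duplication doubles one bin
print(copy_number_outliers(bins, origin=0, genome_len=genome_len))  # [300]
```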

  3. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts the computing power of the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, which accelerates the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude increase in alignment throughput compared with a single CPU core while delivering the same level of alignment fidelity. The software also supports multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of GPU parallelism to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  4. Base calling for high-throughput short-read sequencing: dynamic programming solutions

    PubMed Central

    2013-01-01

    Background Next-generation DNA sequencing platforms are capable of generating millions of reads in a matter of days at rapidly decreasing costs. Despite their proliferation and technological improvements, the performance of next-generation sequencing remains adversely affected by imperfections in the underlying biochemical and signal acquisition procedures. To address this, various techniques, including statistical methods, are used to improve the read lengths and accuracy of these systems. Developing high-performing base calling algorithms that are computationally efficient and scalable is an ongoing challenge. Results We develop model-based statistical methods for fast and accurate base calling on Illumina's next-generation sequencing platforms. In particular, we propose a computationally tractable parametric model that enables a dynamic programming formulation of the base calling problem. Forward-backward and soft-output Viterbi algorithms are developed, and their performance and complexity are investigated and compared with existing state-of-the-art base calling methods for this platform. A C implementation of our algorithm, named Softy, can be downloaded from https://sourceforge.net/projects/dynamicprog. Conclusion We demonstrate high accuracy and speed of the proposed methods on reads obtained using Illumina's Genome Analyzer II and HiSeq2000. In addition to performing reliable and fast base calling, the developed algorithms enable the incorporation of prior knowledge, which can be used for parameter estimation and is potentially beneficial in various downstream applications. PMID:23586484
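
    To show what a dynamic-programming formulation of base calling looks like, here is the generic log-space Viterbi recursion on a toy hidden Markov model in which hidden states are bases and observations are the brightest dye channel. All parameters are hypothetical, and Softy's actual model (including its soft-output variant) is not reproduced; this only illustrates the recursion the abstract names.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `obs`, computed in log space."""
    # V[t][s]: best log-probability of any path ending in state s at step t
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return "".join(reversed(path))


# Toy, hypothetical parameters: observations are the brightest channel
# ("a", "c", "g", "t") or "n" for an uninformative cycle.
BASES = "ACGT"
start_p = {b: 0.25 for b in BASES}
trans_p = {r: {s: 0.4 if r == s else 0.2 for s in BASES} for r in BASES}
emit_p = {b: {o: 0.6 if o == b.lower() else 0.1 for o in "acgtn"} for b in BASES}

print(viterbi("acgt", BASES, start_p, trans_p, emit_p))  # ACGT
print(viterbi("ana", BASES, start_p, trans_p, emit_p))   # AAA
```

    In the second call the uninformative middle cycle is resolved from context: the transition model makes "AAA" more probable than any path that switches bases.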

  6. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine, or personalized medicine, has been proposed as a modern and promising medical strategy. Patients' genetic variants are the key information for implementing precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for aligning short read sequences since 2008, and users must decide which alignment algorithm to use in their studies. Selecting the right alignment approach involves choosing not only the algorithm but also a set of suitable parameters for it. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies, such as seed-and-extend and q-gram filtering. We also discuss the challenges facing current alignment algorithms, including alignment in multiple repeated regions, long-read alignment, and alignment facilitated by known genetic variants. PMID:26610555
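
    Seed-and-extend, one of the strategies covered by this review, can be illustrated with a toy aligner: index every k-mer of the reference, look up exact k-mer seeds from the read, and extend each seed hit into a full-length candidate that is accepted if it has few enough mismatches. This sketch uses a plain hash table and ungapped extension only; real aligners use compressed indexes and gapped extension, and the reference, read, k and mismatch limit here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(ref: str, k: int) -> dict:
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, ref: str, index: dict, k: int,
                    max_mismatches: int = 2) -> list:
    """Return sorted (start, mismatches) hits using non-overlapping k-mer seeds
    and ungapped extension over the whole read."""
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset  # implied alignment start of the full read
            if start < 0 or start + len(read) > len(ref):
                continue
            window = ref[start:start + len(read)]
            mm = sum(1 for a, b in zip(read, window) if a != b)
            if mm <= max_mismatches:
                hits.add((start, mm))
    return sorted(hits)


ref = "TTGACCATGCAAGTCCGATTACA"
read = "ATGCATGTCCGA"  # ref[6:18] with one substitution
index = build_kmer_index(ref, k=4)
print(seed_and_extend(read, ref, index, k=4))  # [(6, 1)]
```

    Two of the three seeds ("ATGC" and "CCGA") hit the reference and both imply the same start, so the candidate is reported once; the seed overlapping the substitution finds no exact match, which is why more than one seed per read is used.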

  7. Analysis of gene expression for microminipig liver transcriptomes using parallel long-read technology and short-read sequencing.

    PubMed

    Sakai, Chizuka; Iwano, Shunsuke; Shimizu, Makiko; Onodera, Jun; Uchida, Masashi; Sakurada, Eri; Yamazaki, Yuri; Asaoka, Yoshiji; Imura, Naoko; Uno, Yasuhiro; Murayama, Norie; Hayashi, Ryoji; Yamazaki, Hiroshi; Miyamoto, Yohei

    2016-05-01

    The microminipig, one of the smallest minipigs, has emerged as a possible experimental animal model because it shares many anatomical and physiological similarities with humans, including the distribution of the coronary arteries in the heart, digestive physiology, and kidney size and structure. However, information on gene expression profiles in the microminipig, including those of drug-metabolizing phase I and II enzymes, is limited. Therefore, the aim of the present study was to identify transcripts in microminipig livers and to determine gene expression profiles. De novo assembly and expression analyses of microminipig transcripts were conducted with liver samples from three male and three female microminipigs using parallel long-read and short-read sequencing technologies. After unique sequences had been automatically aligned by assembling software, the mean contig length of the 50,843 transcripts was 707 bp. The expression profiles of the cytochrome P450 (P450) 1A2, 2C, 2E1 and 3A genes in microminipig livers were similar to those in humans. Liver carboxylesterase (CES) precursor, liver CES-like, UDP-glucuronosyltransferase (UGT) 2C1-like, amine sulfotransferase (SULT)-like, N-acetyltransferase (NAT8) and glutathione S-transferase (GST) A2 genes, which are relatively unknown in pigs and/or humans, were strongly expressed. Furthermore, no significant gender differences were observed in the expression profiles of phase I enzyme genes, whereas among phase II enzyme genes, the expression of UGT2B17, SULT1E1, SULT2A1, amine SULT-like, NAT8 and GSTT4 differed between males and females under the present sample conditions. These results provide a foundation for mechanistic studies and for the future use of microminipigs as model animals in drug development. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27214158

  8. Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome.

    PubMed

    Everett, M V; Grau, E D; Seeb, J E

    2011-03-01

    How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, the availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals came from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of assembled contigs varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references. PMID:21429166
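
    In silico SNP discovery on assembled contigs typically starts from a pileup: for each contig position, the bases of all reads covering it are tallied, and a position is reported as a candidate SNP when a second allele is seen often enough. The sketch below implements that idea with hypothetical depth and minor-allele-frequency thresholds; the abstract does not state the criteria the authors used, and transcriptome data add complications (paralogs, allele-specific expression) that this toy ignores.

```python
from collections import Counter


def call_snps(pileup: dict, min_depth: int = 8, min_minor_freq: float = 0.2) -> dict:
    """pileup: position -> string of bases observed in reads at that position.
    Returns position -> (major allele, minor allele, minor allele frequency)."""
    snps = {}
    for pos, bases in pileup.items():
        if len(bases) < min_depth:
            continue  # too little coverage to judge
        top = Counter(bases.upper()).most_common(2)
        if len(top) < 2:
            continue  # monomorphic position
        (major, _), (minor, minor_n) = top
        freq = minor_n / len(bases)
        if freq >= min_minor_freq:
            snps[pos] = (major, minor, round(freq, 2))
    return snps


# Hypothetical pileup columns from one contig.
pileup = {101: "AAAAAAAAAA", 102: "CCCCCTTTTC", 103: "GGGGGGGGGA", 104: "TTA"}
print(call_snps(pileup))  # {102: ('C', 'T', 0.4)}
```

    Position 103 is treated as a likely sequencing error because its second allele is rare, and position 104 is skipped for low depth.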

  9. Methods for accurate quantification of LTR-retrotransposon copy number using short-read sequence data: a case study in Sorghum.

    PubMed

    Ramachandran, Dhanushya; Hawkins, Jennifer S

    2016-10-01

    Transposable elements (TEs) are ubiquitous in eukaryotic genomes and their mobility impacts genome structure and function in myriad ways. Because of their abundance, activity, and repetitive nature, the characterization and analysis of TEs remain challenging, particularly from short-read sequencing projects. To overcome this difficulty, we have developed a method that estimates TE copy number from short-read sequences. To test the accuracy of our method, we first performed an in silico analysis of the reference Sorghum bicolor genome, using both reference-based and de novo approaches. The resulting TE copy number estimates were strikingly similar to the annotated numbers. We then tested our method on real short-read data by estimating TE copy numbers in several accessions of S. bicolor and its close relative S. propinquum. Both methods effectively identify and rank similar TE families from highest to lowest abundance. We found that de novo characterization was effective at capturing qualitative variation, but underestimated the abundance of some TE families, specifically families of more ancient origin. Also, interspecific reference-based mapping of S. propinquum reads to the S. bicolor database failed to fully describe TE content in S. propinquum, indicative of recent TE activity leading to changes in the respective repetitive landscapes over very short evolutionary timescales. We conclude that reference-based analyses are best suited for within-species comparisons, while de novo approaches are more reliable for evolutionarily distant comparisons. PMID:27295958
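
    A common way to estimate TE copy number from short reads is read depth: the bases contributed by reads that map to a TE family, divided by the bases one full-length copy would be expected to receive at the genome-wide coverage. The abstract does not give the authors' exact formula, so the sketch below is an assumed read-depth estimator with hypothetical numbers, not their method.

```python
def genome_coverage(total_reads: int, read_len: int, genome_size: int) -> float:
    """Average sequencing depth across the genome."""
    return total_reads * read_len / genome_size


def te_copy_number(te_reads: int, read_len: int, consensus_len: int,
                   coverage: float) -> float:
    """Bases from reads hitting a TE family, divided by the bases that a single
    full-length copy would receive at the genome-wide coverage."""
    return te_reads * read_len / (consensus_len * coverage)


# Hypothetical run: 20 million 100-bp reads over a 500-Mb genome (4x coverage).
cov = genome_coverage(total_reads=20_000_000, read_len=100, genome_size=500_000_000)
print(cov)  # 4.0
print(te_copy_number(te_reads=48_000, read_len=100, consensus_len=6_000, coverage=cov))  # 200.0
```

    The estimate is only as good as the read-to-family assignment, which is consistent with the abstract's observation that mapping S. propinquum reads to an S. bicolor database misses part of its TE content.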

  10. BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing.

    PubMed

    Kao, Wei-Chun; Stevens, Kristian; Song, Yun S

    2009-10-01

    Extracting sequence information from raw fluorescence images is the foundation underlying several high-throughput sequencing platforms. Some of the main challenges associated with this technology include reducing the error rate, assigning accurate base-specific quality scores, and reducing the cost of sequencing by increasing the throughput per run. To demonstrate how computational advances can help to meet these challenges, a novel model-based base-calling algorithm, BayesCall, is introduced for the Illumina sequencing platform. Built on statistical learning tools, BayesCall is flexible enough to incorporate various features of the sequencing process. In particular, it can easily incorporate time-dependent parameters and model residual effects. This new approach significantly improves accuracy over Illumina's base-caller, Bustard, particularly in the later cycles of a sequencing run. For 76-cycle data on a standard viral sample, phiX174, BayesCall reduces the average per-base error rate relative to Bustard by approximately 51%. The probability of observing each base can be readily computed in BayesCall, and this probability can be transformed into a useful base-specific quality score with high discrimination ability. A detailed study of BayesCall's performance is presented here. PMID:19661376
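
    The standard way to turn a per-base probability into a quality score is the Phred scale, Q = -10 log10(P(error)), usually written into FASTQ as the ASCII character chr(Q + 33). The abstract does not say exactly which transform BayesCall applies, so the sketch below is only the conventional Phred mapping applied to a hypothetical posterior over the four bases, with an assumed cap of Q60.

```python
import math


def phred_quality(p_error: float, max_q: int = 60) -> int:
    """Phred-scaled quality Q = -10 * log10(P(error)), capped at max_q."""
    floor = 10 ** (-max_q / 10)  # avoid log10(0) when P(error) is 0
    return round(-10 * math.log10(max(p_error, floor)))


def call_with_quality(base_probs: dict) -> tuple:
    """Call the most probable base and convert its error probability to Phred."""
    base = max(base_probs, key=base_probs.get)
    q = phred_quality(1.0 - base_probs[base])
    return base, q, chr(q + 33)  # Phred+33 ASCII character as used in FASTQ


# Hypothetical posterior probabilities for one sequencing cycle.
print(call_with_quality({"A": 0.999, "C": 0.0005, "G": 0.0003, "T": 0.0002}))  # ('A', 30, '?')
```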

  11. Short read alignment with populations of genomes

    PubMed Central

    Huang, Lin; Popic, Victoria; Batzoglou, Serafim

    2013-01-01

    Summary: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows–Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome. Availability: http://viq854.github.com/bwbble. Contact: serafim@cs.stanford.edu PMID:23813006
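
    BWBBLE's aligner is based on the Burrows–Wheeler transform. The sketch below shows only the underlying single-genome building block, exact-match backward search over a BWT with a C table and occurrence counts. It does not implement BWBBLE's compressed multi-genome representation or inexact matching, and the naive suffix sorting and occurrence counting are for illustration, not efficiency.

```python
def build_fm_index(text: str):
    """Suffix array, BWT and C table for text + '$' (naive construction)."""
    text += "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total  # number of characters smaller than c
        total += text.count(c)
    return sa, bwt, C


def occ(bwt: str, c: str, i: int) -> int:
    """Occurrences of c in bwt[:i]; real FM-indexes use sampled checkpoints."""
    return bwt[:i].count(c)


def backward_search(pattern: str, sa, bwt: str, C: dict) -> list:
    """Exact-match positions of pattern, processing it right to left."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ(bwt, c, lo)
        hi = C[c] + occ(bwt, c, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C = build_fm_index("GATTACATTA")
print(backward_search("TTA", sa, bwt, C))  # [2, 7]
print(backward_search("GG", sa, bwt, C))   # []
```

    Each step narrows a half-open range of suffix-array rows whose suffixes start with the pattern suffix read so far, so the search cost depends on the pattern length rather than the genome length once occurrence counts are precomputed.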

  12. Objective and comprehensive evaluation of bisulfite short read mapping tools.

    PubMed

    Tran, Hong; Porter, Jacob; Sun, Ming-An; Xie, Hehuang; Zhang, Liqing

    2014-01-01

    Background. Large-scale bisulfite treatment and short read sequencing technology allow comprehensive estimation of the methylation states of cytosines (Cs) in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype–phenotype associations, gene–environment interactions, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genome), usability, running time, and the effects of changing default parameter settings, using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker, which have very similar performance. If CPU time is not a constraint, Bismark is a good choice for mapping bisulfite-treated short reads. Data quality has a large impact on mapping efficiency. Although allowing more mismatches can increase mapping efficiency, it significantly slows down the program and also increases the risk of false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data. PMID:24839440
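
    One widely used strategy for mapping bisulfite reads (Bismark, for example, converts reads and genome in silico) is "three-letter" alignment: convert every C to T in both the read and the reference, align in the reduced alphabet, and then compare the original read with the reference to call methylation at each reference C. The sketch below handles only the top strand with a brute-force scan; the reference, read and mismatch limit are hypothetical, and it omits the G-to-A conversion needed for the opposite strand.

```python
def bisulfite_convert(seq: str) -> str:
    """In silico C->T conversion (top strand only in this sketch)."""
    return seq.upper().replace("C", "T")


def map_bisulfite_read(read: str, ref: str, max_mismatches: int = 0) -> list:
    """Brute-force scan of the C->T converted reference with the converted read."""
    conv_read, conv_ref = bisulfite_convert(read), bisulfite_convert(ref)
    hits = []
    for start in range(len(ref) - len(read) + 1):
        window = conv_ref[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(conv_read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def call_methylation(read: str, ref: str, start: int) -> dict:
    """At each reference C: read C means methylated, read T means unmethylated."""
    calls = {}
    for i, base in enumerate(read.upper()):
        if ref[start + i].upper() == "C":
            if base == "C":
                calls[start + i] = "methylated"
            elif base == "T":
                calls[start + i] = "unmethylated"
    return calls


ref = "ATCGGACGTTCA"
read = "TCGGATGTTT"  # C at 2 protected (methylated); Cs at 6 and 10 converted
for start in map_bisulfite_read(read, ref):
    print(start, call_methylation(read, ref, start))
```

    The read maps uniquely at position 1 and yields one methylated and two unmethylated cytosines. Raising `max_mismatches` would tolerate sequencing errors but, as the abstract notes, also invites false hits.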

  13. Droplet barcoding for massively parallel single-molecule deep sequencing

    PubMed Central

    Lan, Freeman; Haliburton, John R.; Yuan, Aaron; Abate, Adam R.

    2016-01-01

    The ability to accurately sequence long DNA molecules is important across biology, but existing sequencers are limited in read length and accuracy. Here, we demonstrate a method to leverage short-read sequencing to obtain long and accurate reads. Using droplet microfluidics, we isolate, amplify, fragment and barcode single DNA molecules in aqueous picolitre droplets, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read sequencing. We show that this approach can provide accurate sequences of up to 10 kb, allowing us to identify rare mutations below the detection limit of conventional sequencing and directly link them into haplotypes. This barcoding methodology can be a powerful tool in sequencing heterogeneous populations such as viruses. PMID:27353563
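
    After droplet barcoding, reads from the same original molecule share a barcode, so the first computational step is to bin reads by barcode and then assemble each bin separately. The sketch below assumes a hypothetical read layout in which the barcode is a fixed-length prefix of every read; the abstract does not describe the actual barcode structure or downstream assembly, which are not shown.

```python
from collections import defaultdict


def bin_reads_by_barcode(reads: list, barcode_len: int, min_reads: int = 1) -> dict:
    """Group read inserts by a fixed-length barcode prefix (hypothetical layout),
    keeping only barcodes supported by at least `min_reads` reads."""
    bins = defaultdict(list)
    for read in reads:
        bins[read[:barcode_len]].append(read[barcode_len:])
    return {bc: inserts for bc, inserts in bins.items() if len(inserts) >= min_reads}


reads = ["AAACGATTACA", "AAACTTACAGG", "CCGTACGTACG"]
print(bin_reads_by_barcode(reads, barcode_len=4))
# {'AAAC': ['GATTACA', 'TTACAGG'], 'CCGT': ['ACGTACG']}
```

    Raising `min_reads` discards barcodes with too little coverage to reconstruct a molecule, which matters because accurate long reads depend on multi-fold coverage within each droplet.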

  14. Qualitative De Novo Analysis of Full Length cDNA and Quantitative Analysis of Gene Expression for Common Marmoset (Callithrix jacchus) Transcriptomes Using Parallel Long-Read Technology and Short-Read Sequencing

    PubMed Central

    Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as a model for human pharmacokinetic and biomedical research. The cytochromes P450 (P450s) are a superfamily of enzymes that play critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembling software, the mean contig length was 718 bp (standard deviation 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts matched known marmoset sequences. Expression of 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, expression of these monooxygenases was lower than in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massively parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here identified novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  15. A hybrid short read mapping accelerator

    PubMed Central

    2013-01-01

    Background The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed; however, hardware-based designs are relatively rare. Field programmable gate arrays (FPGAs) have been used successfully in a number of specific application areas, such as digital signal processing (DSP) and communications, owing to their outstanding parallel data-processing capabilities, which make them a competitive platform for solving problems that are "inherently parallel". Results We present a hybrid system for short read mapping that uses both FPGA-based hardware and CPU-based software. The computation-intensive alignment and seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. Its performance is compared with the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing than GASSST (with similar sensitivity) and BWA (with higher sensitivity); for paired-end alignment, our design achieves slightly lower sensitivity than BWA but higher processing speed. Conclusions This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison with the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance
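
    The "conventional dynamic programming algorithm" that Align Core approximates is, for local alignment, the Smith-Waterman recursion. The sketch below computes the best local alignment score with a linear gap penalty; the scoring values are hypothetical, and the block-wise FPGA structure itself is not modeled. Hardware designs typically exploit the fact that all cells on one anti-diagonal of the matrix are independent and can be computed in parallel.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TTACGTTT", "ACGT"))  # 8 (four matches x 2)
print(smith_waterman("AAAA", "CCCC"))      # 0 (no positive-scoring local alignment)
```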

  16. NGS-based deep bisulfite sequencing.

    PubMed

    Lee, Suman; Kim, Joomyeong

    2016-01-01

    We have developed an NGS-based deep bisulfite sequencing protocol for the DNA methylation analysis of genomes. This approach allows the rapid and efficient construction of NGS-ready libraries from a large number of PCR products that have been individually amplified from bisulfite-converted DNA. It also employs a bioinformatics strategy to sort the raw sequence reads generated by NGS platforms and subsequently derive DNA methylation levels for individual loci. The results demonstrate that this NGS-based deep bisulfite sequencing approach provides not only DNA methylation levels but also informative DNA methylation patterns that have not been seen with other existing methods.
    • This protocol provides an efficient method for generating NGS-ready libraries from individually amplified PCR products.
    • This protocol provides a bioinformatics strategy for sorting NGS-derived raw sequence reads.
    • This protocol provides deep bisulfite sequencing results that can measure DNA methylation levels and patterns of individual loci.
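
    Once reads have been sorted to a locus, methylation levels and patterns follow from simple counting: at each CpG, the level is the fraction of reads retaining C (protected from conversion), and each read's string of methylated/unmethylated calls across all CpGs is its pattern. The sketch assumes reads are already sorted to one amplicon and aligned without indels; the read strings and CpG positions are hypothetical, and this is not the authors' sorting pipeline.

```python
from collections import Counter


def methylation_summary(reads: list, cpg_positions: list) -> tuple:
    """Per-CpG methylation level and counts of per-read patterns
    ("M" = methylated C, "u" = unmethylated, read as T)."""
    levels = {}
    for pos in cpg_positions:
        calls = [r[pos] for r in reads if pos < len(r) and r[pos] in "CT"]
        levels[pos] = calls.count("C") / len(calls) if calls else None
    patterns = Counter(
        "".join("M" if r[p] == "C" else "u" for p in cpg_positions)
        for r in reads
        if all(p < len(r) and r[p] in "CT" for p in cpg_positions)
    )
    return levels, patterns


# Hypothetical bisulfite reads from one amplicon with CpGs at positions 1 and 5.
reads = ["TCGATCGA", "TCGATTGA", "TTGATTGA", "TCGATCGA"]
levels, patterns = methylation_summary(reads, [1, 5])
print(levels)    # {1: 0.75, 5: 0.5}
print(patterns)  # Counter({'MM': 2, 'Mu': 1, 'uu': 1})
```

    The pattern counts carry information that levels alone lose: here the same per-site levels could arise from many different combinations of fully and partially methylated molecules.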

  17. EC: an efficient error correction algorithm for short reads

    PubMed Central

    2015-01-01

    Background In highly parallel next-generation sequencing (NGS) techniques, millions to billions of short reads are produced from a genomic sequence in a single run. Owing to limitations of NGS technologies, the reads can contain errors. The error rate of the reads can be reduced by trimming and by correcting erroneous bases. Correction helps to achieve high-quality data, and the computational complexity of many biological applications is greatly reduced if the reads are corrected first. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads. Results Our extensive and rigorous experiments reveal that EC is indeed an effective, scalable, and efficient error correction tool. The real reads used in our performance evaluation are Illumina-generated short reads of various lengths. The six experimental datasets we used were taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads were obtained by picking substrings from random positions of reference genomes. To introduce errors, some bases of the simulated reads were changed to other bases with some probability. Conclusions Error correction is a vital problem in biology, especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate the possibility of employing the techniques introduced in this paper to handle insertion and deletion errors as well. Software availability The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip. PMID:26678663
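
    The simulation described in this abstract, picking substrings from random positions of a reference and changing some bases to other bases with some probability, can be sketched as follows. The uniform per-base substitution probability and the choice of replacement base uniformly among the other three are assumptions; the abstract does not give the exact error model, and the genome string is hypothetical.

```python
import random

BASES = "ACGT"


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> list:
    """Sample reads from random genome positions, then substitute each base
    with a different base with probability `error_rate`."""
    rng = random.Random(seed)  # seeded for reproducibility
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(read)))
    return reads


genome = "ACGTTGCATGCAGGCTTACCGATAGCTAGCTTAG"
for start, read in simulate_reads(genome, n_reads=3, read_len=10, error_rate=0.05):
    print(start, read, genome[start:start + 10])
```

    Keeping the true start position with each read is what makes simulated data useful for benchmarking: a corrector's output can be compared directly against `genome[start:start + read_len]`.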

  18. A Bayesian Assignment Method for Ambiguous Bisulfite Short Reads.

    PubMed

    Tran, Hong; Wu, Xiaowei; Tithi, Saima; Sun, Ming-an; Xie, Hehuang; Zhang, Liqing

    2016-01-01

    DNA methylation is an epigenetic modification critical for normal development and diseases. The determination of genome-wide DNA methylation at single-nucleotide resolution is made possible by sequencing bisulfite treated DNA with next generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging as only a limited proportion of them (around 50-70%) can be aligned uniquely; a significant proportion, known as multireads, are mapped to multiple locations and thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we develop a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in the overall mapping efficiency. Moreover, the assignment model shows robust performance with low coverage depth, making it particularly attractive considering the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge on mutation rate and context specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS.
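
    To make the idea of posterior-based assignment concrete, the sketch below scores each candidate location of a multiread by a prior taken from nearby uniquely aligned reads (plus a pseudocount) times a simple mismatch likelihood, normalizes to posteriors, and assigns the read only when the best posterior clears a threshold. The prior, the likelihood model, the 0.9 threshold and all numbers are hypothetical choices for illustration; this is not the BAM-ABS model.

```python
def assign_multiread(candidates, unique_coverage, read_len, error_rate=0.01,
                     pseudocount=1.0, min_posterior=0.9):
    """candidates: list of (location, mismatches); unique_coverage: location ->
    number of uniquely aligned reads nearby. Returns (best location or None, posteriors)."""
    weights = {}
    for loc, mm in candidates:
        prior = unique_coverage.get(loc, 0) + pseudocount
        likelihood = (error_rate ** mm) * ((1 - error_rate) ** (read_len - mm))
        weights[loc] = prior * likelihood
    total = sum(weights.values())
    posteriors = {loc: w / total for loc, w in weights.items()}
    best = max(posteriors, key=posteriors.get)
    return (best if posteriors[best] >= min_posterior else None), posteriors


# A multiread matching two loci equally well; unique reads favour locus 1000.
best, post = assign_multiread([(1000, 0), (5000, 0)], {1000: 29, 5000: 1}, read_len=100)
print(best, round(post[1000], 4))  # 1000 0.9375
```

    When the unique-read evidence is balanced, the function returns None instead of guessing, which mirrors the trade-off between assigning more multireads and keeping assignment accuracy high.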

  19. CRISPR Detection From Short Reads Using Partial Overlap Graphs.

    PubMed

    Ben-Bassat, Ilan; Chor, Benny

    2016-06-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes that are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and play an essential role in current gene-editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation that differentiates CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.
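
    The overlap graph that this method partially constructs has reads as nodes and an edge from read a to read b when a suffix of a matches a prefix of b. The sketch below builds the full graph naively by comparing all read pairs (quadratic time). It does not implement the paper's CRISPR-specific observation or its partial construction, and the reads and minimum overlap length are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of a equal to a prefix of b,
    or 0 if no such overlap of at least min_len exists."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_graph(reads: list, min_len: int) -> dict:
    """Edges (i, j) -> overlap length for every ordered pair of distinct reads."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen:
                    edges[(i, j)] = olen
    return edges


reads = ["ATTAGAC", "AGACCTG", "CCTGAAT"]
print(overlap_graph(reads, min_len=3))  # {(0, 1): 4, (1, 2): 4}
```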

  20. SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome projects routinely produce draft sequences for species from diverse evolutionary clades, but generally do not create single nucleotide polymorphism (SNP) resources. We present an approach for de novo SNP discovery based on short-read sequencing of reduced representation libraries (RRL) to ge...

  1. Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome

    PubMed Central

    Ghangal, Rajesh; Chaudhary, Saurabh; Jain, Mukesh; Purty, Ram Singh; Chand Sharma, Prakash

    2013-01-01

    Seabuckthorn (Hippophae rhamnoides L.) has been known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next-generation massively parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of the seabuckthorn transcriptome. We assembled 86,253,874 high-quality short reads using six assembly tools. In our hands, assembly of non-redundant short reads following a two-step procedure was found to be the best, considering various assembly quality parameters. Initially, the ABySS tool was used following an additive k-mer approach. The assembled transcripts were subsequently processed with the TGICL suite. Finally, de novo short read assembly yielded 88,297 transcripts (> 100 bp), representing about 53 Mb of the seabuckthorn transcriptome. The average transcript length was 610 bp, the N50 length was 1,198 bp, and 91% of the short reads mapped uniquely back to the seabuckthorn transcriptome. A total of 41,340 (46.8%) transcripts showed significant similarity with sequences present in the NCBI nr protein database (E-value < 1E-06). We also screened the assembled transcripts for the presence of transcription factors and simple sequence repeats. Our strategy of using a short read assembler (ABySS) followed by TGICL will be useful for researchers working with a non-model organism's transcriptome, saving time and reducing the complexity of data management. The seabuckthorn transcriptome data generated here provide a valuable resource for gene discovery and the development of functional molecular markers. PMID:23991119
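
    N50, one of the assembly quality parameters reported above, is the length L such that transcripts (or contigs) of length at least L together contain at least half of all assembled bases. A minimal sketch of the standard calculation, using hypothetical lengths:

```python
def n50(lengths: list) -> int:
    """Smallest length L such that sequences of length >= L cover at least
    half of the total assembled bases (0 for an empty assembly)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8: 10 + 9 + 8 = 27, half of 54
```

    Because N50 rewards long sequences, it is usually read alongside the mean length and the fraction of reads mapping back, as in this abstract.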

  2. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcomes of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogeny-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from a public database. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. A sample was classified as a mixed infection if mapping its SNVs to the phylogenomic database identified multiple evolutionary paths. In simulations, our method specifically detected mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance between the two mixed strains was as small as 16 SNVs. By applying our method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214

  3. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis.

    PubMed

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcomes of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogeny-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from a public database. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. A sample was classified as a mixed infection if mapping its SNVs to the phylogenomic database identified multiple evolutionary paths. In simulations, our method specifically detected mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance between the two mixed strains was as small as 16 SNVs. By applying our method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
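
    The "multiple evolutionary paths" criterion can be illustrated with a toy Python sketch. Everything in it is hypothetical: the tree, node names and SNVs are invented, a node counts as supported only when all of its defining SNVs are observed, and allele frequencies and sequencing errors, which a real analysis of heterogeneous SNVs would have to model, are ignored. It is not the authors' implementation.

```python
from typing import Dict, Optional, Set, Tuple

SNV = Tuple[int, str]  # (genome position, alternative base)

# Hypothetical toy phylogeny: node -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "L2": "root", "L4": "root",
    "L2.a": "L2", "L2.b": "L2", "L4.a": "L4",
}

# Hypothetical branch-defining SNVs for each non-root node.
DEFINING: Dict[str, Set[SNV]] = {
    "L2": {(100, "A")}, "L4": {(200, "G")},
    "L2.a": {(300, "T")}, "L2.b": {(400, "C")}, "L4.a": {(500, "A")},
}


def lineage(node: Optional[str]) -> Set[str]:
    """Return the node together with all of its ancestors."""
    path: Set[str] = set()
    while node is not None:
        path.add(node)
        node = PARENT[node]
    return path


def supported_nodes(sample_snvs: Set[SNV]) -> Set[str]:
    """Nodes whose defining SNVs are all present in the sample."""
    return {node for node, snvs in DEFINING.items() if snvs <= sample_snvs}


def is_mixed(sample_snvs: Set[SNV]) -> bool:
    """True if the supported nodes cannot all sit on one root-to-tip path,
    i.e. some pair is not in an ancestor/descendant relationship."""
    nodes = supported_nodes(sample_snvs)
    return any(a not in lineage(b) and b not in lineage(a)
               for a in nodes for b in nodes)


print(is_mixed({(100, "A"), (300, "T")}))              # single L2.a strain: False
print(is_mixed({(100, "A"), (300, "T"), (200, "G")}))  # L2.a plus L4: True
```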

  4. A Bayesian Assignment Method for Ambiguous Bisulfite Short Reads

    PubMed Central

    Tran, Hong; Wu, Xiaowei; Tithi, Saima; Sun, Ming-an; Xie, Hehuang; Zhang, Liqing

    2016-01-01

    DNA methylation is an epigenetic modification critical for normal development and diseases. The determination of genome-wide DNA methylation at single-nucleotide resolution is made possible by sequencing bisulfite treated DNA with next generation high-throughput sequencing. However, aligning bisulfite short reads to a reference genome remains challenging as only a limited proportion of them (around 50–70%) can be aligned uniquely; a significant proportion, known as multireads, are mapped to multiple locations and thus discarded from downstream analyses, causing financial waste and biased methylation inference. To address this issue, we develop a Bayesian model that assigns multireads to their most likely locations based on the posterior probability derived from information hidden in uniquely aligned reads. Analyses of both simulated data and real hairpin bisulfite sequencing data show that our method can effectively assign approximately 70% of the multireads to their best locations with up to 90% accuracy, leading to a significant increase in the overall mapping efficiency. Moreover, the assignment model shows robust performance with low coverage depth, making it particularly attractive considering the prohibitive cost of bisulfite sequencing. Additionally, results show that longer reads help improve the performance of the assignment model. The assignment model is also robust to varying degrees of methylation and varying sequencing error rates. Finally, incorporating prior knowledge on mutation rate and context specific methylation level into the assignment model increases inference accuracy. The assignment model is implemented in the BAM-ABS package and freely available at https://github.com/zhanglabvt/BAM_ABS. PMID:27011215
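
    The idea of assigning a multiread by posterior probability can be sketched in Python under strong simplifying assumptions, which are ours and not necessarily those of the BAM-ABS model: reads come from a directional library, so a C at a CpG means methylated and a T means unmethylated; CpG sites are independent; methylation levels at each candidate locus are taken from uniquely aligned reads; and other mismatches are independent sequencing errors. The function names, the 1% error rate and the 0.9 posterior cut-off are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One candidate location: (CpG calls in the read, methylation levels at those
# CpGs estimated from uniquely aligned reads, non-CpG mismatch count).
Candidate = Tuple[Sequence[str], Sequence[float], int]


def log_likelihood(cpg_calls: Sequence[str], meth_levels: Sequence[float],
                   mismatches: int, error_rate: float = 0.01) -> float:
    """Log-likelihood of the read at one candidate location.

    A 'C' at a CpG is read as methylated (probability m), a 'T' as
    unmethylated (probability 1 - m); other mismatches are treated as
    independent sequencing errors. Terms for matching bases are omitted.
    """
    ll = mismatches * math.log(error_rate)
    for call, m in zip(cpg_calls, meth_levels):
        m = min(max(m, 1e-3), 1 - 1e-3)  # keep away from log(0)
        ll += math.log(m if call == "C" else 1.0 - m)
    return ll


def posteriors(candidates: List[Candidate], priors: List[float]) -> List[float]:
    """Posterior probability of each location (normalized with log-sum-exp)."""
    logs = [log_likelihood(*c) + math.log(p) for c, p in zip(candidates, priors)]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


def assign(candidates: List[Candidate], priors: List[float],
           min_posterior: float = 0.9) -> Optional[int]:
    """Index of the most probable location, or None if it is not confident."""
    post = posteriors(candidates, priors)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= min_posterior else None


# Hypothetical multiread hitting a highly methylated and an unmethylated locus.
read_at_a = (["C", "C", "C"], [0.9, 0.9, 0.9], 0)
read_at_b = (["C", "C", "C"], [0.1, 0.1, 0.1], 0)
print(assign([read_at_a, read_at_b], priors=[0.5, 0.5]))  # prints 0
```

    Returning None below the cut-off keeps ambiguous reads out of downstream methylation calls instead of forcing an assignment.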

  5. Simultaneous alignment of short reads against multiple genomes

    PubMed Central

    Schneeberger, Korbinian; Hagmann, Jörg; Ossowski, Stephan; Warthmann, Norman; Gesing, Sandra; Kohlbacher, Oliver; Weigel, Detlef

    2009-01-01

    Genome resequencing with short reads generally relies on alignments against a single reference. GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes (e.g., individuals of the same species) into a single graph structure. It constitutes the first approach for handling multiple references and introduces representations for alignments against complex structures. Demonstrated benefits include access to polymorphisms that cannot be identified by alignments against the reference alone. Download GenomeMapper at . PMID:19761611

  6. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads and ground vegetation and to favor desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia using deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing on the recently introduced Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000+ reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungal communities, whereas infrequent fires (6-year fire interval) allow the system to reset to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep, cost-effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives.
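
    Subsampling every sample to a common read depth (rarefaction) is what makes per-sample counts such as "19,000+ reads per sample after subsampling" comparable. A minimal Python sketch of subsampling without replacement follows; the OTU labels and counts are invented, and the study's actual subsampling tool and settings are not stated in the abstract.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Subsample a sample's reads without replacement down to `depth` reads.

    `counts` maps an OTU or taxon label to its read count.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"sample has {total} reads; cannot rarefy to {depth}")
    # Expand to one label per read, then draw `depth` reads uniformly.
    pool = [label for label, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return dict(Counter(rng.sample(pool, depth)))


# Hypothetical OTU table for one sample.
sample = {"OTU_1": 12000, "OTU_2": 6000, "OTU_3": 2500}
sub = rarefy(sample, 19000)
print(sum(sub.values()))  # prints 19000
```

    Samples with fewer reads than the chosen depth raise an error rather than being silently kept, since they cannot be compared at that depth.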

  7. Biases in small RNA deep sequencing data

    PubMed Central

    Raabe, Carsten A.; Tang, Thean-Hock; Brosius, Juergen; Rozhdestvensky, Timofey S.

    2014-01-01

    High-throughput RNA sequencing (RNA-seq) is considered a powerful tool for novel gene discovery and fine-tuned transcriptional profiling. The digital nature of RNA-seq is also believed to simplify meta-analysis and to reduce the background noise associated with hybridization-based approaches. The development of multiplex sequencing enables efficient and economical parallel analysis of gene expression. In addition, RNA-seq is of particular value when low RNA expression or modest changes between samples are monitored. However, recent data have uncovered severe bias in the sequencing of small non-protein-coding RNA (small RNA-seq or sRNA-seq), such that the expression levels of some RNAs appeared to be artificially enhanced and others diminished or even undetectable. The use of different adapters and barcodes during ligation, as well as complex RNA structures and modifications, drastically influences cDNA synthesis efficiency and exemplifies sources of bias in deep sequencing. In addition, differences in G/C content between RNAs are associated with unequal polymerase chain reaction amplification efficiencies. Given the central importance of RNA-seq to molecular biology and personalized medicine, we review recent findings that call small non-protein-coding RNA-seq data into question and suggest approaches and precautions to overcome or minimize bias. PMID:24198247

  9. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

    PubMed

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ PMID:24009693

  10. DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

    PubMed Central

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/ PMID:24009693
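
    Distributing a mapping job generally starts by splitting the FASTQ input into independent batches that separate workers can map, with their SAM/BAM outputs merged afterwards. The Python sketch below shows only that splitting step, assuming plain 4-line FASTQ records; it is not DistMap code and does not use Hadoop. For paired-end data, both mates would have to be kept in the same batch.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records (single-line sequence and quality)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, qual


def chunks(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records into batches of `size` that can be mapped independently."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Hypothetical input: five single-end reads.
lines = []
for i in range(5):
    lines += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
print([len(batch) for batch in chunks(fastq_records(lines), 2)])  # prints [2, 2, 1]
```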

  11. Short-read DNA sequencing yields microsatellite markers for Rheum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identifying culinary rhubarb (Rheum ×hybridum Murray) cultivars using morphological characteristics is problematic due to variability within individual genotypes, variation caused by environmental factors, plant and leaf age, similarity between genetically diverse genotypes, multiple cultivar names ...

  12. Deep sequencing: becoming a critical tool in clinical virology.

    PubMed

    Quiñones-Mateu, Miguel E; Avila, Santiago; Reyes-Teran, Gustavo; Martinez, Miguel A

    2014-09-01

    Population (Sanger) sequencing has been the standard method in basic and clinical DNA sequencing for almost 40 years; however, next-generation (deep) sequencing methodologies are now revolutionizing the field of genomics, and clinical virology is no exception. Deep sequencing is highly efficient, producing an enormous amount of information at low cost in a relatively short period of time. High-throughput sequencing techniques have enabled significant contributions to multiple areas of virology, including virus discovery and metagenomics (viromes), molecular epidemiology, pathogenesis, and studies of how viruses escape the host immune system and antiviral pressures. In addition, new and more affordable deep sequencing-based assays are now being implemented in clinical laboratories. Here, we review the use of current deep sequencing platforms in virology, focusing on three of the most studied viruses: human immunodeficiency virus (HIV), hepatitis C virus (HCV), and influenza virus.

  13. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    PubMed Central

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful way to study insect host-parasitoid interactions at the molecular level. To investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length, 35% of which were > 200 bp long. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes was detected in parasitized larvae, with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization, DsIV genes expressed in the

  14. Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data

    PubMed Central

    2012-01-01

    Background Rapid advances in next-generation sequencing methods have provided new opportunities for transcriptome sequencing (RNA-Seq). The unprecedented sequencing depth provided by RNA-Seq makes it a powerful and cost-efficient method for transcriptome study, and it has been widely used in model and non-model organisms to identify and quantify RNA. For non-model organisms lacking well-defined genomes, de novo assembly is typically required for downstream RNA-Seq analyses, including SNP discovery and identification of genes differentially expressed between phenotypes. Although RNA-Seq has been successfully used to sequence many non-model organisms, the results of de novo assembly from short reads can still be improved by using recent bioinformatic developments. Results In this study, we used 212.6 million paired-end reads, totaling 16.2 Gb, to assemble the hexaploid wheat transcriptome. Two state-of-the-art assemblers, Trinity and Trans-ABySS, which use the single and multiple k-mer methods, respectively, were used, and the whole de novo assembly process was divided into the following four steps: pre-assembly, merging different samples, removal of redundancy and scaffolding. We documented every detail of these steps and how they influenced assembly performance to gain insight into transcriptome assembly from short reads. After optimization, the assembled transcripts were comparable to Sanger-derived ESTs in terms of both continuity and accuracy. We also provided considerable new wheat transcript data to the community. Conclusions It is feasible to assemble the hexaploid wheat transcriptome from short reads. Special attention should be paid to dealing with multiple samples to balance the spectrum of expression levels and redundancy. To obtain an accurate overview of RNA profiling, removal of redundancy may be crucial in de novo assembly. PMID:22891638
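
    One simple form of the "removal of redundancy" step is to drop transcripts that are fully contained in a longer transcript on either strand. The Python sketch below applies only that exact-containment rule; dedicated clustering tools (TGICL, for example) also tolerate mismatches, so this is a simplified stand-in rather than the procedure used in this paper. Transcript IDs and sequences are invented.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(transcripts: Dict[str, str]) -> List[str]:
    """Return the IDs kept after discarding every transcript that is an exact
    substring, on either strand, of a transcript already kept.

    Transcripts are processed longest first, so each sequence is only
    compared against sequences at least as long as itself.
    """
    kept: List[str] = []
    order = sorted(transcripts, key=lambda t: len(transcripts[t]), reverse=True)
    for tid in order:
        seq = transcripts[tid]
        rc = revcomp(seq)
        if not any(seq in transcripts[k] or rc in transcripts[k] for k in kept):
            kept.append(tid)
    return kept


# Hypothetical assembled transcripts.
demo = {
    "t1": "ATGCGTACGTTAG",
    "t2": "GCGTACG",   # forward-strand fragment of t1
    "t3": "CTAACGT",   # reverse complement of a fragment of t1
    "t4": "TTTTTTTT",
}
print(remove_contained(demo))  # prints ['t1', 't4']
```

    The pairwise comparison is quadratic in the number of transcripts, which is fine for illustration but not for a full transcriptome.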

  15. Fast and accurate short read alignment with Burrows–Wheeler transform

    PubMed Central

    Li, Heng; Durbin, Richard

    2009-01-01

    Motivation: The enormous number of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignments in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk PMID:19451168
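
    Backward search, the core of the BWT-based approach, narrows a suffix-array interval one character at a time while reading the query from right to left, using a count table C and an occurrence table Occ over the transformed text. The Python sketch below implements exact-match backward search only, with naive suffix sorting suitable for tiny inputs; BWA additionally explores mismatches and gaps and uses compact data structures, so this illustrates the principle rather than BWA's implementation.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[Dict[str, int], Dict[str, List[int]], List[int]]


def build_fm_index(text: str) -> FMIndex:
    """Naive FM-index: C table, occurrence table over the BWT, suffix array."""
    text += "$"  # unique sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the final '$'
    # C[c]: number of characters in the text that are smaller than c.
    C: Dict[str, int] = {}
    smaller = 0
    for c in sorted(set(text)):
        C[c] = smaller
        smaller += text.count(c)
    # occ[c][k]: occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for k, ch in enumerate(bwt):
        for c in occ:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return C, occ, sa


def backward_search(pattern: str, index: FMIndex) -> List[int]:
    """Exact-match positions of `pattern`, found by extending it right to left."""
    C, occ, sa = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("GATTACAGATTA")
print(backward_search("ATTA", index))  # prints [1, 8]
```

    Each query character costs two table lookups regardless of genome size, which is why the approach scales to references such as the human genome once the index is built.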

  16. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    PubMed

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing, which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore produce different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample-specific threshold, set at the 50th percentile of the patristic distance distribution for all variants within each individual, was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny, ranging from 0.005 to 0.06, a threshold of 0.03 provided the best balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. Of 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters by both methods. Relative to Sanger sequencing, deep sequencing analysis identified an additional 4 individuals and excluded 8 others. This deep sequencing approach could be a more effective tool than Sanger sequencing for understanding onward HCV transmission dynamics, since incorporating minority sequence variants improves the discrimination of phylogenetically linked clusters.
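
    Threshold-based cluster calling can be sketched as single-linkage grouping: any two individuals whose patristic distance is at or below the cut-off (0.03 for the Sanger phylogeny above) are linked, and linked individuals form a cluster. The Python sketch below assumes pairwise patristic distances have already been computed from the tree; the sample names and distances are invented, and the study's exact cluster definition may differ. For instance, single linkage can chain together two samples whose own distance exceeds the threshold, as p1 and p3 do here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def threshold_clusters(dist: Dict[Tuple[str, str], float], samples: List[str],
                       threshold: float = 0.03) -> List[List[str]]:
    """Single-linkage clustering: two samples are linked when their patristic
    distance is <= threshold; returns clusters with two or more members."""
    parent = {s: s for s in samples}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        d = dist.get((a, b), dist.get((b, a)))
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical pairwise patristic distances (substitutions per site).
d = {("p1", "p2"): 0.01, ("p2", "p3"): 0.02, ("p1", "p3"): 0.05,
     ("p3", "p4"): 0.20, ("p1", "p4"): 0.30, ("p2", "p4"): 0.25}
print(threshold_clusters(d, ["p1", "p2", "p3", "p4"]))  # prints [['p1', 'p2', 'p3']]
```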

  17. GAViT: Genome Assembly Visualization Tool for Short Read Data

    SciTech Connect

    Syed, Aijazuddin; Shapiro, Harris; Tu, Hank; Pangilinan, Jasmyn; Trong, Stephan

    2008-03-14

    It is a challenging job for genome analysts to accurately debug, troubleshoot, and validate genome assembly results. Genome analysts rely on visualization tools to help validate and troubleshoot assembly results, including such problems as mis-assemblies, low-quality regions, and repeats. Short read data adds further complexity and makes it extremely challenging for the visualization tools to scale and to view all needed assembly information. As a result, there is a need for a visualization tool that can scale to display assembly data from the new sequencing technologies. We present Genome Assembly Visualization Tool (GAViT), a highly scalable and interactive assembly visualization tool developed at the DOE Joint Genome Institute (JGI).

  18. Deep sequencing analysis of phage libraries using Illumina platform.

    PubMed

    Matochko, Wadim L; Chu, Kiki; Jin, Bingjie; Lee, Sam W; Whitesides, George M; Derda, Ratmir

    2012-09-01

    This paper presents an analysis of phage-displayed peptide libraries using Illumina sequencing. We describe steps for preparing short DNA fragments for deep sequencing and MatLab software for analyzing the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discovery is the analysis of peptide sequences present in the library. This analysis is usually performed by Sanger sequencing, which is labor intensive and limited to examination of a few hundred phage clones. By contrast, Illumina deep-sequencing technology can characterize over 10^7 reads in a single run. We applied Illumina sequencing to analyze phage libraries. Using PCR, we isolated the variable regions from M13KE phage vectors from a phage display library. The PCR primers contained (i) sequences flanking the variable region, (ii) barcodes, and (iii) a variable 5'-terminal region. We used this approach to examine how the diversity of peptides in phage display libraries changes as a result of amplification of libraries in bacteria. Using HiSeq single-end Illumina sequencing of these fragments, we acquired over 2×10^7 reads, 57 base pairs (bp) in length. Each read contained information about the barcode (6 bp), one complementary region (12 bp) and a variable region (36 bp). We applied this sequencing to a model library of 10^6 unique clones and observed that amplification enriches ∼150 clones, which dominate ∼20% of the library. Deep sequencing, for the first time, characterized the collapse of diversity in phage libraries. The results suggest that screens based on repeated amplification and small-scale sequencing identify a few binding clones and miss thousands of useful clones. The deep sequencing approach described here could identify under-represented clones in phage screens. It could also be instrumental in developing new screening strategies, which can preserve
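
    Counting how often each variable region appears per barcode is enough to quantify the kind of enrichment described above. The Python sketch below assumes a read layout of barcode, then constant region, then variable region starting at position 0; the abstract gives only the segment lengths (6 + 12 + 36 = 54 of the 57 bp), so these offsets, the constant sequence and the toy reads are hypothetical. The paper's own analysis used MatLab.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Assumed read layout (hypothetical offsets; only segment lengths are given).
BARCODE = slice(0, 6)     # 6 bp sample barcode
CONSTANT = slice(6, 18)   # 12 bp constant (complementary) region
VARIABLE = slice(18, 54)  # 36 bp region encoding the displayed peptide


def count_clones(reads: Iterable[str], expected_constant: str) -> Dict[str, Counter]:
    """Count variable-region sequences per barcode, discarding short reads and
    reads whose constant region does not match exactly."""
    per_barcode: Dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        if len(read) < VARIABLE.stop or read[CONSTANT] != expected_constant:
            continue
        per_barcode[read[BARCODE]][read[VARIABLE]] += 1
    return dict(per_barcode)


def top_fraction(clones: Counter, n: int) -> float:
    """Fraction of reads accounted for by the n most abundant clones."""
    total = sum(clones.values())
    return sum(c for _, c in clones.most_common(n)) / total if total else 0.0


const = "GTGGTACCTTTC"  # hypothetical 12 bp constant sequence
reads = (["ACGTAC" + const + "G" * 36 + "TTT"] * 3
         + ["ACGTAC" + const + "C" * 36 + "TTT"]
         + ["ACGTAC" + "A" * 12 + "G" * 36 + "TTT"])  # wrong constant: dropped
clones = count_clones(reads, const)
print(top_fraction(clones["ACGTAC"], 1))  # prints 0.75
```

    Applied to a real library, top_fraction with n set to the number of enriched clones gives the share of reads those clones dominate.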

  19. deepTools: a flexible platform for exploring deep-sequencing data

    PubMed Central

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A.; Manke, Thomas

    2014-01-01

    We present a Galaxy-based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. PMID:24799436
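
    Normalizing coverage makes tracks from libraries of different depth comparable. A common scheme is counts per million mapped reads (CPM); current deepTools releases, for example, offer CPM among the --normalizeUsing choices of bamCoverage. The Python sketch below is a simplified illustration, not deepTools code: it counts only read start positions per bin, whereas coverage tools typically count every read or fragment overlapping a bin.

```python
from typing import List, Sequence


def cpm_binned_coverage(read_starts: Sequence[int], chrom_length: int,
                        total_mapped: int, bin_size: int = 50) -> List[float]:
    """Counts of read start positions per fixed-size bin on one chromosome,
    scaled to counts per million mapped reads (CPM).

    `total_mapped` is the library-wide number of mapped reads, so bins are
    comparable across samples sequenced to different depths.
    """
    if bin_size <= 0 or total_mapped <= 0:
        raise ValueError("bin_size and total_mapped must be positive")
    n_bins = (chrom_length + bin_size - 1) // bin_size  # last bin may be partial
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions off this chromosome
            counts[pos // bin_size] += 1
    scale = 1_000_000 / total_mapped
    return [c * scale for c in counts]


print(cpm_binned_coverage([0, 10, 60, 120], chrom_length=150, total_mapped=4))
# prints [500000.0, 250000.0, 250000.0]
```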

  1. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy-based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. PMID:24799436

  2. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
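
    The first two DSAP steps, cleanup and clustering of tags, can be approximated in a few lines of Python. This sketch rests on our own assumptions rather than DSAP's documented rules: the adaptor sequence is a placeholder, adaptor trimming cuts at the first exact match of the adaptor's first six bases, a tag is discarded as "poly-A/T/C/G/N" only if it consists entirely of one repeated base, and "clustering" merges tags that become identical after cleaning. The input rows are toy tags.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical 3' adaptor; use the adaptor of the actual library kit.
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"
HOMOPOLYMER = re.compile(r"(A+|C+|G+|T+|N+)")


def clean_tag(tag: str, adaptor: str = ADAPTOR, seed_len: int = 6) -> Optional[str]:
    """Cut the tag at the first occurrence of the adaptor's first `seed_len`
    bases, then reject empty tags and tags made of a single repeated base."""
    tag = tag.strip().upper()
    cut = tag.find(adaptor[:seed_len])
    if cut != -1:
        tag = tag[:cut]
    if not tag or HOMOPOLYMER.fullmatch(tag):
        return None
    return tag


def collapse(lines: Iterable[str]) -> Counter:
    """Parse 'sequence<TAB>count' lines, clean each tag, and merge the counts
    of tags that become identical after cleaning."""
    merged: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        seq, count = line.rstrip("\n").split("\t")
        cleaned = clean_tag(seq)
        if cleaned is not None:
            merged[cleaned] += int(count)
    return merged


rows = ["TGAGGTAGTAGGTTGTATAGTTTCGTATGCC\t10\n",
        "TGAGGTAGTAGGTTGTATAGTT\t5\n",
        "AAAAAAAAAAAAAAAAAA\t3\n"]
print(collapse(rows))  # prints Counter({'TGAGGTAGTAGGTTGTATAGTT': 15})
```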

  3. Unbiased Deep Sequencing of RNA Viruses from Clinical Samples.

    PubMed

    Matranga, Christian B; Gladden-Young, Adrianne; Qu, James; Winnicki, Sarah; Nosamiefan, Dolo; Levin, Joshua Z; Sabeti, Pardis C

    2016-01-01

    Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA (including poly(rA) carrier and ribosomal RNA) from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol because it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies. PMID:27403729

  4. Unbiased Deep Sequencing of RNA Viruses from Clinical Samples

    PubMed Central

    Matranga, Christian B.; Gladden-Young, Adrianne; Qu, James; Winnicki, Sarah; Nosamiefan, Dolo; Levin, Joshua Z.; Sabeti, Pardis C.

    2016-01-01

    Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA (including poly(rA) carrier and ribosomal RNA) from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol as it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies. PMID:27403729

  5. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing of eight apple samples showing small leaves and/or growth retardation generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV). Nucleotide (nt) sequence identity of the assembled contigs to the reference sequences of the five respective viral genomes ranged from 68% to 99%. ASPV and ASGV sequences were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequence of each assembled contig. All five viruses were detected in three of the samples, and every sample had a mixed infection with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were also found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity, respectively, with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity, respectively, with ORF1 of ApLV isolate LA2. The deep sequencing assay proved to be a valuable and powerful tool for detecting and identifying known and unknown members of the virome in infected apple trees; here it identified ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
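    Percent nucleotide identity, as quoted for the contigs above, is computed over an existing pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps written as "-") and counts every column except gap-gap columns in the denominator; tools differ on this convention, so identities from different programs are not always directly comparable.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns, skipping columns where both rows are gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Five identical columns out of six (one gap column), about 83.3%
print(percent_identity("ACGT-A", "ACGTTA"))
```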

  6. Deep sequencing of 10,000 human genomes

    PubMed Central

    Pierce, Levi C. T.; Biggs, William H.; di Iulio, Julia; Wong, Emily H. M.; Fabani, Martin M.; Kirkness, Ewen F.; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C.; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A.; Och, Franz J.; Turpaz, Yaron; Venter, J. Craig

    2016-01-01

    We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use. PMID:27702888
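    The 30×–40× coverage figures follow the usual Lander-Waterman relation, mean coverage C = N·L/G for N reads of length L on a genome of size G. The sketch below turns that into a read budget and, under the idealized assumption that per-base depth is Poisson-distributed, estimates the fraction of bases below a depth threshold. The 3.1 Gb genome size and 150 bp read length are approximate, illustrative assumptions. The Poisson model ignores mappability and systematic bias, so it should not be read as explaining the 84% confidently sequenced fraction reported in the abstract.

```python
import math


def mean_coverage(n_reads, read_len, genome_size):
    """Lander-Waterman mean coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_cov, read_len, genome_size):
    """Number of reads of length read_len needed to reach target_cov on average."""
    return math.ceil(target_cov * genome_size / read_len)


def fraction_below(cov, min_depth):
    """Idealized Poisson estimate of the fraction of bases with depth < min_depth."""
    return sum(math.exp(-cov) * cov ** k / math.factorial(k) for k in range(min_depth))


GENOME = 3_100_000_000  # approximate human genome size (assumption)
print(reads_needed(30, 150, GENOME))  # 620000000 reads of 150 bp for 30x
print(fraction_below(30, 10))         # Poisson-only estimate of bases below 10x
```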

  7. Genetics and Epigenetics of the Skin Meet Deep Sequence

    PubMed Central

    Cheng, Jeffrey B.; Cho, Raymond J.

    2014-01-01

    Rapid advances in next-generation sequencing technology are revolutionizing approaches to genomic and epigenomic studies of skin. Deep sequencing of cutaneous malignancies reveals heavily mutagenized genomes with large numbers of low-prevalence mutations and multiple resistance mechanisms to targeted therapies. Next-generation sequencing approaches have already paid rich dividends in identifying the genetic causes of dermatologic disease, both in heritable mutations and the somatic aberrations that underlie cutaneous mosaicism. Although epigenetic alterations clearly influence tumorigenesis, pluripotent stem cell biology, and epidermal cell lineage decisions, labor- and cost-intensive approaches long delayed a genome-scale perspective. New insights into epigenomic mechanisms in skin disease should arise from the accelerating assessment of histone modification, DNA methylation, and related gene expression signatures. PMID:22237701

  9. Deep sequencing approach for investigating infectious agents causing fever.

    PubMed

    Susilawati, T N; Jex, A R; Cantacessi, C; Pearson, M; Navarro, S; Susianto, A; Loukas, A C; McBride, W J H

    2016-07-01

    Acute undifferentiated fever (AUF) poses a diagnostic challenge due to the variety of possible aetiologies. While the majority of AUFs resolve spontaneously, some cases become prolonged and cause significant morbidity and mortality, necessitating improved diagnostic methods. This study evaluated the utility of deep sequencing in fever investigation. DNA and RNA were isolated from plasma/sera of AUF cases being investigated at Cairns Hospital in northern Australia, including eight control samples from patients with a confirmed diagnosis. Following isolation, DNA and RNA were bulk amplified and RNA was reverse transcribed to cDNA. The resulting DNA and cDNA amplicons were subjected to deep sequencing on an Illumina HiSeq 2000 platform. Bioinformatics analysis was performed using the program Kraken and the CLC assembly-alignment pipeline. The results were compared with the outcomes of clinical tests. We generated between 4 and 20 million reads per sample. The results of the Kraken and CLC analyses concurred with diagnoses obtained by other means in 87.5% (7/8) and 25% (2/8) of control samples, respectively. Some plausible causes of fever were identified in ten patients who remained undiagnosed following routine hospital investigations, including Escherichia coli bacteraemia and scrub typhus that eluded conventional tests. Achromobacter xylosoxidans, Alteromonas macleodii and Enterobacteria phage were prevalent in all samples. Deep sequencing of patient plasma/serum samples led to the identification of aetiological agents putatively implicated in AUFs and enabled the study of microbial diversity in human blood. The application of this approach in hospital practice is currently limited by sequencing input requirements and complicated data analysis. PMID:27180244
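    Kraken assigns reads to taxa by looking up their k-mers in a precomputed database. The sketch below is a deliberately simplified stand-in and not Kraken's algorithm: Kraken maps each k-mer to the lowest common ancestor of the taxa that contain it and scores paths through the taxonomy, whereas this toy version simply ignores k-mers shared between references and lets the remaining, taxon-unique k-mers vote. The reference sequences and taxon names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of (hypothetical) taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify(read, index, k):
    """Let k-mers unique to one taxon vote; return the top taxon, or None if no votes."""
    votes = Counter()
    for km in kmers(read, k):
        taxa = index.get(km)
        if taxa is not None and len(taxa) == 1:  # shared k-mers are ignored in this toy
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


refs = {"taxonA": "ACGTACGTGGCC", "taxonB": "TTGGCCAATTGG"}
idx = build_index(refs, k=4)
print(classify("ACGTACGTGG", idx, k=4))  # taxonA
print(classify("GGCC", idx, k=4))        # None: its only k-mer occurs in both references
```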

  10. Deep sequencing of HIV: clinical and research applications.

    PubMed

    Chabria, Shiven B; Gupta, Shaili; Kozal, Michael J

    2014-01-01

    Human immunodeficiency virus (HIV) exhibits remarkable diversity in its genomic makeup and exists in any given individual as a complex distribution of closely related but nonidentical genomes called a viral quasispecies, which is subject to genetic variation, competition, and selection. This viral diversity clinically manifests as a selection of mutant variants based on viral fitness in treatment-naive individuals and based on drug-selective pressure in those on antiretroviral therapy (ART). The current standard-of-care ART consists of a combination of antiretroviral agents, which ensures maximal viral suppression while preventing the emergence of drug-resistant HIV variants. Unfortunately, transmission of drug-resistant HIV does occur, affecting 5% to >20% of newly infected individuals. To optimize therapy, clinicians rely on viral genotypic information obtained from conventional population sequencing-based assays, which cannot reliably detect viral variants that constitute <20% of the circulating viral quasispecies. These low-frequency variants can be detected by highly sensitive genotyping methods collectively grouped under the moniker of deep sequencing. Low-frequency variants have been correlated to treatment failures and HIV transmission, and detection of these variants is helping to inform strategies for vaccine development. Here, we discuss the molecular virology of HIV, viral heterogeneity, drug-resistance mutations, and the application of deep sequencing technologies in research and the clinical care of HIV-infected individuals. PMID:24821496
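    Why depth matters for minority variants can be shown with a binomial sketch: if a variant makes up a fraction f of the quasispecies and d reads are sampled, the chance of seeing it in at least m reads is 1 minus the binomial probability of fewer than m hits. This is an idealization that assumes unbiased sampling and ignores PCR and sequencing error; in practice m must be set high enough that errors alone rarely reach it. The function names and the 95% target are illustrative assumptions.

```python
from math import comb


def detection_power(depth, freq, min_reads):
    """P(at least min_reads of `depth` reads carry a variant present at frequency `freq`)."""
    miss = sum(
        comb(depth, k) * freq ** k * (1 - freq) ** (depth - k) for k in range(min_reads)
    )
    return 1.0 - miss


def min_depth_for_power(freq, min_reads, target=0.95, max_depth=100_000):
    """Smallest depth whose detection power reaches `target`, or None if none up to max_depth."""
    for depth in range(min_reads, max_depth + 1):
        if detection_power(depth, freq, min_reads) >= target:
            return depth
    return None


# Reads needed to see a 1% variant at least once with 95% probability
print(min_depth_for_power(0.01, 1))  # 299
```

    Requiring several supporting reads (larger `min_reads`) raises the depth needed, which is the trade-off deep sequencing assays manage when calling variants well below the roughly 20% limit of population sequencing.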

  12. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-01

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. PMID:27079975

  13. deepTools2: a next generation web server for deep-sequencing data analysis

    PubMed Central

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-01-01

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. PMID:27079975
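    "Normalizations of aligned reads" typically means scaling per-bin read counts so that samples sequenced to different depths can be compared. The sketch below shows two common read-count normalizations, counts per million (CPM) and reads per kilobase per million (RPKM), on binned read start positions. It is not deepTools code; deepTools offers its own options for these normalizations, whose exact names depend on the version, so consult its documentation before relying on them.

```python
def bin_counts(read_starts, chrom_len, bin_size):
    """Count reads per fixed-size bin from 0-based read start positions."""
    counts = [0] * -(-chrom_len // bin_size)  # ceiling division gives the number of bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def cpm(counts, total_mapped):
    """Counts per million mapped reads."""
    scale = 1e6 / total_mapped
    return [c * scale for c in counts]


def rpkm(counts, total_mapped, bin_size):
    """Reads per kilobase of bin per million mapped reads."""
    scale = 1e9 / (total_mapped * bin_size)
    return [c * scale for c in counts]


counts = bin_counts([0, 49, 50, 120], chrom_len=150, bin_size=50)
print(counts)                                  # [2, 1, 1]
print(cpm([10, 0, 5], total_mapped=2_000_000))  # [5.0, 0.0, 2.5]
```

    CPM ignores bin length, so it suits equal-size bins; RPKM additionally divides by length and is the one to use when bins or features differ in size.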

  14. Assemblathon 1: a competitive assessment of de novo short read assembly methods.

    PubMed

    Earl, Dent; Bradnam, Keith; St John, John; Darling, Aaron; Lin, Dawei; Fass, Joseph; Yu, Hung On Ken; Buffalo, Vince; Zerbino, Daniel R; Diekhans, Mark; Nguyen, Ngan; Ariyaratne, Pramila Nuwantha; Sung, Wing-Kin; Ning, Zemin; Haimel, Matthias; Simpson, Jared T; Fonseca, Nuno A; Birol, İnanç; Docking, T Roderick; Ho, Isaac Y; Rokhsar, Daniel S; Chikhi, Rayan; Lavenier, Dominique; Chapuis, Guillaume; Naquin, Delphine; Maillet, Nicolas; Schatz, Michael C; Kelley, David R; Phillippy, Adam M; Koren, Sergey; Yang, Shiaw-Pyng; Wu, Wei; Chou, Wen-Chi; Srivastava, Anuj; Shaw, Timothy I; Ruby, J Graham; Skewes-Cox, Peter; Betegon, Miguel; Dimon, Michelle T; Solovyev, Victor; Seledtsov, Igor; Kosarev, Petr; Vorobyev, Denis; Ramirez-Gonzalez, Ricardo; Leggett, Richard; MacLean, Dan; Xia, Fangfang; Luo, Ruibang; Li, Zhenyu; Xie, Yinlong; Liu, Binghang; Gnerre, Sante; MacCallum, Iain; Przybylski, Dariusz; Ribeiro, Filipe J; Yin, Shuangye; Sharpe, Ted; Hall, Giles; Kersey, Paul J; Durbin, Richard; Jackman, Shaun D; Chapman, Jarrod A; Huang, Xiaoqiu; DeRisi, Joseph L; Caccamo, Mario; Li, Yingrui; Jaffe, David B; Green, Richard E; Haussler, David; Korf, Ian; Paten, Benedict

    2011-12-01

    Low-cost short-read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that, within this benchmark, (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
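    Contiguity is commonly summarized with N50, or with NG50 when a known or estimated genome size replaces the total assembly length as the reference. The sketch below computes both from a list of contig (or scaffold) lengths; it is a generic implementation, not the Assemblathon evaluation code.

```python
def nx(lengths, x=50, genome_size=None):
    """Nx (or NGx when genome_size is given): the largest length L such that
    contigs of length >= L sum to at least x% of the assembly (or genome) size."""
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:
            return length
    return 0  # the assembly never reaches x% of the reference size


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(contigs))                   # N50 = 8
print(nx(contigs, genome_size=100))  # NG50 = 3
```

    NG50 is the fairer comparison across competing assemblies of the same genome, because an assembly cannot raise it simply by discarding short contigs.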

  15. SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

    PubMed Central

    Zhu, Xiaoqian; Wu, Edward; Lee, Lap-Kei; Lin, Haoxiang; Zhu, Wenjuan; Cheung, David W.; Ting, Hing-Fung; Yiu, Siu-Ming; Peng, Shaoliang; Yu, Chang; Li, Yingrui; Li, Ruiqiang; Lam, Tak-Wah

    2013-01-01

    To keep pace with the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2 and GEM, and with the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates that SOAP3-dp enables the discovery of more authentic variants and longer indels. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. PMID:23741504
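    Gapped local alignment of the kind SOAP3-dp adds over SOAP3 is classically computed with dynamic programming. The textbook Smith-Waterman recurrence with a linear gap penalty is sketched below. It is not SOAP3-dp's GPU implementation, and the match, mismatch and gap scores are illustrative assumptions rather than BWA's or SOAP3-dp's scoring scheme.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty), using two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur[j] = max(0, diag, up, left)  # 0 lets a local alignment restart anywhere
            best = max(best, cur[j])
        prev = cur
    return best


# Seven matches and one gap (ACGTTACG vs ACGT-ACG): 7 * 2 - 2 = 12
print(smith_waterman("ACGTTACG", "ACGTACG"))  # 12
```

    The quadratic cost of filling this table for every read is what makes GPU acceleration attractive for gapped alignment at NGS scale.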

  16. Optimizing de novo assembly of short-read RNA-seq data for phylogenomics

    PubMed Central

    2013-01-01

    Background RNA-seq has shown huge potential for phylogenomic inferences in non-model organisms. However, errors, incompleteness, and redundancy among the transcripts assembled for each gene in de novo assemblies of short reads introduce noise into analyses and a large amount of missing data into the aligned matrix. To address these problems, we compare de novo assemblies of paired-end 90 bp RNA-seq reads using Oases, Trinity, Trans-ABySS and SOAPdenovo-Trans to transcripts from the genome annotation of the model plant Ricinus communis. By doing so, we evaluate strategies for optimizing total gene coverage and minimizing assembly chimeras and redundancy. Results We found that the frequency and structure of chimeras vary dramatically among different software packages. The differences were largely due to the number of trans-self chimeras that contain repeats in the opposite direction. More than half of the total chimeras in Oases and Trinity were trans-self chimeras. Within each package, we found a trade-off between maximizing reference coverage and minimizing redundancy and chimera rate. To reduce redundancy, we investigated three methods: 1) using cap3 and CD-HIT-EST to combine highly similar transcripts, 2) retaining only the transcript with the highest read coverage, or removing the transcript with the lowest read coverage, for each subcomponent in Trinity, and 3) filtering Oases single k-mer assemblies by number of transcripts per locus and relative transcript length, and then finding the transcript with the highest read coverage. We then used results from blastx against model protein sequences to effectively remove trans chimeras. After optimization, seven assembly strategies among all four packages successfully assembled 42.9–47.1% of reference genes to more than 200 bp, with a chimera rate of 0.92–2.21%, and on average 1.8–3.1 transcripts per reference gene assembled. Conclusions With rapidly improving sequencing and assembly tools, our study provides a framework to
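    The two coverage-based redundancy filters in method 2 can be sketched as follows. The input format, a list of (transcript ID, component ID, read coverage) tuples, and the IDs are hypothetical; extracting the component and coverage from real assembler output is left out. The sketch also assumes that a component with a single transcript keeps it under the "remove the lowest" rule, which the abstract does not specify.

```python
from collections import defaultdict


def keep_highest_coverage(transcripts):
    """Keep only the highest-coverage transcript per component (first seen wins ties)."""
    best = {}
    for tid, comp, cov in transcripts:
        if comp not in best or cov > best[comp][1]:
            best[comp] = (tid, cov)
    return sorted(tid for tid, _ in best.values())


def drop_lowest_coverage(transcripts):
    """Remove the lowest-coverage transcript from each component that has more than one."""
    groups = defaultdict(list)
    for tid, comp, cov in transcripts:
        groups[comp].append((cov, tid))
    kept = []
    for members in groups.values():
        if len(members) > 1:
            members = sorted(members)[1:]  # discard the single lowest-coverage transcript
        kept.extend(tid for _, tid in members)
    return sorted(kept)


data = [("t1", "c1", 5.0), ("t2", "c1", 9.5), ("t3", "c1", 7.0), ("t4", "c2", 1.0)]
print(keep_highest_coverage(data))  # ['t2', 't4']
print(drop_lowest_coverage(data))   # ['t2', 't3', 't4']
```

    The first rule removes redundancy most aggressively; the second is gentler and illustrates the coverage-versus-redundancy trade-off described in the Results.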

  17. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net). PMID:26731399

  18. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

    PubMed Central

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net). PMID:26731399
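    The timings in the abstract translate into speedup and parallel efficiency with simple arithmetic. The cluster run uses 32 nodes × 4 CPUs × 16 cores = 2048 cores, 32 times the 64 cores of the single-node baseline. Because the baseline is the original multi-threaded tool rather than the UPC++ code on one node, the efficiency figure below is only a rough indicator.

```python
def speedup(baseline_minutes, parallel_minutes):
    """How many times faster the parallel run is than the baseline."""
    return baseline_minutes / parallel_minutes


def parallel_efficiency(speedup_value, resource_ratio):
    """Speedup divided by the increase in resources (1.0 would be perfect scaling)."""
    return speedup_value / resource_ratio


BASE_CORES = 64              # one node, as stated in the abstract
CLUSTER_CORES = 32 * 4 * 16  # 32 nodes x four 16-core CPUs = 2048 cores
ratio = CLUSTER_CORES / BASE_CORES  # 32.0

for label, hours, minutes in [("single-end", 2.77, 11.54), ("paired-end", 5.54, 16.64)]:
    s = speedup(hours * 60, minutes)
    print(f"{label}: speedup {s:.1f}x, efficiency {parallel_efficiency(s, ratio):.0%}")
# single-end: speedup 14.4x, efficiency 45%
# paired-end: speedup 20.0x, efficiency 62%
```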

  20. Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads.

    PubMed

    Luo, Shishi; Yu, Jane A; Song, Yun S

    2016-09-01

    The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a complex genomic region that varies in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here we develop a method that takes short reads from high-throughput sequencing and outputs a genetic profile of the IGHV locus with the read coverage depth and a putative nucleotide sequence for each operationally defined gene cluster. Our operationally defined gene clusters aim to address a major challenge in studying the IGHV locus: the high sequence similarity between gene segments in different genomic locations. Tests on simulated data demonstrate that our approach can accurately determine the presence or absence of a gene cluster from reads as short as 70 bp. More detailed resolution on the copy number of gene clusters can be obtained from read coverage depth using longer reads (e.g., ≥ 100 bp). Detail at the nucleotide resolution of single copy genes (genes present in one copy per haplotype) can be determined with 250 bp reads. For IGHV genes with more than one copy, accurate nucleotide-resolution reconstruction is currently beyond the means of our approach. When applied to a family of European ancestry, our pipeline outputs genotypes that are consistent with the family pedigree, confirms existing multigene variants and suggests new copy number variants. This study paves the way for analyzing population-level patterns of variation in IGHV gene clusters in larger diverse datasets and for quantitatively

  1. Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads

    PubMed Central

    Luo, Shishi; Song, Yun S.

    2016-01-01

    The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a complex genomic region that varies in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here we develop a method that takes short reads from high-throughput sequencing and outputs a genetic profile of the IGHV locus with the read coverage depth and a putative nucleotide sequence for each operationally defined gene cluster. Our operationally defined gene clusters aim to address a major challenge in studying the IGHV locus: the high sequence similarity between gene segments in different genomic locations. Tests on simulated data demonstrate that our approach can accurately determine the presence or absence of a gene cluster from reads as short as 70 bp. More detailed resolution on the copy number of gene clusters can be obtained from read coverage depth using longer reads (e.g., ≥ 100 bp). Detail at the nucleotide resolution of single copy genes (genes present in one copy per haplotype) can be determined with 250 bp reads. For IGHV genes with more than one copy, accurate nucleotide-resolution reconstruction is currently beyond the means of our approach. When applied to a family of European ancestry, our pipeline outputs genotypes that are consistent with the family pedigree, confirms existing multigene variants and suggests new copy number variants. This study paves the way for analyzing population-level patterns of variation in IGHV gene clusters in larger diverse datasets and for quantitatively
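    The abstract uses read coverage depth to resolve copy number. One simple way to turn depth into a copy estimate, assumed here and not necessarily the authors' model, is to compare a cluster's depth with the mean depth over single-copy sequence: in a diploid sample a single-copy gene is present twice, so copies ≈ 2 × cluster depth / baseline depth. The cluster names and the presence threshold of 5 reads are hypothetical.

```python
def estimate_copies(cluster_depth, baseline_depth, ploidy=2):
    """Round ploidy * (cluster depth / single-copy baseline depth) to an integer copy count."""
    if baseline_depth <= 0:
        raise ValueError("baseline depth must be positive")
    return round(ploidy * cluster_depth / baseline_depth)


def profile(cluster_depths, baseline_depth, min_depth=5.0):
    """Map each (hypothetical) cluster to (present?, estimated diploid copy number)."""
    return {
        name: (depth >= min_depth, estimate_copies(depth, baseline_depth))
        for name, depth in cluster_depths.items()
    }


print(profile({"clusterA": 30.0, "clusterB": 45.0, "clusterC": 1.0}, baseline_depth=30.0))
# {'clusterA': (True, 2), 'clusterB': (True, 3), 'clusterC': (False, 0)}
```

    Depth ratios are noisy, and highly similar gene segments can attract each other's reads, which is why the abstract reports reliable copy numbers only with longer reads.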

  2. Clinical actionability enhanced through deep targeted sequencing of solid tumors

    PubMed Central

    Chen, Ken; Meric-Bernstam, Funda; Zhao, Hao; Zhang, Qingxiu; Ezzeddine, Nader; Tang, Lin-ya; Qi, Yuan; Mao, Yong; Chen, Tenghui; Chong, Zechen; Zhou, Wanding; Zheng, Xiaofeng; Johnson, Amber; Aldape, Kenneth D.; Routbort, Mark J.; Luthra, Rajyalakshmi; Kopetz, Scott; Davies, Michael A.; de Groot, John; Moulder, Stacy; Vinod, Ravi; Farhangfar, Carol J.; Shaw, Kenna Mills; Mendelsohn, John; Mills, Gordon B.; Eterovic, Agda Karina

    2015-01-01

    Background Further advances of targeted cancer therapy require comprehensive in-depth profiling of somatic mutations that are present in subpopulations of tumor cells in a clinical tumor sample. However, it is unclear to what extent such intra-tumor heterogeneity is present and whether it may affect clinical decision making. To address this challenge, we established a deep targeted sequencing platform to identify potentially actionable DNA alterations in tumor samples. Methods We assayed 515 FFPE tumor samples and matched germline samples (475 patients) from 11 disease sites by capturing and sequencing all the exons in 201 cancer-related genes. Mutations, indels and copy number data were reported. Results We obtained a 1000-fold average sequencing depth and identified 4794 non-synonymous mutations in the samples analyzed, of which 15.2% were present at less than 10% allele frequency. Most of these low-level mutations occurred at known oncogenic hotspots and are likely functional. Identifying low-level mutations improved identification of mutations in actionable genes in 118 (24.84%) patients, among whom 47 (9.8%) would otherwise have been unactionable. In addition, the ultra-high depth also ensured a low false discovery rate (less than 2.2%) from FFPE samples. Conclusion Our results were as accurate as a commercially available CLIA-compliant hotspot panel, but allowed the detection of a higher number of mutations in actionable genes. Our study revealed the critical importance of acquiring and utilizing high depth in profiling clinical tumor samples and presented a useful platform for implementing routine sequencing in a cancer care institution. PMID:25626406
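    The "allele frequency" threshold above refers to the variant allele frequency (VAF): the fraction of reads at a site that carry the alternate allele. The sketch below computes VAF and flags calls under the 10% cutoff used in the abstract. Call names and read counts are hypothetical, and zero-depth sites are skipped rather than reported as low-level.

```python
def vaf(alt_reads, depth):
    """Variant allele frequency: fraction of reads at the site carrying the alternate allele."""
    return alt_reads / depth if depth else 0.0


def low_level_calls(calls, threshold=0.10):
    """Names of (name, alt_reads, depth) calls with VAF below `threshold`; zero-depth calls are skipped."""
    return [name for name, alt, depth in calls if depth and vaf(alt, depth) < threshold]


calls = [("mut1", 80, 1000), ("mut2", 450, 900), ("mut3", 0, 0)]
print(low_level_calls(calls))  # ['mut1']
```

    At the roughly 1000-fold depth reported, a 10% VAF variant is supported by around 100 reads, which is why such low-level calls can be separated from sequencing noise.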

  3. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, compared with the complexity of long genetic reads and genomes, allowed us to describe the library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as the product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa · I_N, where I_N is the N × N identity (unity) matrix. Any bias in sequencing changes I_N to a non-unity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
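
The operator notation above can be made concrete with a short simulation. This Python sketch rests on two assumptions. First, it models the sampling operator Sa as multinomial sampling of a fixed number of reads. Second, it treats the biased sequencing operator as Sa applied after a diagonal CEN, so that an all-ones diagonal recovers Seq = Sa · I_N. The function name and parameters are hypothetical; this is not the authors' code.

```python
import random


def sequence_library(n, cen_diagonal, total_reads, seed=0):
    """Apply a sketch of Seq = Sa · CEN to a library frequency vector.

    n            -- copy numbers n_i of the N possible sequences
    cen_diagonal -- diagonal of the censorship matrix CEN: 1.0 keeps a
                    sequence unbiased, 0.0 eliminates it, values in between
                    downsample it. All ones reproduces Seq = Sa · I_N.
    total_reads  -- sequencing depth.
    Sa is modelled (assumption) as multinomial sampling of total_reads reads.
    """
    if len(n) != len(cen_diagonal):
        raise ValueError("n and CEN must have the same dimension N")
    rng = random.Random(seed)
    # A diagonal matrix times a vector is an element-wise product.
    biased = [count * c for count, c in zip(n, cen_diagonal)]
    observed = [0] * len(n)
    if sum(biased) == 0:
        return observed
    for i in rng.choices(range(len(n)), weights=biased, k=total_reads):
        observed[i] += 1
    return observed


if __name__ == "__main__":
    library = [100, 100, 100, 100]  # four equally abundant clones
    print(sequence_library(library, [1, 1, 1, 1], 1000))
    # Clone 2 can never be sampled; clone 3 has half the weight of clones 0 and 1.
    print(sequence_library(library, [1, 1, 0, 0.5], 1000))
```

Representing CEN by its diagonal keeps the sketch in O(N) memory, which matters because N for a 7-mer peptide library is large.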

  4. Diagnosing Balamuthia mandrillaris Encephalitis With Metagenomic Deep Sequencing

    PubMed Central

    Shanbhag, Niraj M.; Reid, Michael J.; Singhal, Neel S.; Gelfand, Jeffrey M.; Sample, Hannah A.; Benkli, Barlas; O'Donovan, Brian D.; Ali, Ibne K.M.; Keating, M. Kelly; Dunnebacke, Thelma H.; Wood, Matthew D.; Bollen, Andrew; DeRisi, Joseph L.

    2015-01-01

    Objective Identification of a particular cause of meningoencephalitis can be challenging owing to the myriad bacteria, viruses, fungi, and parasites that can produce overlapping clinical phenotypes, frequently delaying diagnosis and therapy. Metagenomic deep sequencing (MDS) approaches to infectious disease diagnostics are known for their ability to identify unusual or novel viruses and thus are well suited for investigating possible etiologies of meningoencephalitis. Methods We present the case of a 74‐year‐old woman with endophthalmitis followed by meningoencephalitis. MDS of her cerebrospinal fluid (CSF) was performed to identify an infectious agent. Results Sequences aligning to Balamuthia mandrillaris ribosomal RNA genes were identified in the CSF by MDS. Polymerase chain reaction subsequently confirmed the presence of B. mandrillaris in CSF, brain tissue, and vitreous fluid from the patient's infected eye. B. mandrillaris serology and immunohistochemistry for free‐living amoebas on the brain biopsy tissue were positive. Interpretation The diagnosis was made using MDS after the patient had been hospitalized for several weeks and subjected to costly and invasive testing. MDS is a powerful diagnostic tool with the potential for rapid and unbiased pathogen identification leading to early therapeutic targeting. Ann Neurol 2015;78:679–696. PMID:26290222

  5. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explore the expression patterns of miRNAs and other ncRNAs, and discover novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  6. A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.

    PubMed

    Shi, Haixiang; Schmidt, Bertil; Liu, Weiguo; Müller-Wittig, Wolfgang

    2010-04-01

    Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with massive throughput. However, the reads produced are significantly shorter and more error-prone than those of the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads are available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency, we take advantage of CUDA texture memory, using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. On a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), our implementation achieves speedups of 12-84 times for the parallelized error correction, and speedups of 3-63 times for both sequential preprocessing and parallelized error correction, compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net .
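
The abstract combines two ideas: a spectrum of "solid" k-mers (those seen often enough to be trusted) held in a Bloom filter, and error correction that edits a read until every one of its k-mers is solid. The CPU-only Python sketch below shows both under simplifying assumptions. The salted SHA-256 hashing, filter size, k, the `min_count` threshold, and the greedy single-substitution search are all illustrative choices, not the CUDA-EC implementation. Full spectral alignment also handles reads with more than one error.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Bit-array set: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function (assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(read: str, k: int) -> list:
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads, k: int, min_count: int) -> BloomFilter:
    """Store every k-mer seen at least min_count times (a 'solid' k-mer)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    spectrum = BloomFilter()
    for km, count in counts.items():
        if count >= min_count:
            spectrum.add(km)
    return spectrum


def is_solid(read: str, spectrum: BloomFilter, k: int) -> bool:
    return all(km in spectrum for km in kmers(read, k))


def correct_read(read: str, spectrum: BloomFilter, k: int):
    """Return the read if solid, else the first single-base substitution that
    makes every k-mer solid, else None (simplified spectral correction)."""
    if is_solid(read, spectrum, k):
        return read
    for i, original in enumerate(read):
        for base in "ACGT":
            if base != original:
                candidate = read[:i] + base + read[i + 1:]
                if is_solid(candidate, spectrum, k):
                    return candidate
    return None


if __name__ == "__main__":
    genome = "GATTACACGTCCGATGCAAGCTTGC"
    bad = "GATTACACGTCCGTTGCAAGCTTGC"  # A -> T substitution at position 13
    spectrum = build_spectrum([genome] * 4 + [bad], k=7, min_count=3)
    print(correct_read(bad, spectrum, k=7))
```

The Bloom filter trades a small, tunable false-positive rate for a fixed memory footprint. That trade-off is what makes it attractive for the limited texture memory the abstract mentions.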

  7. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments

    PubMed Central

    Bachar, Dipankar; Christen, Richard; Esling, Philippe; Baerlocher, Loïc; Østerås, Magne; Farinelli, Laurent; Pawlowski, Jan

    2011-01-01

    Deep-sea floors represent one of the largest and most complex ecosystems on Earth but remain essentially unexplored. The vastness and remoteness of this ecosystem make deep-sea sampling difficult, hampering traditional taxonomic observations and diversity assessment. This problem is particularly acute for the deep-sea meiofauna, which largely comprises small-sized, fragile, and difficult-to-identify metazoans and protists. Here, we introduce an ultra-deep sequencing-based metagenetic approach to examine the richness of benthic foraminifera, a principal component of deep-sea meiofauna. We used Illumina sequencing technology to assess foraminiferal richness in 31 unsieved deep-sea sediment samples from five distinct oceanic regions. We sequenced an extremely short fragment (36 bases) of the small subunit ribosomal DNA hypervariable region 37f, which has been shown to accurately distinguish foraminiferal species. In total, we obtained 495,978 unique sequences that were grouped into 1,643 operational taxonomic units, of which about half (841) could be reliably assigned to foraminifera. The vast majority of the operational taxonomic units (nearly 90%) were either assigned to early (ancient) lineages of soft-walled, single-chambered (monothalamous) foraminifera or remained undetermined but possibly belong to unknown early lineages. In contrast to the classical view of multichambered taxa dominating foraminiferal assemblages, our work reveals an unexpected diversity of monothalamous lineages that remain unknown to conventional micropaleontological observation. Although we can only speculate about their morphology, the immense richness of deep-sea phylotypes revealed by this study suggests that ultra-deep sequencing can improve understanding of deep-sea benthic diversity, until now considered unknowable on the basis of a traditional taxonomic approach. PMID:21788523
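
The step from 495,978 unique sequences to 1,643 operational taxonomic units involves dereplication followed by clustering. The abstract does not state which clustering algorithm was used. The Python sketch below therefore shows one common approach, greedy abundance-sorted centroid clustering by Hamming distance, as an assumption. The function names and the `max_diffs` threshold are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Mismatches between two reads, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def dereplicate_and_cluster(reads, max_diffs=1):
    """Collapse identical reads, then greedily assign each unique sequence
    (most abundant first) to the first centroid within max_diffs mismatches.

    Returns a list of (centroid, {member_sequence: read_count}) pairs.
    """
    unique = Counter(reads)  # dereplication: sequence -> abundance
    clusters = []
    for seq, count in unique.most_common():
        for centroid, members in clusters:
            if hamming(seq, centroid) <= max_diffs:
                members[seq] = count
                break
        else:  # no centroid close enough: seq founds a new OTU
            clusters.append((seq, {seq: count}))
    return clusters


if __name__ == "__main__":
    reads = ["AAAA"] * 5 + ["AAAT"] * 2 + ["CCCC"] * 3
    for centroid, members in dereplicate_and_cluster(reads):
        print(centroid, members)
```

Processing the most abundant sequences first makes abundant, likely genuine sequences the centroids. Rare variants that differ by a sequencing error or two are then absorbed into those clusters instead of inflating richness.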

  8. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments.

    PubMed

    Lecroq, Béatrice; Lejzerowicz, Franck; Bachar, Dipankar; Christen, Richard; Esling, Philippe; Baerlocher, Loïc; Østerås, Magne; Farinelli, Laurent; Pawlowski, Jan

    2011-08-01

    Deep-sea floors represent one of the largest and most complex ecosystems on Earth but remain essentially unexplored. The vastness and remoteness of this ecosystem make deep-sea sampling difficult, hampering traditional taxonomic observations and diversity assessment. This problem is particularly acute for the deep-sea meiofauna, which largely comprises small-sized, fragile, and difficult-to-identify metazoans and protists. Here, we introduce an ultra-deep sequencing-based metagenetic approach to examine the richness of benthic foraminifera, a principal component of deep-sea meiofauna. We used Illumina sequencing technology to assess foraminiferal richness in 31 unsieved deep-sea sediment samples from five distinct oceanic regions. We sequenced an extremely short fragment (36 bases) of the small subunit ribosomal DNA hypervariable region 37f, which has been shown to accurately distinguish foraminiferal species. In total, we obtained 495,978 unique sequences that were grouped into 1,643 operational taxonomic units, of which about half (841) could be reliably assigned to foraminifera. The vast majority of the operational taxonomic units (nearly 90%) were either assigned to early (ancient) lineages of soft-walled, single-chambered (monothalamous) foraminifera or remained undetermined but possibly belong to unknown early lineages. In contrast to the classical view of multichambered taxa dominating foraminiferal assemblages, our work reveals an unexpected diversity of monothalamous lineages that remain unknown to conventional micropaleontological observation. Although we can only speculate about their morphology, the immense richness of deep-sea phylotypes revealed by this study suggests that ultra-deep sequencing can improve understanding of deep-sea benthic diversity, until now considered unknowable on the basis of a traditional taxonomic approach.

  9. Whole plastome sequencing reveals deep plastid divergence and cytonuclear discordance between closely related balsam poplars, Populus balsamifera and P. trichocarpa (Salicaceae).

    PubMed

    Huang, Daisie I; Hefer, Charles A; Kolosova, Natalia; Douglas, Carl J; Cronk, Quentin C B

    2014-11-01

    As molecular phylogenetic analyses incorporate ever-greater numbers of loci, cases of cytonuclear discordance - the phenomenon in which nuclear gene trees deviate significantly from organellar gene trees - are being reported more frequently. Plant examples of topological discordance, caused by recent hybridization between extant species, are well known. However, examples of branch-length discordance are reported less often in plants than in animals. We use a combination of de novo assembly and reference-based mapping of short-read shotgun sequences to construct a robust phylogeny of the plastome for multiple individuals of all the common Populus species in North America. We demonstrate a case of strikingly high plastome divergence, in contrast to little nuclear genome divergence, in two closely related balsam poplars, Populus balsamifera and Populus trichocarpa (Populus balsamifera ssp. trichocarpa). Previous studies with nuclear loci indicate that the two species (or subspecies) diverged since the late Pleistocene, whereas their plastomes indicate deep divergence, dating to at least the Pliocene (6-7 Myr ago). Our finding is in marked contrast to the estimated Pleistocene divergence of the nuclear genomes, previously calculated at 75 000 yr ago, suggesting plastid capture from a 'ghost lineage' of a now-extinct North American poplar. PMID:25078531

  11. Unified View of Backward Backtracking in Short Read Mapping

    NASA Astrophysics Data System (ADS)

    Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku

    Mapping short DNA reads to a reference genome is the core task in recent high-throughput technologies for studying, e.g., protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit the Burrows-Wheeler transform and the backward backtracking technique on it to map reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen as a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed-length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. The same mechanism can also be used for mapping mate-pair reads.
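
Backward backtracking extends a Burrows-Wheeler (FM-index) suffix-array interval one pattern character at a time, from right to left. At each step it branches over substitutions while a mismatch budget remains. The Python sketch below shows only this basic technique. It omits the rotation-based pruning the paper proposes and the tailored pruning of tools such as bowtie or bwa, and it builds the suffix array naively, which is suitable only for small inputs. The function names are hypothetical.

```python
from collections import Counter


def build_fm_index(text):
    """Build suffix array, C table and occurrence table for text + '$'."""
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0
    counts = Counter(text)
    c_table, running = {}, 0
    for ch in sorted(counts):
        c_table[ch] = running  # number of characters smaller than ch
        running += counts[ch]
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in c_table}
    for i, b in enumerate(bwt):
        for ch in c_table:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def backward_backtrack(pattern, max_mismatches, sa, c_table, occ):
    """Return sorted (text_position, mismatches) for all occurrences of pattern
    with at most max_mismatches substitutions."""
    hits = set()

    def search(pos, lo, hi, mismatches):
        if lo >= hi:
            return  # empty suffix-array interval: prune this branch
        if pos < 0:
            hits.update((sa[row], mismatches) for row in range(lo, hi))
            return
        for ch in c_table:
            if ch == "$":
                continue
            cost = mismatches + (ch != pattern[pos])
            if cost <= max_mismatches:
                # Backward step: prepend ch to the matched suffix.
                search(pos - 1, c_table[ch] + occ[ch][lo], c_table[ch] + occ[ch][hi], cost)

    search(len(pattern) - 1, 0, len(sa), 0)
    return sorted(hits)


if __name__ == "__main__":
    index = build_fm_index("ACGTACGGACGT")
    print(backward_backtrack("ACGT", 0, *index))
    print(backward_backtrack("ACGT", 1, *index))
```

Without pruning, the search tree grows roughly with the number of ways to place the allowed mismatches in the pattern. This growth is why the mapping tools named in the abstract add their own pruning mechanisms.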

  12. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons

    PubMed Central

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistical difficulties and a dearth of taxonomic workforce hinder knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and 2 open slopes (depth range 100–2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was marked small-scale heterogeneity, as shown by differences between replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from that of deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure to the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and set the ground for future monitoring and conservation efforts on

  13. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons.

    PubMed

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistical difficulties and a dearth of taxonomic workforce hinder knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and 2 open slopes (depth range 100-2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was marked small-scale heterogeneity, as shown by differences between replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from that of deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure to the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and set the ground for future monitoring and conservation efforts on

  15. Complete Genome Sequence of Bacteriophage Deep-Blue Infecting Emetic Bacillus cereus.

    PubMed

    Hock, Louise; Gillis, Annika; Mahillon, Jacques

    2016-01-01

    The Bacillus cereus emetic pathotype is responsible for important food-borne intoxications. Here, we describe the complete genome sequence of bacteriophage Deep-Blue, which is able to infect emetic strains of B. cereus. Deep-Blue is a 159-kb myophage of the Bastille-like group within the Spounavirinae.

  16. Virus identification in unknown tropical febrile illness cases using deep sequencing.

    PubMed

    Yozwiak, Nathan L; Skewes-Cox, Peter; Stenglein, Mark D; Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50-100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness.
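
The stepwise filtering described above removes low-quality and human reads before the slower, unbiased database searches. The abstract does not give the filters or thresholds. The Python sketch below therefore assumes Phred+33-encoded FASTQ qualities and a mean-quality cutoff. It also substitutes exact k-mer matching against a host reference for whatever alignment-based human subtraction the authors used. All function names and parameters are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def host_kmer_set(host_sequence: str, k: int) -> set:
    """All k-mers of a host reference sequence, used for exact-match subtraction."""
    return {host_sequence[i:i + k] for i in range(len(host_sequence) - k + 1)}


def filter_reads(records, host_kmers, k, min_mean_quality=20.0):
    """Keep (name, seq, qual) records that pass both filtering steps.

    Step 1: drop reads whose mean Phred quality is below min_mean_quality.
    Step 2: drop reads sharing any exact k-mer with the host reference.
    """
    kept = []
    for name, seq, qual in records:
        if mean_phred(qual) < min_mean_quality:
            continue
        if any(seq[i:i + k] in host_kmers for i in range(len(seq) - k + 1)):
            continue
        kept.append((name, seq, qual))
    return kept


if __name__ == "__main__":
    host = host_kmer_set("TTTTGGGGCCCCAAAA", k=8)
    reads = [
        ("candidate", "ACGTACGTACGT", "IIIIIIIIIIII"),
        ("low_quality", "ACGTACGTACGT", "############"),
        ("host", "TTTTGGGGCCCC", "IIIIIIIIIIII"),
    ]
    print([name for name, _, _ in filter_reads(reads, host, k=8)])
```

The cheap quality filter runs first so that the host-subtraction step only processes reads worth keeping. The same ordering logic motivates filtering before the database searches themselves.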

  17. Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

    PubMed

    Simola, Daniel F; Kim, Junhyong

    2011-06-20

    SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

  18. Mutascope: sensitive detection of somatic mutations from deep amplicon sequencing

    PubMed Central

    Yost, Shawn E.; Alakus, Hakan; Matsui, Hiroko; Schwab, Richard B.; Jepsen, Kristen; Frazer, Kelly A.; Harismendy, Olivier

    2013-01-01

    Summary: We present Mutascope, a sequencing analysis pipeline specifically developed for the identification of somatic variants present at low allelic fraction from high-throughput sequencing of amplicons from matched tumor-normal specimens. Using datasets reproducing tumor genetic heterogeneity, we demonstrate that Mutascope has a higher sensitivity and generates fewer false-positive calls than tools designed for shotgun sequencing or diploid genomes. Availability: Freely available on the web at http://sourceforge.net/projects/mutascope/. Contact: oharismendy@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23712659

  19. Stratigraphic sequence architecture of deep-sea clastic system from aerial photographs, Great Valley sequence, northern California

    SciTech Connect

    Paramore, R.C.; Suchecki, R.K.

    1989-04-01

    Lineations interpreted from aerial photographs reveal stratal geometries of deep-sea clastic deposits along an ancient basin margin that was strongly influenced by both subduction and related volcanogenic processes. These stratal patterns of four principal stratigraphic sequences in steeply dipping Tithonian to Valanginian sediments of the Great Valley sequence, northern California, in combination with lithic facies data, illustrate the major components and internal architecture that resulted from eustatic and tectonic variations. Although deposited along a tectonically active margin, the component geometries and internal stratal patterns of the sequences are similar in detail to the seismically defined stratigraphic sequences of Vail. The integration of fine-scale stratal architecture based on aerial photograph interpretation and sediment facies using classical models of submarine-fan deposits illustrates the depositional and stratigraphic evolution of a convergent deep-sea margin.

  20. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be important for their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  1. Molecular Diagnosis of Actinomadura madurae Infection by 16S rRNA Deep Sequencing

    PubMed Central

    SenGupta, Dhruba J.; Hoogestraat, Daniel R.; Cummings, Lisa A.; Bryant, Bronwyn H.; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W.; Chau, Mimosa; Barbee, Lindley A.; Rosenthal, Christopher; Cookson, Brad T.; Hoffman, Noah G.

    2013-01-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms. PMID:24108607

  2. Predicting effects of noncoding variants with deep learning–based sequence model

    PubMed Central

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  3. Protein sequences bound to mineral surfaces persist into deep time

    PubMed Central

    Demarchi, Beatrice; Hall, Shaun; Roncal-Herrero, Teresa; Freeman, Colin L; Woolley, Jos; Crisp, Molly K; Wilson, Julie; Fotakis, Anna; Fischer, Roman; Kessler, Benedikt M; Rakownikow Jersie-Christensen, Rosa; Olsen, Jesper V; Haile, James; Thomas, Jessica; Marean, Curtis W; Parkington, John; Presslee, Samantha; Lee-Thorp, Julia; Ditchfield, Peter; Hamilton, Jacqueline F; Ward, Martyn W; Wang, Chunting Michelle; Shaw, Marvin D; Harrison, Terry; Domínguez-Rodrigo, Manuel; MacPhee, Ross DE; Kwekason, Amandus; Ecker, Michaela; Kolska Horwitz, Liora; Chazan, Michael; Kröger, Roland; Thomas-Oates, Jane; Harding, John H; Cappellini, Enrico; Penkman, Kirsty; Collins, Matthew J

    2016-01-01

    Proteins persist longer in the fossil record than DNA, but the longevity, survival mechanisms and substrates remain contested. Here, we demonstrate the role of mineral binding in preserving the protein sequence in ostrich (Struthionidae) eggshell, including from the palaeontological sites of Laetoli (3.8 Ma) and Olduvai Gorge (1.3 Ma) in Tanzania. By tracking protein diagenesis back in time we find consistent patterns of preservation, demonstrating authenticity of the surviving sequences. Molecular dynamics simulations of struthiocalcin-1 and -2, the dominant proteins within the eggshell, reveal that distinct domains bind to the mineral surface. It is the domain with the strongest calculated binding energy to the calcite surface that is selectively preserved. Thermal age calculations demonstrate that the Laetoli and Olduvai peptides are 50 times older than any previously authenticated sequence (equivalent to ~16 Ma at a constant 10°C). DOI: http://dx.doi.org/10.7554/eLife.17092.001 PMID:27668515

  4. Next generation deep sequencing and vaccine design: today and tomorrow.

    PubMed

    Luciani, Fabio; Bull, Rowena A; Lloyd, Andrew R

    2012-09-01

    Next generation sequencing (NGS) technologies have redefined the modus operandi in both human and microbial genetics research, allowing the unprecedented generation of very large sequencing datasets on a short time scale and at affordable costs. Vaccine development research is rapidly taking full advantage of the advent of NGS. This review provides a concise summary of the current applications of NGS in relation to research seeking to develop vaccines for human infectious diseases, incorporating studies of both the pathogen and the host. We focus on rapidly mutating viral pathogens, which are major targets in current vaccine research. NGS is unraveling the complex dynamics of viral evolution and host responses against these viruses, thus contributing substantially to the likelihood of successful vaccine development.

  5. Using Amplicon Deep Sequencing to Detect Genetic Signatures of Plasmodium vivax Relapse

    PubMed Central

    Lin, Jessica T.; Hathaway, Nicholas J.; Saunders, David L.; Lon, Chanthap; Balasubramanian, Sujata; Kharabora, Oksana; Gosi, Panita; Sriwichai, Sabaithip; Kartchner, Laurel; Chuor, Char Meng; Satharath, Prom; Lanteri, Charlotte; Bailey, Jeffrey A.; Juliano, Jonathan J.

    2015-01-01

    Plasmodium vivax infections often recur due to relapse of hypnozoites from the liver. In malaria-endemic areas, tools to distinguish relapse from reinfection are needed. We applied amplicon deep sequencing to P. vivax isolates from 78 Cambodian volunteers, nearly one-third of whom suffered recurrence at a median of 68 days. Deep sequencing at a highly variable region of the P. vivax merozoite surface protein 1 gene revealed impressive diversity—generating 67 unique haplotypes and detecting on average 3.6 cocirculating parasite clones within individuals, compared to 2.1 clones detected by a combination of 3 microsatellite markers. This diversity enabled a scheme to classify over half of recurrences as probable relapses based on the low probability of reinfection by multiple recurring variants. In areas of high P. vivax diversity, targeted deep sequencing can help detect genetic signatures of relapse, key to evaluating antivivax interventions and achieving a better understanding of relapse-reinfection epidemiology. PMID:25748326
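
    The abstract does not spell out the classification scheme, so the following is only a simplified illustration under stated assumptions:
    - identical amplicon reads are collapsed into haplotypes;
    - haplotypes below a hypothetical frequency cutoff are discarded as noise;
    - the chance that a reinfection would reproduce several recurring haplotypes is approximated as the product of their population prevalences, which assumes independence.
    The reads, cutoff and prevalences are invented.

```python
from collections import Counter
from math import prod


def call_haplotypes(reads, min_freq=0.01):
    """Collapse identical amplicon reads into haplotypes and keep those whose
    within-sample frequency is at least min_freq (a hypothetical noise cutoff)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items() if n / total >= min_freq}


def reinfection_probability(recurring, prevalence):
    """Simplified chance that an independent new infection carries every
    recurring haplotype: the product of their population prevalences
    (assumes haplotypes are acquired independently)."""
    return prod(prevalence[h] for h in recurring)


# Hypothetical reads (short strings stand in for amplicon sequences).
reads = ["AAT"] * 50 + ["AGT"] * 30 + ["CGT"] * 19 + ["CGA"] * 1
clones = call_haplotypes(reads, min_freq=0.05)
print(len(clones), "clones detected")

prevalence = {"AAT": 0.2, "AGT": 0.1, "CGT": 0.05}  # hypothetical population data
print(reinfection_probability(["AAT", "AGT"], prevalence))
```

    The more recurring haplotypes are shared, and the rarer they are in the population, the smaller this product becomes. This matches the intuition in the abstract that recurrence of multiple variants is unlikely to arise by reinfection.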

  6. Using Amplicon Deep Sequencing to Detect Genetic Signatures of Plasmodium vivax Relapse.

    PubMed

    Lin, Jessica T; Hathaway, Nicholas J; Saunders, David L; Lon, Chanthap; Balasubramanian, Sujata; Kharabora, Oksana; Gosi, Panita; Sriwichai, Sabaithip; Kartchner, Laurel; Chuor, Char Meng; Satharath, Prom; Lanteri, Charlotte; Bailey, Jeffrey A; Juliano, Jonathan J

    2015-09-15

    Plasmodium vivax infections often recur due to relapse of hypnozoites from the liver. In malaria-endemic areas, tools to distinguish relapse from reinfection are needed. We applied amplicon deep sequencing to P. vivax isolates from 78 Cambodian volunteers, nearly one-third of whom suffered recurrence at a median of 68 days. Deep sequencing at a highly variable region of the P. vivax merozoite surface protein 1 gene revealed impressive diversity, generating 67 unique haplotypes and detecting on average 3.6 cocirculating parasite clones within individuals, compared to 2.1 clones detected by a combination of 3 microsatellite markers. This diversity enabled a scheme to classify over half of recurrences as probable relapses based on the low probability of reinfection by multiple recurring variants. In areas of high P. vivax diversity, targeted deep sequencing can help detect genetic signatures of relapse, key to evaluating antivivax interventions and achieving a better understanding of relapse-reinfection epidemiology.

  7. Deep Sequencing Analysis of the Ixodes ricinus Haemocytome

    PubMed Central

    Franta, Zdeněk; Pedra, Joao H. F.; Ribeiro, José M. C.

    2015-01-01

    Background Ixodes ricinus is the main tick vector of the microbes that cause Lyme disease and tick-borne encephalitis in Europe. Pathogens transmitted by ticks have to overcome innate immunity barriers present in tick tissues, including midgut, salivary glands epithelia and the hemocoel. Molecularly, invertebrate immunity is initiated when pathogen recognition molecules trigger serum or cellular signalling cascades leading to the production of antimicrobials, pathogen opsonization and phagocytosis. We presently aimed at identifying hemocyte transcripts from semi-engorged female I. ricinus ticks by mass sequencing a hemocyte cDNA library and annotating immune-related transcripts based on their hemocyte abundance as well as their ubiquitous distribution. Methodology/principal findings De novo assembly of 926,596 pyrosequence reads plus 49,328,982 Illumina reads (148 nt length) from a hemocyte library, together with over 189 million Illumina reads from salivary gland and midgut libraries, generated 15,716 extracted coding sequences (CDS); these are displayed in an annotated hyperlinked spreadsheet format. Read mapping allowed the identification and annotation of tissue-enriched transcripts. A total of 327 transcripts were found significantly overexpressed in the hemocyte libraries, including those coding for scavenger receptors, antimicrobial peptides, pathogen recognition proteins, proteases and protease inhibitors. Enrichment of vitellogenin and lipid metabolism transcripts suggests the presence of fat body components. We additionally annotated ubiquitously distributed transcripts associated with immune function, including immune-associated signal transduction proteins and transcription factors, among them the STAT transcription factor. Conclusions/significance This is the first systems biology approach to describe the genes expressed in the haemocytes of this neglected disease vector. A total of 2,860 coding sequences were deposited to GenBank, increasing to 27,547 the number so

  8. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis

    PubMed Central

    Simons, Benjamin D.

    2016-01-01

    Using deep sequencing technology, methods based on the sporadic acquisition of somatic DNA mutations in human tissues have been used to trace the clonal evolution of progenitor cells in diseased states. However, the potential of these approaches to explore cell fate behavior of normal tissues and the initiation of preneoplasia remains underexploited. Focusing on the results of a recent deep sequencing study of eyelid epidermis, we show that the quantitative analysis of mutant clone size provides a general method to resolve the pattern of normal stem cell fate and to detect and characterize the mutational signature of rare field transformations in human tissues, with implications for the early detection of preneoplasia. PMID:26699486
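
    The abstract does not name the statistic used. One quantity used in the clone-dynamics literature to summarize a clone-size distribution is the first incomplete moment, μ(n) = (1/⟨m⟩) Σ_{m≥n} m P(m). It equals the fraction of all mutant cells found in clones of size n or larger. This sketch computes it from a list of clone sizes, which are hypothetical. Two steps are outside its scope: inferring clone sizes from deep-sequencing allele fractions, and interpreting μ(n) with a model of stem cell fate.

```python
def first_incomplete_moment(sizes):
    """Return {n: mu(n)} for each observed clone size n, where
    mu(n) = (1/<m>) * sum_{m >= n} m * P(m), which equals the fraction of all
    mutant cells that sit in clones of size >= n."""
    total = sum(sizes)
    return {n: sum(m for m in sizes if m >= n) / total for n in sorted(set(sizes))}


mu = first_incomplete_moment([1, 1, 2, 3, 5, 8])  # hypothetical clone sizes
for n, value in mu.items():
    print(n, round(value, 3))
```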

  9. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    SciTech Connect

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable them to adapt to new host environments and even to jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. To determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false-positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
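
    The following is a minimal sketch of the approach described. It rests on an assumption the abstract does not state: errors occur independently at one uniform per-base rate, so at a site of depth n the number of non-reference reads produced by error alone follows a binomial distribution. The rate is estimated from reads of a control of known sequence, and a candidate variant is then assessed by the binomial upper-tail probability. All counts below are hypothetical.

```python
from math import exp, lgamma, log


def error_rate(mismatches, bases):
    """Per-base error rate from reads aligned to a control of known sequence."""
    return mismatches / bases


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, i.e. the chance that
    errors alone give at least k non-reference reads at depth n.
    Terms are evaluated in log space so large n does not overflow."""
    log_n_fact = lgamma(n + 1)
    return sum(
        exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log(1 - p))
        for i in range(k, n + 1)
    )


eps = error_rate(mismatches=1_200, bases=1_000_000)  # hypothetical control counts
p_value = binomial_upper_tail(k=12, n=2_000, p=eps)  # 12 variant reads at depth 2,000
print(f"error rate {eps:.4f}, P(>= 12 error reads) = {p_value:.2e}")
```

    Applied across every position of a viral genome, such per-site tests would also need a multiple-testing correction.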

  10. HCV genotyping from NGS short reads and its application in genotype detection from HCV mixed infected plasma.

    PubMed

    Qiu, Ping; Stevens, Richard; Wei, Bo; Lahser, Fred; Howe, Anita Y M; Klappenbach, Joel A; Marton, Matthew J

    2015-01-01

    Genotyping of hepatitis C virus (HCV) plays an important role in the treatment of HCV. As new genotype-specific treatment options become available, it has become increasingly important to have accurate HCV genotype and subtype information to ensure that the most appropriate treatment regimen is selected. Most current genotyping methods are unable to detect mixed genotypes from two or more HCV infections. Next generation sequencing (NGS) allows for rapid and low cost mass sequencing of viral genomes and provides an opportunity to probe the viral population from a single host. In this paper, the possibility of using short NGS reads for direct HCV genotyping without genome assembly was evaluated. We surveyed the publicly-available genetic content of three HCV drug target regions (NS3, NS5A, NS5B) in terms of whether these genes contained genotype-specific regions that could predict genotype. Six genotypes and 38 subtypes were included in this study. An automated phylogenetic analysis based HCV genotyping method was implemented and used to assess different HCV target gene regions. Candidate regions of 250 bp each were found for all three genes that have enough genetic information to predict HCV genotypes/subtypes. Validation using public datasets shows 100% genotyping accuracy. To test whether these 250-bp regions were sufficient to identify mixed genotypes, we developed a random primer-based method to sequence HCV plasma samples containing mixtures of two HCV genotypes in different ratios. We were able to determine the genotypes without ambiguity and to quantify the ratio of the abundances of the mixed genotypes in the samples. These data provide a proof-of-concept that this random primed, NGS-based short-read genotyping approach does not need prior information about the viral population and is capable of detecting mixed viral infection.
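
    The study used an automated phylogenetic method, which is not reproduced here. This sketch swaps in a simpler technique to illustrate assembly-free genotyping and mixture estimation. It assigns each read to the reference sharing the most k-mers and reports the fraction of reads per genotype. The sequences, labels and k = 5 are toy values standing in for real genotype-specific regions such as the 250 bp candidate regions in the abstract.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, references, k):
    """Label of the reference sharing the most k-mers with the read,
    or None when no k-mer is shared. Ties go to the first reference listed."""
    read_kmers = kmers(read, k)
    best, best_hits = None, 0
    for label, ref_kmers in references.items():
        hits = len(read_kmers & ref_kmers)
        if hits > best_hits:
            best, best_hits = label, hits
    return best


def genotype_mixture(reads, ref_seqs, k=5):
    """Fraction of assignable reads per genotype label."""
    references = {label: kmers(seq, k) for label, seq in ref_seqs.items()}
    calls = Counter(assign_read(r, references, k) for r in reads)
    calls.pop(None, None)  # drop reads that matched no reference
    total = sum(calls.values())
    return {label: n / total for label, n in calls.items()} if total else {}


# Toy "references" standing in for genotype-specific regions (hypothetical sequences).
refs = {"1a": "ACGTACGGTCAGTTGACCAT", "3a": "TTGCAGGCATCCGATAGGCA"}
reads = ["ACGTACGGTCAG", "GTCAGTTGACCA", "CATCCGATAGGC", "NNNNNNNN"]
print(genotype_mixture(reads, refs))
```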

  11. Deep sequencing extends the diversity of human papillomaviruses in human skin.

    PubMed

    Bzhalava, Davit; Mühr, Laila Sara Arroyo; Lagheden, Camilla; Ekström, Johanna; Forslund, Ola; Dillner, Joakim; Hultin, Emilie

    2014-07-24

    Most viruses in human skin are known to be human papillomaviruses (HPVs). Previous sequencing of skin samples has identified 273 different cutaneous HPV types, including 47 previously unknown types. In the present study, we wished to extend prior studies using deeper sequencing. This deeper sequencing without prior PCR of a pool of 142 whole genome amplified skin lesions identified 23 known HPV types, 3 novel putative HPV types and 4 non-HPV viruses. The complete sequence was obtained for one of the known putative types and almost the complete sequence was obtained for one of the novel putative types. In addition, sequencing of amplimers from HPV consensus PCR of 326 skin lesions detected 385 different HPV types, including 226 previously unknown putative types. In conclusion, metagenomic deep sequencing of human skin samples identified no less than 396 different HPV types in human skin, out of which 229 putative HPV types were previously unknown.

  12. Using Small RNA Deep Sequencing Data to Detect Human Viruses.

    PubMed

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F; Fei, ZhangJun; Zhu, Xiao; Gao, Shan

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without any prior knowledge or specialized sample preparation. The sRNA-seq method was first used for viral detection and identification in plants, and then in invertebrates and fungi. However, its use for detecting mammalian or human viruses remains controversial. In this study, we used data from 931 sRNA-seq runs in the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly in some clinical samples. Six viruses (HPV-18, HBV, HCV, HIV-1, SMRV, and EBV) were detected in 36 runs. Four of these were consistent with annotations from previous studies. HIV-1 was found in clinical samples that had not been reported as HIV-positive, and SMRV was found in diffuse large B-cell lymphoma cells for the first time. In conclusion, these results suggest that sRNA-seq can be used to detect viruses in mammals, including humans. PMID:27066498

  13. Using Small RNA Deep Sequencing Data to Detect Human Viruses

    PubMed Central

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F.; Fei, ZhangJun; Zhu, Xiao

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without any prior knowledge or specialized sample preparation. The sRNA-seq method was first used for viral detection and identification in plants, and then in invertebrates and fungi. However, its use for detecting mammalian or human viruses remains controversial. In this study, we used data from 931 sRNA-seq runs in the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly in some clinical samples. Six viruses (HPV-18, HBV, HCV, HIV-1, SMRV, and EBV) were detected in 36 runs. Four of these were consistent with annotations from previous studies. HIV-1 was found in clinical samples that had not been reported as HIV-positive, and SMRV was found in diffuse large B-cell lymphoma cells for the first time. In conclusion, these results suggest that sRNA-seq can be used to detect viruses in mammals, including humans. PMID:27066498

  14. Deep Sequencing of the Murine Olfactory Receptor Neuron Transcriptome

    PubMed Central

    Kanageswaran, Ninthujah; Demond, Marilen; Nagel, Maximilian; Schreiner, Benjamin S. P.; Baumgart, Sabrina; Scholz, Paul; Altmüller, Janine; Becker, Christian; Doerner, Julia F.; Conrad, Heike; Oberland, Sonja; Wetzel, Christian H.; Neuhaus, Eva M.; Hatt, Hanns; Gisselmann, Günter

    2015-01-01

    The ability of animals to sense and differentiate among thousands of odorants relies on a large set of olfactory receptors (OR) and a multitude of accessory proteins within the olfactory epithelium (OE). ORs and related signaling mechanisms have been the subject of intensive studies over the past years, but our knowledge regarding olfactory processing remains limited. The recent development of next generation sequencing (NGS) techniques encouraged us to assess the transcriptome of the murine OE. We analyzed RNA from OEs of female and male adult mice and from fluorescence-activated cell sorting (FACS)-sorted olfactory receptor neurons (ORNs) obtained from transgenic OMP-GFP mice. The Illumina RNA-Seq protocol was utilized to generate up to 86 million reads per transcriptome. In OE samples, nearly all OR and trace amine-associated receptor (TAAR) genes involved in the perception of volatile amines were detectably expressed. Other genes known to participate in olfactory signaling pathways were among the 200 genes with the highest expression levels in the OE. To identify OE-specific genes, we compared olfactory neuron expression profiles with RNA-Seq transcriptome data from different murine tissues. By analyzing different transcript classes, we detected the expression of non-olfactory GPCRs in ORNs and established an expression ranking for GPCRs detected in the OE. We also identified other previously undescribed membrane proteins as potential new players in olfaction. The quantitative and comprehensive transcriptome data provide a virtually complete catalogue of genes expressed in the OE and present a useful tool to uncover candidate genes involved in, for example, olfactory signaling, OR trafficking and recycling, and proliferation. PMID:25590618

  15. Deep Sequencing of the Vaginal Microbiota of Women with HIV

    PubMed Central

    Hummelen, Ruben; Fernandes, Andrew D.; Macklaim, Jean M.; Dickson, Russell J.; Changalucha, John

    2010-01-01

    Background Women living with HIV and co-infected with bacterial vaginosis (BV) are at higher risk for transmitting HIV to a partner or newborn. It is poorly understood which bacterial communities constitute BV or the normal vaginal microbiota among this population and how the microbiota associated with BV responds to antibiotic treatment. Methods and Findings The vaginal microbiota of 132 HIV-positive Tanzanian women, including 39 who received metronidazole treatment for BV, were profiled using Illumina to sequence the V6 region of the 16S rRNA gene. Of note, Gardnerella vaginalis and Lactobacillus iners were detected in every sample, constituting core members of the vaginal microbiota. Eight major clusters were detected with relatively uniform microbiota compositions. Two clusters dominated by L. iners or L. crispatus were strongly associated with a normal microbiota. The L. crispatus-dominated microbiota were associated with low pH, but when L. crispatus was not present, a large fraction of L. iners was required to predict a low pH. Four clusters were strongly associated with BV, and were dominated by Prevotella bivia, Lachnospiraceae, or a mixture of different species. Metronidazole treatment reduced the microbial diversity and perturbed the BV-associated microbiota, but rarely resulted in the establishment of a lactobacilli-dominated microbiota. Conclusions Illumina-based microbial profiling enabled high-throughput analyses of microbial samples at a high phylogenetic resolution. The vaginal microbiota among women living with HIV in sub-Saharan Africa comprises several profiles associated with a normal microbiota or BV. Recurrent BV frequently presented a different BV-associated profile from the one seen before antibiotic treatment. PMID:20711427
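
    The abstract reports that treatment "reduced the microbial diversity" but does not say which index was used. The Shannon index is assumed here purely for illustration: it is highest when reads are spread evenly across taxa and falls when one taxon dominates. The taxon counts are hypothetical.

```python
from math import log


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


even = [25, 25, 25, 25]     # hypothetical read counts per taxon
dominated = [97, 1, 1, 1]   # one taxon dominating the community
print(round(shannon_diversity(even), 3), round(shannon_diversity(dominated), 3))
```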

  16. Novel lineages of Southern Ocean deep-sea foraminifera revealed by environmental DNA sequencing

    NASA Astrophysics Data System (ADS)

    Pawlowski, Jan; Fontaine, Delia; da Silva, Ana Aranda; Guiard, Jackie

    2011-10-01

    Diversity of deep-sea foraminifera is commonly studied based on analysis of agglutinated and calcareous tests preserved in the dried sediment samples. Soft-walled and agglutinated monothalamous (single-chambered) foraminifera are usually ignored because they are poorly preserved and difficult to identify. Moreover, the assemblage examined is usually limited to the sediment size fraction larger than 63 or 125 μm. To overcome these problems, we analysed the foraminiferal assemblage based on ribosomal DNA sequences amplified specifically from total DNA extracted from unsieved and fine-fraction (<32 μm) sediment samples from three sites in the Southern Ocean. We obtained 392 sequences, representing 123 phylotypes of foraminifera. Over 90% of phylotypes (112) could not be assigned to any previously sequenced species or genera. Among these new phylotypes, 20 belong to the clade of multi-chambered calcareous Rotaliida and agglutinated Textulariida, while 94 branch among the radiation of monothalamous species. Many new phylotypes clustered together with other environmental foraminiferal sequences and sequences of unknown origin. Eight new lineages of environmental foraminiferal sequences (ENFOR 1-8) were distinguished. The morphology of species included in these novel lineages is unknown, but we can speculate that they are tiny, amoeboid protists present in the deep-sea sediments. Their diversity may be as high as that of better known large-sized foraminifera. Documenting this hidden component of deep-sea foraminiferal assemblages is a major challenge for the future.

  17. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking given the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
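
    The sketch below illustrates k-mer spectrum methods, one family of error-detection approaches used by read error correctors. It counts k-mers across all reads and flags read positions that are covered only by rare ("weak") k-mers. The values k = 3 and min_count = 2 are arbitrary choices for a toy example. Real tools choose such parameters from the data and usually go on to correct the flagged bases, which is omitted here.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspect_positions(read, counts, k, min_count=2):
    """Positions of `read` not covered by any 'solid' k-mer (one seen at least
    min_count times in the dataset). Under k-mer spectrum models, bases covered
    only by rare k-mers are likely sequencing errors."""
    solid = set()
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= min_count:
            solid.update(range(i, i + k))
    return [pos for pos in range(len(read)) if pos not in solid]


# Five copies of a true fragment plus one read with an error at position 4 (T -> A).
reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]
counts = kmer_counts(reads, k=3)
print(suspect_positions("ACGTAGCA", counts, k=3))
```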

  18. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  19. Draft Genome Sequence of the Deep-Sea Bacterium Shewanella benthica Strain KT99.

    PubMed

    Lauro, F M; Chastain, R A; Ferriera, S; Johnson, J; Yayanos, A A; Bartlett, D H

    2013-01-01

    We report the draft genome sequence of the obligately piezophilic Shewanella benthica strain KT99 isolated from the abyssal South Pacific Ocean. Strain KT99 is the first piezophilic isolate from the Tonga-Kermadec trench, and its genome provides many clues on high-pressure adaptation and the evolution of deep-sea piezophilic bacteria. PMID:23723392

  20. Draft Genome Sequence of the Deep-Subsurface Actinobacterium Tessaracoccus lapidicaptus IPBSL-7T

    PubMed Central

    Pieper, Dietmar H.; Arce-Rodríguez, Alejandro

    2016-01-01

    The type strain of Tessaracoccus lapidicaptus was isolated from the deep subsurface of the Iberian Pyrite Belt (southwest Spain). Here, we report its draft genome, consisting of 27 contigs with a ~3.1-Mb genome size. The annotation revealed 2,905 coding DNA sequences, 45 tRNA genes, and three rRNA genes. PMID:27688325

  1. Draft Genome Sequence of the Deep-Subsurface Actinobacterium Tessaracoccus lapidicaptus IPBSL-7T.

    PubMed

    Puente-Sánchez, Fernando; Pieper, Dietmar H; Arce-Rodríguez, Alejandro

    2016-01-01

    The type strain of Tessaracoccus lapidicaptus was isolated from the deep subsurface of the Iberian Pyrite Belt (southwest Spain). Here, we report its draft genome, consisting of 27 contigs with a ~3.1-Mb genome size. The annotation revealed 2,905 coding DNA sequences, 45 tRNA genes, and three rRNA genes. PMID:27688325

  2. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  3. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads.

    PubMed

    Liu, Chi-Man; Wong, Thomas; Wu, Edward; Luo, Ruibang; Yiu, Siu-Ming; Li, Yingrui; Wang, Bingqiang; Yu, Chang; Chu, Xiaowen; Zhao, Kaiyong; Li, Ruiqiang; Lam, Tak-Wah

    2012-03-15

    SOAP3 is the first short-read alignment tool that leverages the multiprocessors in a graphics processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPU. When tested with millions of Illumina HiSeq 2000 100-bp reads, SOAP3 takes < 30 s to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to four mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers.
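
    SOAP3's GPU implementation is not shown here. As a CPU-only sketch of the idea behind BWT-based aligners, the code builds a small FM-style index (suffix array, BWT, C table and occurrence counts) and runs exact-match backward search. The construction sorts suffixes naively, so it only suits short toy strings. It also finds exact matches only, whereas SOAP3 handles up to four mismatches.

```python
def bwt_index(text):
    """Build a suffix array, C table and occurrence table for text + '$'."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    c, total = {}, 0
    for ch in alphabet:  # c[ch] = number of characters smaller than ch
        c[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):  # occ[ch][i] = count of ch in bwt[:i]
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c, occ


def backward_search(pattern, sa, c, occ):
    """Sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c:
            return []
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


idx = bwt_index("GATTACAGATTA")
print(backward_search("GATTA", *idx))
```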

  4. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
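
    The abstract does not define how allelic heterozygosity was measured. One simple definition works from short-read pileup counts at each position of a reference genome: expected heterozygosity, 1 − Σ p_i², averaged over sites. This sketch assumes that definition for illustration only, and the pileup is hypothetical.

```python
def site_heterozygosity(base_counts):
    """Expected heterozygosity 1 - sum(p_i^2) from read counts of each base at one site."""
    total = sum(base_counts.values())
    return 1.0 - sum((n / total) ** 2 for n in base_counts.values())


def mean_heterozygosity(pileup):
    """Average site heterozygosity across a list of {base: count} dictionaries."""
    return sum(site_heterozygosity(site) for site in pileup) / len(pileup)


# Hypothetical pileup: one biallelic site and one monomorphic site.
pileup = [{"A": 50, "G": 50}, {"C": 100}]
print(mean_heterozygosity(pileup))
```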

  5. Genome-Wide Probing of RNA Structures In Vitro Using Nucleases and Deep Sequencing.

    PubMed

    Wan, Yue; Qu, Kun; Ouyang, Zhengqing; Chang, Howard Y

    2016-01-01

    RNA structure probing is an important technique for studying the secondary and tertiary conformations of an RNA. While it was traditionally performed on one RNA at a time, recent advances in deep sequencing have enabled the secondary structure mapping of thousands of RNAs simultaneously. Here, we describe the method Parallel Analysis for RNA Structures (PARS), which couples double- and single-strand-specific nuclease probing to high-throughput sequencing. After the cleavage sites are cloned into a cDNA library, deeply sequenced, and the reads mapped to the transcriptome, the positions of paired and unpaired bases along cellular RNAs can be identified. PARS can be performed under diverse solution conditions and on RNAs from different organisms to provide genome-wide RNA structural information. This information can also be used to constrain computational predictions, yielding better RNA structure models under different conditions. PMID:26483021
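
    In PARS-type data, each nucleotide's pairing state can be summarized by comparing how often it is cleaved by the double-strand-specific nuclease and by the single-strand-specific one. This minimal sketch assumes a log2 ratio with library-size scaling and a pseudocount of 1. These choices are made here for illustration, since the abstract does not give the scoring. The function name and the counts are hypothetical.

```python
from math import log2


def structure_scores(ds_counts, ss_counts, pseudocount=1.0):
    """Per-position log2 ratio of cleavage counts from a double-strand-specific
    nuclease library (ds) to a single-strand-specific one (ss), after scaling ss
    to the ds library size. Positive values suggest paired bases, negative unpaired."""
    scale = sum(ds_counts) / sum(ss_counts)
    return [
        log2((d + pseudocount) / (s * scale + pseudocount))
        for d, s in zip(ds_counts, ss_counts)
    ]


# Hypothetical cleavage-site counts at three positions of one transcript.
scores = structure_scores([8, 0, 2], [0, 8, 2])
print([round(x, 2) for x in scores])
```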

  6. Deep sequencing reveals global patterns of mRNA recruitment during translation initiation

    PubMed Central

    Gao, Rong; Yu, Kai; Nie, Jukui; Lian, Tengfei; Jin, Jianshi; Liljas, Anders; Su, Xiao-Dong

    2016-01-01

    In this work, we developed a method to systematically study the sequence preference of mRNAs during translation initiation. Traditionally, the dynamic process of translation initiation has been studied at the single molecule level with limited sequencing possibility. Using deep sequencing techniques, we identified the sequence preference at different stages of the initiation complexes. Our results provide a comprehensive and dynamic view of the initiation elements in the translation initiation region (TIR), including the S1 binding sequence, the Shine-Dalgarno (SD)/anti-SD interaction and the second codon, at the equilibrium of different initiation complexes. Moreover, our experiments reveal the conformational changes and regional dynamics throughout the dynamic process of mRNA recruitment. PMID:27460773

  7. Enhanced arbovirus surveillance with deep sequencing: identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes

    PubMed Central

    Coffey, Lark L.; Page, Brady L.; Greninger, Alexander L.; Herring, Belinda L.; Russell, Richard C.; Doggett, Stephen L.; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L.

    2013-01-01

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. PMID:24314645
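
    Statements such as "diverged by ≥50%" rest on pairwise identity between aligned sequences. The abstract does not say how identity was computed (alignment method, gap handling). This sketch therefore assumes a precomputed alignment and simply ignores columns containing a gap. The protein fragments are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


identity = percent_identity("MKT-LLV", "MRTALIV")  # hypothetical aligned fragments
print(f"identity {identity:.1f}%, divergence {100 - identity:.1f}%")
```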

  8. Deep sequencing reveals global patterns of mRNA recruitment during translation initiation.

    PubMed

    Gao, Rong; Yu, Kai; Nie, Jukui; Lian, Tengfei; Jin, Jianshi; Liljas, Anders; Su, Xiao-Dong

    2016-07-27

    In this work, we developed a method to systematically study the sequence preference of mRNAs during translation initiation. Traditionally, the dynamic process of translation initiation has been studied at the single molecule level with limited sequencing possibility. Using deep sequencing techniques, we identified the sequence preference at different stages of the initiation complexes. Our results provide a comprehensive and dynamic view of the initiation elements in the translation initiation region (TIR), including the S1 binding sequence, the Shine-Dalgarno (SD)/anti-SD interaction and the second codon, at the equilibrium of different initiation complexes. Moreover, our experiments reveal the conformational changes and regional dynamics throughout the dynamic process of mRNA recruitment.

  9. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-01

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. PMID:24314645

  11. Bias from removing read duplication in ultra-deep sequencing experiments

    PubMed Central

    Zhou, Wanding; Chen, Tenghui; Zhao, Hao; Eterovic, Agda Karina; Meric-Bernstam, Funda; Mills, Gordon B.; Chen, Ken

    2014-01-01

    Motivation: Identifying subclonal mutations and their implications requires accurate estimation of mutant allele fractions from possibly duplicated sequencing reads. Removing duplicate reads assumes that polymerase chain reaction amplification during library construction is the primary source of duplicates. The alternative, sampling coincidence from DNA fragmentation, has not been systematically investigated. Results: With sufficiently high sequencing depth, sampling-induced read duplication is non-negligible, and removing duplicate reads can overcorrect read counts, causing systematic biases in variant allele fraction and copy number variation estimates. Overcorrection is minimized when duplicate reads are identified taking their mate reads into account, when inserts vary in length, and when samples are sequenced in separate batches. We investigate sampling-induced read duplication in deep sequencing data with 500× to 2000× duplicates-removed sequence coverage. We provide a quantitative solution to overcorrection and guidance for effective designs of deep sequencing platforms that facilitate accurate estimation of variant allele fraction and copy number variation. Availability and implementation: A Python implementation is freely available at https://bitbucket.org/wanding/duprecover/overview. Contact: wzhou1@mdanderson.org, kchen3@mdanderson.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24389657
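    The central effect above, duplicates that arise from fragmentation alone with no PCR involved, can be sketched with the classic occupancy formula. This is a minimal illustration, not the authors' duprecover method: it assumes each read's deduplication "key" (for example start and strand, or start plus mate position when mates are taken into account) is drawn uniformly and independently from a fixed number of possible keys, which real fragmentation only approximates. All function names are hypothetical.

```python
import random


def expected_distinct_keys(n_reads: int, n_keys: int) -> float:
    # Occupancy formula: expected number of distinct keys hit when n_reads
    # are drawn uniformly and independently from n_keys possible keys.
    return n_keys * (1.0 - (1.0 - 1.0 / n_keys) ** n_reads)


def expected_sampling_duplicates(n_reads: int, n_keys: int) -> float:
    # Reads that a key-based deduplication step would discard even if
    # no PCR amplification had taken place at all.
    return n_reads - expected_distinct_keys(n_reads, n_keys)


def simulated_sampling_duplicates(n_reads: int, n_keys: int, seed: int = 0) -> int:
    # Monte Carlo check of the closed-form expectation above.
    rng = random.Random(seed)
    keys = [rng.randrange(n_keys) for _ in range(n_reads)]
    return n_reads - len(set(keys))


if __name__ == "__main__":
    # Single-end style key space versus a 50-fold larger key space
    # (e.g. when mate position or insert length is also compared).
    for keys in (1_000, 50_000):
        print(keys, round(expected_sampling_duplicates(2_000, keys), 1),
              simulated_sampling_duplicates(2_000, keys))
```

    With 2,000 reads over 1,000 distinguishable keys, more than half of the reads would be flagged as duplicates by chance alone; enlarging the key space, as accounting for mates or varied insert lengths does, shrinks that number sharply, in line with the design guidance in the abstract.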

  12. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences, clustered into 420 operational taxonomic units (OTUs) at 97% sequence similarity and representing 170 taxa, were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies of fungal diversity in deep-sea sediments using culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, had rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments of Okinawa Trough harbor a plethora of fungal communities different from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
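    Clustering sequences into OTUs at 97% similarity is often done with a greedy centroid approach: the most abundant unique sequences become centroids, and each remaining sequence joins the first centroid it matches at or above the threshold. The sketch below assumes that approach and uses Python's `difflib.SequenceMatcher` ratio as a crude stand-in for alignment-based percent identity; the abstract does not name the clustering tool or its parameters, and `greedy_otus` is a hypothetical name.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable


def similarity(a: str, b: str) -> float:
    # ratio() = 2 * matched characters / (len(a) + len(b)); a rough proxy
    # for percent identity, not a true pairwise alignment.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_otus(reads: Iterable[str], threshold: float = 0.97) -> Dict[str, int]:
    """Greedy centroid clustering; returns {centroid sequence: read count}."""
    counts = Counter(reads)  # dereplicate identical sequences first
    centroids: Dict[str, int] = {}
    # Most abundant unique sequences are visited first, so they become centroids.
    for seq, n in counts.most_common():
        for centroid in centroids:
            if similarity(seq, centroid) >= threshold:
                centroids[centroid] += n  # join the first matching OTU
                break
        else:
            centroids[seq] = n  # no centroid is similar enough: new OTU
    return centroids
```

    Visiting sequences by abundance means a rare sequencing error variant tends to be absorbed into its abundant parent rather than seeding its own OTU.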

  13. Ultra-deep sequencing of intra-host rabies virus populations during cross-species transmission.

    PubMed

    Borucki, Monica K; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A; Messenger, Sharon; Allen, Jonathan E

    2013-11-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448, which allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009, since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003–2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environmental conditions change. PMID:24278493
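    Separating the consensus base at a site from sub-consensus (rare) variants reduces, at its simplest, to comparing per-base read counts against a frequency threshold. The sketch below assumes per-base counts for one reference position are already available and uses a hypothetical 1% minimum frequency and 100-read minimum depth; it ignores sequencing-error models, strand bias, and PCR error, all of which a real rare-variant caller must handle.

```python
from typing import Dict, List, Optional, Tuple


def call_site(counts: Dict[str, int], min_freq: float = 0.01,
              min_depth: int = 100) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Return (consensus base, [(minor base, frequency), ...]) for one site.

    Sites below min_depth return (None, []). The thresholds are
    illustrative defaults, not values taken from the study.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return None, []
    consensus = max(counts, key=counts.get)  # most frequent base
    minor = [(base, n / depth) for base, n in counts.items()
             if base != consensus and n / depth >= min_freq]
    minor.sort(key=lambda item: item[1], reverse=True)  # most frequent first
    return consensus, minor
```

    At the reported average coverage of about 17,448, a variant at 1% frequency would be supported by roughly 174 reads, which is why such depth makes sub-consensus analysis feasible at all.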

  14. Ultra-Deep Sequencing of Intra-host Rabies Virus Populations during Cross-species Transmission

    PubMed Central

    Borucki, Monica K.; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A.; Messenger, Sharon; Allen, Jonathan E.

    2013-01-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448, which allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009, since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003–2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environmental conditions change. PMID:24278493

  16. Draft genome sequence of Pseudomonas oleovorans strain MGY01 isolated from deep sea water.

    PubMed

    Wang, Runping; Ren, Chong; Huang, Nan; Liu, Yang; Zeng, Runying

    2015-04-01

    Pseudomonas oleovorans MGY01, isolated from the deep-sea water of the South China Sea, can effectively degrade malachite green. The draft genome of P. oleovorans MGY01 was sequenced and analyzed to gain insights into its efficient metabolic pathway for degrading malachite green. The data obtained revealed 109 contigs (N50: 128,269 bp) with a whole-genome size of 5,201,892 bp. The draft genome sequence of strain MGY01 will be helpful in studying the genetic pathways involved in the degradation of malachite green. PMID:25528517
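    The N50 quoted above is the contig length L such that contigs of length L or longer together contain at least half of the total assembly length. A minimal sketch of that standard definition, with a hypothetical function name:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one contig with positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")
```

    For example, `n50([100, 200, 300, 400])` returns 300: the 400 bp and 300 bp contigs together hold 700 of the 1,000 assembled bases.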

  17. Chromatin immunoprecipitation and deep sequencing in Xenopus tropicalis and Xenopus laevis

    PubMed Central

    Wills, Andrea E.; Gupta, Rakhi; Chuong, Edward; Baker, Julie C.

    2014-01-01

    Chromatin immunoprecipitation and deep sequencing (ChIP-SEQ) represents a powerful tool for identifying the genomic targets of transcription factors, chromatin remodeling factors, and histone modifications. The frogs Xenopus laevis and Xenopus tropicalis have historically been outstanding model systems for embryology and cell biology, with emerging utility as highly accessible embryos for genome-wide studies. Here we focus on the particular strengths and limitations of Xenopus cell biology and genomics as they apply to ChIP-SEQ, and outline a methodology for ChIP-SEQ in both species, providing detailed strategies for sample preparation, antibody selection, quality control, sequencing library preparation, and basic analysis. PMID:24064036

  19. Seismic sequence stratigraphy of Tertiary sediments, offshore Sarawak deep-water area

    SciTech Connect

    Mohammad, A.M.

    1994-07-01

    Tectonic processes and sea level changes are the main factors that have strongly influenced clastic and carbonate sedimentation in the Sarawak deep-water area. A seismic sequence stratigraphy of Tertiary sediments was conducted in the area with the main objective of developing a workable genetic chronostratigraphic framework that defines the sequence and systems tract boundaries within which depositional systems and lithofacies can be identified, mapped and interpreted. This study has resulted in the identification of eight major depositional sequences that are bounded by regional unconformities and correlative conformities. These sequences can generally be grouped into four megasequences, based on the main tectonic events observed in the area. Three systems tracts of a type-1, third-order sequence boundary were recognized in most of the sequences: lowstand, transgressive, and highstand systems tracts. The lowstand systems tract includes basin-floor fans, slope fans, and lowstand prograding wedges. Paleoenvironmental distribution maps constructed for each of the sequences using seismic facies analysis and nearby well control suggest that the sequence intervals are predominantly transgressive units that have been intermittently interrupted by regressive pulses brought about by changes in eustatic sea level. The paleocoastline trend observed from Oligocene to Miocene times changes from a northwest-southeast orientation to a position roughly parallel to the present coastline. Seismic facies maps generated from late Oligocene to early Miocene indicate that the depositional environment was coastal to coastal plain in the western and middle parts of the study area, becoming more marine toward the east and northeast.

  20. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³, 27 months; MRD 10⁻³ to 10⁻⁵, 48 months; and MRD <10⁻⁵, 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
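    The three MRD strata above are threshold rules on the measured MRD level. The sketch below encodes them, assuming MRD is expressed as a fraction of cells (e.g. 1e-4 for one tumor cell per 10,000) and assigning a level of exactly 10⁻⁵ to the middle stratum, since the abstract does not state how boundaries were handled. The labels "high", "intermediate", and "low" are hypothetical; the median TTP values are the ones reported above.

```python
# Median time to progression (months) per MRD stratum, as reported in the abstract.
MEDIAN_TTP_MONTHS = {"high": 27, "intermediate": 48, "low": 80}


def mrd_stratum(level: float) -> str:
    """Map an MRD level (fraction of cells) to a hypothetical stratum label.

    "high": MRD >= 1e-3; "intermediate": 1e-5 <= MRD < 1e-3 (boundary
    handling at 1e-5 is an assumption); "low": MRD < 1e-5, including 0.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("MRD level must be a fraction between 0 and 1")
    if level >= 1e-3:
        return "high"
    if level >= 1e-5:
        return "intermediate"
    return "low"
```

    This only reproduces the grouping used for the reported medians; it says nothing about how MRD itself is quantified by the sequencing assay.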

  1. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
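    Mapping short RNA reads to annotated mature microRNAs and filtering by read count can be illustrated, in its simplest form, as exact sequence matching. This is a toy sketch, not miRBase's mapping pipeline: it assumes reads are already adapter-trimmed, converts DNA reads (T) to RNA (U), and ignores isoforms with shifted or extended ends, which a real mapping must capture. The function name and the miRNA names and sequences in the usage are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_exact_matches(reads: Iterable[str], mature: Dict[str, str],
                        min_count: int = 1) -> Dict[str, int]:
    # Normalize reads to uppercase RNA so they compare with mature sequences.
    observed = Counter(r.upper().replace("T", "U") for r in reads)
    counts: Dict[str, int] = {}
    for name, seq in mature.items():
        n = observed[seq.upper()]  # Counter returns 0 for unseen sequences
        if n >= min_count:
            counts[name] = n
    return counts


if __name__ == "__main__":
    mature = {"mir-x": "UGAGGUAG", "mir-y": "AAAAUUUU"}  # hypothetical entries
    reads = ["TGAGGTAG", "UGAGGUAG", "TGAGGTAGA", "CCCC"]
    print(count_exact_matches(reads, mature))  # {'mir-x': 2}
```

    The 3'-extended read `TGAGGTAGA` is not counted toward `mir-x`, which shows why exact matching alone cannot capture the alternative isoforms the abstract mentions.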

  2. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma.

    PubMed

    Martinez-Lopez, Joaquin; Lahuerta, Juan J; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-05-15

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³, 27 months; MRD 10⁻³ to 10⁻⁵, 48 months; and MRD <10⁻⁵, 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009).

  3. The metagenome of shallow estuary sediment: A reflection of the deep biosphere

    NASA Astrophysics Data System (ADS)

    Biddle, J.; Crowgey, E.; Christman, G.; Russell, J.; Polson, S.

    2012-12-01

    Shallow sediments have proven to be valuable proxies for the deep biosphere as they contain many of the same microbial groups in a much more readily accessible habitat. One area under recent study is the White Oak River estuary in North Carolina, where a sulfate:methane transition zone is present year-round and relatives of deep subsurface Archaea such as ANME and MCG have been found. A previously studied sample was prepared for metagenomic sequencing through DNA extraction and whole genome amplification. An amplicon library was prepared from this using universal primers, showing that the community was roughly 28% MCG archaea, 27% Chloroflexi bacteria, 17% Proteobacteria, 3% ANME archaea and numerous rare taxa. The metagenome was sequenced via Illumina sequencing, yielding 152 bp reads. Assembly of these short reads was initially performed via the JGI pipeline, and contigs over 800 bp were assigned taxonomically with MEGAN. In this analysis, the majority of reads had no hits. The next largest category was unassigned, followed by a minority of hits to Thaumarchaeota, Dehalococcoides and Deltaproteobacteria. To reduce potential errors caused by short reads in assembly, we developed a pipeline utilizing FR-HIT to bin taxonomically relevant reads prior to assembly. Using this approach, new contigs were discovered from rare groups such as the ANME that were not seen in the general assembly. Overall, the data suggest that shallow populations are relatively similar to deep ones on a metagenomic level and that bulk assembly of short reads should be critically evaluated on a study-by-study basis.

  4. A transcriptional sketch of a primary human breast cancer by 454 deep sequencing

    PubMed Central

    Guffanti, Alessandro; Iacono, Michele; Pelucchi, Paride; Kim, Namshin; Soldà, Giulia; Croft, Larry J; Taft, Ryan J; Rizzi, Ermanno; Askarian-Amiri, Marjan; Bonnal, Raoul J; Callari, Maurizio; Mignone, Flavio; Pesole, Graziano; Bertalot, Giovanni; Bernardi, Luigi Rossi; Albertini, Alberto; Lee, Christopher; Mattick, John S; Zucchi, Ileana; De Bellis, Gianluca

    2009-01-01

    Background The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas. Conclusion Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown

  5. Deep Sequencing of Norovirus Genomes Defines Evolutionary Patterns in an Urban Tropical Setting

    PubMed Central

    Cotten, Matthew; Petrova, Velislava; Phan, My V. T.; Rabaa, Maia A.; Watson, Simon J.; Ong, Swee Hoe; Baker, Stephen

    2014-01-01

    ABSTRACT Norovirus is a highly transmissible infectious agent that causes epidemic gastroenteritis in susceptible children and adults. Norovirus infections can be severe and can be initiated from an exceptionally small number of viral particles. Detailed genome sequence data are useful for tracking norovirus transmission and evolution. To address this need, we have developed a whole-genome deep-sequencing method that generates entire genome sequences from small amounts of clinical specimens. This novel approach employs an algorithm for reverse transcription and PCR amplification primer design using all of the publicly available norovirus sequence data. Deep sequencing and de novo assembly were used to generate norovirus genomes from a large set of diarrheal patients attending three hospitals in Ho Chi Minh City, Vietnam, over a 2.5-year period. Positive-selection analysis and direct examination of protein changes in the virus over time identified codons in the regions encoding proteins VP1, p48 (NS1-2), and p22 (NS4) under positive selection, expanding the known targets of norovirus evolutionary pressure. IMPORTANCE The high transmissibility and rapid evolutionary rate of norovirus, combined with short-lived host immune responses, are thought to be the reasons why the virus causes the majority of pediatric viral diarrhea cases. The evolutionary patterns of this RNA virus have been described in detail for only a portion of the virus genome and never for a virus from an urban tropical setting. We provide a detailed sequence description of the noroviruses circulating in three Ho Chi Minh City hospitals over a 2.5-year period. This study identified patterns of virus change in known sites of host immune response and identified three additional regions of the virus genome under selection that were not previously recognized. In addition, the method described here provides a robust full-genome sequencing platform for community-based virus surveillance. PMID

  6. Analyzing the microRNA Transcriptome in Plants Using Deep Sequencing Data

    PubMed Central

    Yang, Xiaozeng; Li, Lei

    2012-01-01

    MicroRNAs (miRNAs) are 20- to 24-nucleotide endogenous small RNA molecules emerging as an important class of sequence-specific, trans-acting regulators for modulating gene expression at the post-transcriptional level. There has been a surge of interest in the past decade in identifying miRNAs and profiling their expression pattern using various experimental approaches. In particular, ultra-deep sampling of specifically prepared low-molecular-weight RNA libraries based on next-generation sequencing technologies has been used successfully in diverse species. The challenge now is to effectively deconvolute the complex sequencing data to provide comprehensive and reliable information on the miRNAs, miRNA precursors, and expression profile of miRNA genes. Here we review the recently developed computational tools and their applications in profiling the miRNA transcriptomes, with an emphasis on the model plant Arabidopsis thaliana. Also highlighted are the progress in, and insights into, miRNA biology derived from analyzing available deep sequencing data. PMID:24832228

  7. Nautilus: a bioinformatics package for the analysis of HIV type 1 targeted deep sequencing data.

    PubMed

    Kijak, Gustavo H; Pham, Phuc; Sanders-Buell, Eric; Harbolick, Elizabeth A; Eller, Leigh Anne; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Tovanabutra, Sodsai

    2013-10-01

    The advent of next generation sequencing technologies is providing new insight into HIV-1 diversity and evolution, which has created the need for bioinformatics tools that could be applied to the characterization of viral quasispecies. Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 targeted deep sequencing data. The DeepHaplo module determines the nucleotide base frequency and read depth at each position and computes the haplotype frequencies based on the linkage among polymorphisms in the same next generation sequence read. The Motifs module computes the frequency of the variants in the setting of their sequence context and mapping orientation, which allows for the validation of polymorphisms and haplotypes when strand bias is suspected. Both modules are accessed through a user-friendly GUI, which runs on Mac OS X (version 10.7.4 or later), and are based on Python, JAVA, and R scripts. Nautilus is available from www.hivresearch.org/research.php?ServiceID=5&SubServiceID=6 . PMID:23809062
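    The linkage idea described for the DeepHaplo module, counting which combination of alleles co-occurs on the same read, can be sketched as follows. This is not Nautilus code: it assumes reads are already aligned without insertions or deletions and are given as hypothetical `(start, sequence)` pairs in 0-based reference coordinates, and it counts only reads that span every polymorphic position of interest.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def haplotype_frequencies(reads: Iterable[Tuple[int, str]],
                          positions: Sequence[int]) -> Dict[str, float]:
    """Frequencies of allele combinations at `positions`, using only reads
    that span all of them (gapless alignment, 0-based coordinates assumed)."""
    counts: Counter = Counter()
    for start, seq in reads:
        end = start + len(seq)
        if all(start <= p < end for p in positions):
            # Alleles observed together on one read are physically linked.
            counts["".join(seq[p - start] for p in positions)] += 1
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()} if total else {}
```

    For example, with reads `(0, "ACGTACGT")`, `(0, "ACTTACGA")`, `(2, "GTACGT")` and `(5, "CGT")` and positions `[2, 7]`, the last read is skipped because it does not cover position 2, and the result is `GT` at 2/3 and `TA` at 1/3. Requiring full span is what keeps the counts linked, at the cost of discarding reads as polymorphic sites grow farther apart.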

  8. miRBase: annotating high confidence microRNAs using deep sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2014-01-01

    We describe an update of the miRBase database (http://www.mirbase.org/), the primary microRNA sequence repository. The latest miRBase release (v20, June 2013) contains 24 521 microRNA loci from 206 species, processed to produce 30 424 mature microRNA products. The rate of deposition of novel microRNAs and the number of researchers involved in their discovery continue to increase, driven largely by small RNA deep sequencing experiments. In the face of these increases, and a range of microRNA annotation methods and criteria, maintaining the quality of the microRNA sequence data set is a significant challenge. Here, we describe recent developments of the miRBase database to address this issue. In particular, we describe the collation and use of deep sequencing data sets to assign levels of confidence to miRBase entries. We now provide a high confidence subset of miRBase entries, based on the pattern of mapped reads. The high confidence microRNA data set is available alongside the complete microRNA collection at http://www.mirbase.org/. We also describe embedding microRNA-specific Wikipedia pages on the miRBase website to encourage the microRNA community to contribute and share textual and functional information. PMID:24275495

  9. De Novo assembly of the complete genome of an enhanced electricity-producing variant of Geobacter sulfurreducens using only short reads.

    PubMed

    Nagarajan, Harish; Butler, Jessica E; Klimes, Anna; Qiu, Yu; Zengler, Karsten; Ward, Joy; Young, Nelson D; Methé, Barbara A; Palsson, Bernhard Ø; Lovley, Derek R; Barrett, Christian L

    2010-06-08

    State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short-read assembly programs in such a way as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta-strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.

  10. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
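The following is a simplified sketch of detecting stretches of homozygosity shared by all affected individuals. It assumes genotypes are coded as 0/1/2 alternate-allele counts at ordered markers. A marker counts as "shared" only when every affected individual carries the same homozygous genotype. HomozygosityMapper's actual scoring is not described in this abstract, so the rule and the `min_len` parameter are hypothetical.

```python
def shared_homozygous_runs(genotypes, min_len):
    """Return (first, last) marker indices of runs where every affected
    individual carries the same homozygous genotype (coded 0 or 2)."""
    n_markers = len(genotypes[0])
    runs, start = [], None
    for i in range(n_markers):
        column = [g[i] for g in genotypes]
        shared = column[0] in (0, 2) and all(x == column[0] for x in column)
        if shared and start is None:
            start = i  # a shared run opens at this marker
        elif not shared and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_len:
        runs.append((start, n_markers - 1))
    return runs


affected = [
    [1, 0, 0, 2, 2, 1, 0, 0],
    [0, 0, 0, 2, 2, 2, 0, 0],
]
print(shared_homozygous_runs(affected, min_len=3))  # [(1, 4)]
print(shared_homozygous_runs(affected, min_len=2))  # [(1, 4), (6, 7)]
```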

  11. Complete Genome Sequence of a Reference Stock of Simian Immunodeficiency Virus RNA (SIVmac251/32H/L28) Determined by Deep Sequencing.

    PubMed

    Jenkins, Adrian; Ham, Claire; Almond, Neil; Berry, Neil

    2016-01-01

    A reference preparation for simian immunodeficiency virus (SIV) RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats, including minority species, were determined. This information will inform SIV research investigations and aid evaluation and development of amplification assays for SIV RNA quantification. PMID:27231355

  12. Complete Genome Sequence of a Reference Stock of Simian Immunodeficiency Virus RNA (SIVmac251/32H/L28) Determined by Deep Sequencing

    PubMed Central

    Jenkins, Adrian; Ham, Claire; Almond, Neil

    2016-01-01

    A reference preparation for simian immunodeficiency virus (SIV) RNA nucleic acid assays was characterized by complete genome deep sequencing. The entire coding sequence and flanking long terminal repeats, including minority species, were determined. This information will inform SIV research investigations and aid evaluation and development of amplification assays for SIV RNA quantification. PMID:27231355

  13. Deep Sequencing of Small RNAs in Tomato for Virus and Viroid Identification and Strain Differentiation

    PubMed Central

    Li, Rugang; Gao, Shan; Hernandez, Alvaro G.; Wechter, W. Patrick; Fei, Zhangjun; Ling, Kai-Shu

    2012-01-01

    Small RNAs (sRNA), including microRNAs (miRNA) and small interfering RNAs (siRNA), are produced abundantly in plants and animals and function in regulating gene expression or in defense against virus or viroid infection. Analysis of siRNA profiles upon virus infection in plant may allow for virus identification, strain differentiation, and de novo assembly of virus genomes. In the present study, four suspected virus-infected tomato samples collected in the U.S. and Mexico were used for sRNA library construction and deep sequencing. Each library generated between 5–7 million sRNA reads, of which more than 90% were from the tomato genome. Upon in-silico subtraction of the tomato sRNAs, the remaining highly enriched, virus-like siRNA pools were assembled with or without reference virus or viroid genomes. A complete genome was assembled for Potato spindle tuber viroid (PSTVd) using siRNA alone. In addition, a near complete virus genome (98%) also was assembled for Pepino mosaic virus (PepMV). A common mixed infection of two strains of PepMV (EU and US1), which shared 82% of genome nucleotide sequence identity, also could be differentially assembled into their respective genomes. Using de novo assembly, a novel potyvirus with less than 60% overall genome nucleotide sequence identity to other known viruses was discovered and its full genome sequence obtained. Taken together, these data suggest that the sRNA deep sequencing technology will likely become an efficient and powerful generic tool for virus identification in plants and animals. PMID:22623984
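The in-silico subtraction of host sRNAs described above can be illustrated with a naive exact-match filter. This sketch rests on several assumptions: it treats reads and the host genome as plain DNA strings, removes any read found verbatim on either strand, and ignores mismatches. Production pipelines typically use a short-read aligner instead, and the function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def subtract_host(reads, host_genome):
    """Keep only reads that do not occur verbatim on either host strand."""
    host_rc = reverse_complement(host_genome)
    return [r for r in reads if r not in host_genome and r not in host_rc]


host = "AACCGGTAGC"
reads = ["ACCGG",   # matches the host forward strand
         "GCTAC",   # matches the host reverse strand
         "TTTTT"]   # not host-derived; kept for virus/viroid assembly
print(subtract_host(reads, host))  # ['TTTTT']
```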

  14. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]

    PubMed Central

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic
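Genic-SSR identification of the kind summarized above (di- to hexanucleotide motifs, homopolymers excluded) can be sketched with a backreference regular expression. The 18 bp minimum is borrowed from the repeat-length threshold the abstract quotes for primer testing. The function names are hypothetical, and compound repeats are not handled.

```python
import re


def is_primitive(motif):
    """False for motifs that are repeats of a shorter unit ('AA', 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq, min_total=18, motif_sizes=range(2, 7)):
    """Return sorted (start, motif, total_length) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # A k-base motif followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Non-primitive motifs (homopolymers, 'ATAT', 'CAGCAG') are
            # skipped so each repeat is reported once, at its shortest unit.
            if m.end() - m.start() >= min_total and is_primitive(motif):
                hits.append((m.start(), motif, m.end() - m.start()))
    return sorted(hits)


seq = "GGC" + "AT" * 10 + "GCC" + "CAG" * 6 + "T"
print(find_ssrs(seq))  # [(3, 'AT', 20), (26, 'CAG', 18)]
```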

  15. Detection and characterization of mycoviruses in arbuscular mycorrhizal fungi by deep-sequencing.

    PubMed

    Ezawa, Tatsuhiro; Ikeda, Yoji; Shimura, Hanako; Masuta, Chikara

    2015-01-01

    Fungal viruses (mycoviruses) often have a significant impact not only on phenotypic expression of the host fungus but also on higher order biological interactions, e.g., conferring plant stress tolerance via an endophytic host fungus. Arbuscular mycorrhizal (AM) fungi in the phylum Glomeromycota associate with most land plants and supply mineral nutrients to the host plants. So far, little information about mycoviruses has been obtained in these fungi due to their obligate biotrophic nature. Here we provide a technical breakthrough for virological study in AM fungi: a "two-step strategy" combined with deep sequencing. dsRNA is first extracted and sequenced using material obtained from highly productive open pot culture, and the presence of viruses is then verified using pure material produced in in vitro monoxenic culture. This approach enabled us to demonstrate, for the first time, the presence of several viruses in a glomeromycotan fungus.

  16. Metatranscriptomic analysis of small RNAs present in soybean deep sequencing libraries

    PubMed Central

    Molina, Lorrayne Gomes; da Fonseca, Guilherme Cordenonsi; de Morais, Guilherme Loss; de Oliveira, Luiz Felipe Valter; de Carvalho, Joseane Biso; Kulcheski, Franceli Rodrigues; Margis, Rogerio

    2012-01-01

    A large number of small RNAs unrelated to the soybean genome were identified after deep sequencing of soybean small RNA libraries. A metatranscriptomic analysis was carried out to identify the origin of these sequences. Comparative analyses of small interfering RNAs (siRNAs) present in samples collected in open areas corresponding to soybean field plantations and samples from soybean cultivated in greenhouses under a controlled environment were performed. Different pathogenic, symbiotic and free-living organisms were identified from samples of both growth systems. They included viruses, bacteria and different groups of fungi. This approach can be useful not only to identify potentially unknown pathogens and pests, but also to understand the relations that soybean plants establish with microorganisms that may affect, directly or indirectly, plant health and crop production. PMID:22802714

  17. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    SciTech Connect

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  18. Inside the intraterrestrials: The deep biosphere seen through massively parallel sequencing

    NASA Astrophysics Data System (ADS)

    Biddle, J.

    2009-12-01

    Deeply buried marine sediments may house a large amount of the Earth’s microbial population. Initial studies based on 16S rRNA clone libraries suggest that these sediments contain unique phylotypes of microorganisms, particularly from the archaeal domain. Since this environment is so difficult to study, microbiologists are challenged to find ways to examine these populations remotely. A major approach taken to study this environment uses massively parallel sequencing to examine the inner genetic workings of these microorganisms after the sediment has been drilled. Both metagenomics and tagged amplicon sequencing have been employed on deep sediments, and initial results show that different geographic regions can be differentiated through genomics and also minor populations may cause major geochemical changes.

  19. Rapid fine conformational epitope mapping using comprehensive mutagenesis and deep sequencing.

    PubMed

    Kowalsky, Caitlin A; Faber, Matthew S; Nath, Aritro; Dann, Hailey E; Kelly, Vince W; Liu, Li; Shanker, Purva; Wagner, Ellen K; Maynard, Jennifer A; Chan, Christina; Whitehead, Timothy A

    2015-10-30

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and demonstrate the method's effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891

  20. Draft Genome Sequence of Caloranaerobacter sp. TR13, an Anaerobic Thermophilic Bacterium Isolated from a Deep-Sea Hydrothermal Vent

    PubMed Central

    Xie, Yunbiao; Dong, Binbin; Liu, Qing; Chen, Xiaoyao

    2015-01-01

    Here, we report the draft 2,261,881-bp genome sequence of Caloranaerobacter sp. TR13, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will be helpful for understanding the genetic and metabolic features, as well as potential biotechnological application in the genus Caloranaerobacter. PMID:26679595

  1. Draft Genome Sequence of Caloranaerobacter sp. TR13, an Anaerobic Thermophilic Bacterium Isolated from a Deep-Sea Hydrothermal Vent.

    PubMed

    Zhou, Meixian; Xie, Yunbiao; Dong, Binbin; Liu, Qing; Chen, Xiaoyao

    2015-01-01

    Here, we report the draft 2,261,881-bp genome sequence of Caloranaerobacter sp. TR13, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will be helpful for understanding the genetic and metabolic features, as well as potential biotechnological application in the genus Caloranaerobacter. PMID:26679595

  2. Draft Genome Sequence of Psychrobacter piscatorii Strain LQ58, a Psychrotolerant Bacterium Isolated from a Deep-Sea Hydrothermal Vent.

    PubMed

    Zhou, Meixian; Dong, Binbin; Liu, Qing

    2016-01-01

    Here, we report the 3.1-Mb draft genome sequence of Psychrobacter piscatorii strain LQ58, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will provide further insight into the environmental adaptation of psychrotolerant bacteria and the development of novel cold-active enzymes for industrial application. PMID:26941137

  3. Draft Genome Sequence of Psychrobacter piscatorii Strain LQ58, a Psychrotolerant Bacterium Isolated from a Deep-Sea Hydrothermal Vent

    PubMed Central

    Dong, Binbin; Liu, Qing

    2016-01-01

    Here, we report the 3.1-Mb draft genome sequence of Psychrobacter piscatorii strain LQ58, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will provide further insight into the environmental adaptation of psychrotolerant bacteria and the development of novel cold-active enzymes for industrial application. PMID:26941137

  4. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  5. Analysis of microRNA transcriptome by deep sequencing of small RNA libraries of peripheral blood

    PubMed Central

    2010-01-01

    Background MicroRNAs are a class of small non-coding RNAs that regulate mRNA expression at the post-transcriptional level and thereby many fundamental biological processes. A number of methods, such as multiplex polymerase chain reaction and microarrays, have been developed for profiling levels of known miRNAs. These methods lack the ability to identify novel miRNAs and to accurately determine expression across a range of concentrations. Deep or massively parallel sequencing methods are providing suitable platforms for genome-wide transcriptome analysis and have the ability to identify novel transcripts. Results The results of analysis of small RNA sequences obtained by Solexa technology from normal peripheral blood mononuclear cells and from the tumor cell lines K562 and HL60 are presented. In general, K562 cells displayed an overall low level of miRNAs and also low levels of DICER. Some of the highly expressed miRNAs in the leukocytes include several members of the let-7 family, miR-21, 103, 185, 191 and 320a. Comparison of the miRNA profiles of normal versus K562 or HL60 cells revealed a specific set of differentially expressed molecules. Correlation of the miRNA profiles with mRNA expression profiles, obtained by microarray, revealed a set of target genes showing inverse correlation with miRNA levels. Relative expression levels of individual miRNAs belonging to a cluster were found to be highly variable. Our computational pipeline also predicted a number of novel miRNAs. Some of the predictions were validated by real-time RT-PCR and/or RNase protection assay. The organization of some of the novel miRNAs in the human genome suggests that these may also be part of existing clusters or form new clusters. Conclusions We conclude that about 904 miRNAs are expressed in human leukocytes. Of these, 370 are novel miRNAs. We have identified miRNAs that are differentially regulated in normal PBMC with respect to cancer cells, K562 and HL60. Our results suggest that post-transcriptional
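The miRNA/mRNA inverse-correlation step mentioned above can be illustrated with a Pearson correlation across matched samples. The abstract does not state which correlation measure or cutoff was used. The -0.7 threshold, the sample values, and the function names below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as none
    return cov / (sx * sy)


def inverse_targets(mirna_profile, mrna_profiles, threshold=-0.7):
    """Candidate targets whose mRNA level falls as the miRNA level rises."""
    return [
        gene for gene, profile in mrna_profiles.items()
        if pearson(mirna_profile, profile) <= threshold
    ]


mirna = [1.0, 2.0, 3.0, 4.0]  # hypothetical miRNA levels in four samples
mrna = {"g1": [8.0, 6.0, 4.0, 2.0], "g2": [1.0, 2.0, 3.0, 5.0]}
print(inverse_targets(mirna, mrna))  # ['g1']
```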

  6. Evaluation of ultra-deep targeted sequencing for personalized breast cancer care

    PubMed Central

    2013-01-01

    Introduction The increasing number of targeted therapies, together with a deeper understanding of cancer genetics and drug response, has prompted major healthcare centers to implement personalized treatment approaches relying on high-throughput tumor DNA sequencing. However, the optimal way to implement this transformative methodology is not yet clear. Current assays may miss important clinical information such as the mutation allelic fraction, the presence of sub-clones or chromosomal rearrangements, or the distinction between inherited variants and somatic mutations. Here, we present the evaluation of ultra-deep targeted sequencing (UDT-Seq) to generate and interpret the molecular profile of 38 breast cancer patients from two academic medical centers. Methods We sequenced 47 genes in matched germline and tumor DNA samples from 38 breast cancer patients. The selected genes, or the pathways they belong to, can be targeted by drugs or are important in familial cancer risk or drug metabolism. Results Relying on the added value of sequencing matched tumor and germline DNA and using a dedicated analysis, UDT-Seq is highly sensitive in identifying mutations in tumors with low malignant cell content. Applying UDT-Seq to matched tumor and germline specimens from the 38 patients resulted in a proposal for at least one targeted therapy for 22 patients, the identification of tumor sub-clones in 3 patients, the suggestion of potential adverse drug effects in 3 patients and a recommendation for genetic counseling for 2 patients. Conclusion Overall, our study highlights the additional benefits of a sequencing strategy that includes germline DNA and is optimized for heterogeneous tumor tissues. PMID:24326041
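The mutation allelic fraction mentioned above is the fraction of reads at a position that carry the variant allele. The sketch below shows how matched tumor and germline fractions might separate somatic from inherited variants. The thresholds and function names are hypothetical and are not UDT-Seq's published parameters. For a clonal heterozygous mutation in a diploid region without copy-number change, the expected tumor fraction is roughly half the tumor purity. This is why low malignant cell content pushes true mutations toward low allelic fractions.

```python
def allele_fraction(alt_reads, depth):
    """Fraction of reads at a position carrying the variant allele."""
    return alt_reads / depth if depth else 0.0


def expected_clonal_vaf(purity):
    """Heterozygous clonal mutation, diploid region, no copy-number change."""
    return purity / 2


def classify_variant(tumor_alt, tumor_depth, normal_alt, normal_depth,
                     min_vaf=0.02, germline_vaf=0.3):
    """Hypothetical thresholds, for illustration only."""
    tumor = allele_fraction(tumor_alt, tumor_depth)
    normal = allele_fraction(normal_alt, normal_depth)
    if normal >= germline_vaf:
        return "germline"   # variant also well supported in matched normal DNA
    if tumor >= min_vaf and normal < min_vaf:
        return "somatic"    # present in tumor, essentially absent in normal
    return "uncertain"


print(expected_clonal_vaf(0.2))               # 0.1 (20% malignant cells)
print(classify_variant(60, 1000, 0, 800))     # somatic
print(classify_variant(480, 1000, 390, 800))  # germline
```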

  7. Identification of torque teno virus in culture-negative endophthalmitis by representational deep-DNA sequencing

    PubMed Central

    Lee, Aaron Y.; Akileswaran, Lakshmi; Tibbetts, Michael D.; Garg, Sunir J.; Van Gelder, Russell N.

    2014-01-01

    Purpose To test the hypothesis that uncultured organisms may be present in cases of culture-negative endophthalmitis, by use of deep DNA sequencing of vitreous biopsies. Design Single center consecutive prospective observational study. Participants and Controls Aqueous or vitreous biopsies from 21 consecutive patients presenting with presumed infectious endophthalmitis, and seven vitreous samples from patients undergoing surgery for non-infectious retinal disorders. Methods Traditional bacterial and fungal culture, 16S quantitative polymerase chain reaction (qPCR) and a representational deep-sequencing method (Biome Representational in Silico Karyotyping [BRiSK]) were applied in parallel to samples to identify DNA sequences corresponding to potential pathogens. Main Outcome Measures Presence of potential pathogen DNA in ocular samples. Results None of 7 control eyes undergoing routine vitreous surgery yielded positive results for bacteria or virus by culture or 16S PCR. Fourteen of the 21 samples (66.7%) from eyes harboring suspected infectious endophthalmitis were culture-positive, the most common being Staphylococcal and Streptococcal species. There was good agreement among culture, 16S bacterial PCR, and BRiSK methodologies for culture-positive cases (Fleiss’ kappa of 0.621). 16S PCR did not yield a recognizable pathogen sequence in any culture-negative sample, while BRiSK suggested presence of Streptococcus in one culture-negative sample. Surprisingly, using BRiSK, 57.1% of culture-positive and 100% of culture-negative samples demonstrated presence of Torque Teno Virus (TTV) sequences, compared to none in the controls (Fisher exact, p = 0.0005). Presence of TTV viral DNA was confirmed in seven cases by qPCR. No other known viruses or potential pathogens were identified in these samples. Conclusion Culture, 16S qPCR, and BRiSK provide complementary information in presumed infectious endophthalmitis. The majority of culture-negative endophthalmitis samples did

  8. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  9. Deep sequencing of pigeonpea sterility mosaic virus discloses five RNA segments related to emaraviruses.

    PubMed

    Elbeaino, Toufic; Digiaro, Michele; Uppala, Mangala; Sudini, Harikishan

    2014-08-01

    The sequences of five viral RNA segments of pigeonpea sterility mosaic virus (PPSMV), the agent of sterility mosaic disease (SMD) of pigeonpea (Cajanus cajan, Fabaceae), were determined using deep sequencing. Each of the five RNAs encodes a single protein on the negative-sense strand with an open reading frame (ORF) of 6885, 1947, 927, 1086, and 1422 nts, respectively. In order, from RNA1 to RNA5, these ORFs encode the RNA-dependent RNA polymerase (p1, 267.9 kDa), a putative glycoprotein precursor (p2, 74.3 kDa), a putative nucleocapsid protein (p3, 34.6 kDa), and a putative movement protein (p4, 40.8 kDa), while p5 (55 kDa) has an unknown function. All RNA segments of PPSMV showed the highest identity with orthologs of fig mosaic virus (FMV) and Rose rosette virus (RRV). In phylogenetic trees constructed with the amino acid sequences of p1, p2 and p3, PPSMV clustered consistently with other emaraviruses, close to clades comprising members of other genera of the family Bunyaviridae. Based on the molecular characteristics unveiled in this study and the morphological and epidemiological features similar to other emaraviruses, PPSMV seems to be the seventh species to join the list of emaraviruses known to date, and accordingly, its classification in the genus Emaravirus now seems legitimate. PMID:24685674
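Because each PPSMV RNA encodes its protein on the negative-sense strand, the ORF is read from the reverse complement of the genomic segment. The following is a minimal sketch under several assumptions. The segment is given as an uppercase DNA-alphabet string. ORFs start at ATG and include their stop codon, so a reported ORF length L (all five lengths above are divisible by three) would correspond to L/3 - 1 amino acids. The toy segment and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Longest ATG...stop ORF (stop included) in the three forward frames."""
    best_start, best_end = 0, 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_end - best_start:
                    best_start, best_end = start, i + 3
                start = None
    return seq[best_start:best_end]


def protein_length(orf_nt):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_nt // 3 - 1


# Toy negative-sense segment: its ORF lies on the reverse complement.
segment = reverse_complement("GGATGAAACCCTGAC")
orf = longest_orf(reverse_complement(segment))
print(orf, protein_length(len(orf)))  # ATGAAACCCTGA 3
print(protein_length(6885))           # 2294 (RNA1, if the stop is counted)
```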

  10. Deep CCD photometry in globular clusters. I. The Main sequence of M4

    SciTech Connect

    Richer, H.B.; Fahlman, G.G.

    1984-02-01

    From deep UBV CCD images obtained with the CTIO 4 m telescope, we have constructed color-magnitude and color-color diagrams in a 4' x 3' field of the globular cluster M4. Inspection of the color-magnitude diagram indicates that the main sequence down to almost 3 mag below the turnoff has an intrinsic width no wider than ±0.02 magnitudes in (B-V), implying that the variation in helium abundance (ΔY) in these stars must be less than ±0.07 (ΔZ = 0) or that the fractional variation in metallicity (ΔZ/Z) is no larger than ±0.22 (ΔY = 0). To a similar limit on the main sequence, the binary frequency in the field studied must be very small and does not exceed 3% of all main-sequence stars (and may be zero). The luminosity function of M4 is rather flat and definitely turns over by V = 20 (M_V = 7.5).

  11. Deep transcriptome profiling of clinical Klebsiella pneumoniae isolates reveals strain and sequence type-specific adaptation.

    PubMed

    Bruchmann, Sebastian; Muthukumarasamy, Uthayakumar; Pohl, Sarah; Preusse, Matthias; Bielecka, Agata; Nicolai, Tanja; Hamann, Isabell; Hillert, Roger; Kola, Axel; Gastmeier, Petra; Eckweiler, Denitsa; Häussler, Susanne

    2015-11-01

    Health-care-associated infections by multi-drug-resistant bacteria constitute one of the greatest challenges to modern medicine. Bacterial pathogens employ various mechanisms to withstand the activity of a wide range of antimicrobial compounds, among which the acquisition of carbapenemases is one of the most concerning. In Klebsiella pneumoniae, the dissemination of the K. pneumoniae carbapenemase is tightly connected to the global spread of certain clonal lineages. Although antibiotic resistance is a key driver for the global distribution of epidemic high-risk clones, there seem to be other adaptive traits that may explain their success. Here, we exploited the power of deep transcriptome profiling (RNA-seq) to shed light on the transcriptomic landscape of 37 clinical K. pneumoniae isolates of diverse phylogenetic origins. We identified a large set of 3346 genes that were expressed in all isolates. While the core-transcriptome profiles varied substantially between groups of different sequence types, they were more homogeneous among isolates of the same sequence type. We furthermore linked the detailed information on differentially expressed genes with the clinically relevant phenotypes of biofilm formation and bacterial virulence. This allowed for the identification of a diminished expression of biofilm-specific genes within the low-biofilm-producing ST258 isolates as a sequence type-specific trait. PMID:26261087
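A set of genes "expressed in all isolates" is an intersection across isolates. The sketch below assumes a hypothetical read-count table per isolate and a hypothetical minimum of 10 reads to call a gene expressed. The abstract does not give the study's actual expression criterion.

```python
def core_transcriptome(expression, min_count=10):
    """Genes with at least `min_count` reads in every isolate."""
    per_isolate = [
        {gene for gene, count in counts.items() if count >= min_count}
        for counts in expression.values()
    ]
    return set.intersection(*per_isolate) if per_isolate else set()


expression = {  # hypothetical read counts
    "isolate_1": {"geneA": 50, "geneB": 3, "geneC": 20},
    "isolate_2": {"geneA": 12, "geneB": 40, "geneC": 10},
    "isolate_3": {"geneA": 99, "geneC": 0},
}
print(core_transcriptome(expression))  # {'geneA'}
```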

  12. Transcriptome and small RNA deep sequencing reveals deregulation of miRNA biogenesis in human glioma.

    PubMed

    Moore, Lynette M; Kivinen, Virpi; Liu, Yuexin; Annala, Matti; Cogdell, David; Liu, Xiuping; Liu, Chang-Gong; Sawaya, Raymond; Yli-Harja, Olli; Shmulevich, Ilya; Fuller, Gregory N; Zhang, Wei; Nykter, Matti

    2013-02-01

    Altered expression of oncogenic and tumour-suppressing microRNAs (miRNAs) is widely associated with tumourigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumours. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and examined expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression.
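The mature-to-precursor ratio used above as a direct readout of miRNA maturation can be sketched as follows. This assumes read counts for the mature and precursor forms of each miRNA are already available. The pseudocount, which avoids division by zero, and the counts themselves are hypothetical choices. The log2 shift compares the ratio between two conditions, such as normal brain and glioma.

```python
import math


def maturation_ratios(mature, precursor, pseudocount=1.0):
    """Mature-to-precursor read-count ratio for each miRNA."""
    return {
        name: (mature[name] + pseudocount) / (precursor.get(name, 0) + pseudocount)
        for name in mature
    }


def log2_shift(ratio_before, ratio_after):
    """Positive values mean relatively more mature miRNA after the change."""
    return math.log2(ratio_after / ratio_before)


normal = maturation_ratios({"miR-21": 99}, {"miR-21": 9})  # hypothetical counts
tumor = maturation_ratios({"miR-21": 399}, {"miR-21": 9})
print(normal["miR-21"], tumor["miR-21"])              # 10.0 40.0
print(log2_shift(normal["miR-21"], tumor["miR-21"]))  # 2.0
```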

  13. Transcriptome and Small RNA Deep Sequencing Reveals Deregulation of miRNA Biogenesis in Human Glioma

    PubMed Central

    Moore, Lynette M.; Kivinen, Virpi; Liu, Yuexin; Annala, Matti; Cogdell, David; Liu, Xiuping; Liu, Chang-Gong; Sawaya, Raymond; Yli-Harja, Olli; Shmulevich, Ilya; Fuller, Gregory N.; Zhang, Wei; Nykter, Matti

    2013-01-01

    Altered expression of oncogenic and tumor-suppressing microRNAs (miRNAs) is widely associated with tumorigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumors. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and interrogated expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark of glioma genesis and progression. PMID:23007860

  14. Identification of Dirofilaria immitis miRNA using Illumina deep sequencing

    PubMed Central

    2013-01-01

    The heartworm Dirofilaria immitis is the causative agent of cardiopulmonary dirofilariosis in dogs and cats and also infects a wide range of wild mammals and humans. The complex life cycle of D. immitis, with several developmental stages in its invertebrate mosquito vectors and its vertebrate hosts, indicates the importance of miRNAs in growth and development and in regulating infection of mammalian hosts. This study identified the miRNA profiles of D. immitis, a parasite of zoonotic significance, by deep sequencing. A total of 1063 conserved miRNA candidates, including 68 anti-sense miRNA (miRNA*) sequences, were predicted by computational methods and could be grouped into 808 miRNA families. A significant bias towards family members, family abundance and sequence nucleotides was observed. Thirteen novel miRNA candidates were predicted by alignment with the Brugia malayi genome. Eleven of the 13 predicted miRNA candidates were verified using a PCR-based method. Target genes of the novel miRNA candidates were predicted using the heartworm transcriptome dataset. To our knowledge, this is the first report of miRNA profiles in D. immitis, which will contribute to a better understanding of the complex biology of this zoonotic filarial nematode and of the regulatory roles of the miRNAs involved. Our findings may also become a useful resource for small RNA studies in other filarial parasitic nematodes. PMID:23331513

  15. Heteroplasmic substitutions in the entire mitochondrial genomes of human colon cells detected by ultra-deep 454 sequencing.

    PubMed

    Skonieczna, Katarzyna; Malyarchuk, Boris; Jawień, Arkadiusz; Marszałek, Andrzej; Banaszkiewicz, Zbigniew; Jarmocik, Paweł; Borcz, Marcelina; Bała, Piotr; Grzybowski, Tomasz

    2015-03-01

    Mitochondrial DNA (mtDNA) heteroplasmy has been widely described from clinical, evolutionary and analytical points of view. Historically, the majority of studies have been based on Sanger sequencing. However, next-generation sequencing technologies are now being used for heteroplasmy analysis. Ultra-deep sequencing approaches provide increased sensitivity for detecting minority variants. However, a phylogenetic a posteriori analysis revealed that most of the next-generation sequencing data published to date suffer from shortcomings. Because implementation of new technologies in clinical, population, or forensic studies requires proper verification, in this paper we present a direct comparison of ultra-deep 454 and Sanger sequencing for the detection of heteroplasmy in complete mitochondrial genomes of normal colon cells. The spectrum of heteroplasmic mutations is discussed against the background of mitochondrial DNA variability in human populations.

  16. Polymorphism Identification and Improved Genome Annotation of Brassica rapa Through Deep RNA Sequencing

    PubMed Central

    Devisetty, Upendra Kumar; Covington, Michael F.; Tat, An V.; Lekkala, Saradadevi; Maloof, Julin N.

    2014-01-01

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of a high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate estimation of mRNA abundance and for detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes, R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety), using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, found structural modifications in 23,754 gene models, and identified changes in 3655 annotated proteins. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/. PMID:25122667

  17. Polymorphism identification and improved genome annotation of Brassica rapa through Deep RNA sequencing.

    PubMed

    Devisetty, Upendra Kumar; Covington, Michael F; Tat, An V; Lekkala, Saradadevi; Maloof, Julin N

    2014-08-12

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of a high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate estimation of mRNA abundance and for detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes, R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety), using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, found structural modifications in 23,754 gene models, and identified changes in 3655 annotated proteins. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/.

  18. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing

    PubMed Central

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O’Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M.; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J.; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A.; Turner, Daniel J.; Rubio, Valentin Ruano; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C.; Ferdig, Michael T.; Amambua-Ngwa, Alfred; Conway, David J.; Takala-Harrison, Shannon; Plowe, Christopher V.; Rayner, Julian C.; Rockett, Kirk A.; Clark, Taane G.; Newbold, Chris I.; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P.

    2013-01-01

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance [1,2]. Here we describe methods for large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic SNPs that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome. PMID:22722859
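
    The within-host diversity metric described above compares heterozygosity inside one infection with heterozygosity in the local parasite population. The following is a minimal Python sketch, assuming the simple F-statistic-style ratio 1 - Hw/Hs, where each biallelic SNP contributes an expected heterozygosity of 2p(1 - p). The function names are hypothetical, and the paper's actual estimation procedure (for example, how SNPs are binned or weighted) may differ.

```python
from typing import Sequence


def heterozygosity(p: float) -> float:
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def within_host_fixation(
    within_host_freqs: Sequence[float],
    population_freqs: Sequence[float],
) -> float:
    """Hypothetical index: 1 - (summed within-host H) / (summed population H).

    within_host_freqs: non-reference allele fraction in one sample's reads, per SNP.
    population_freqs: non-reference allele frequency in the local population, same SNPs.
    Values near 1 suggest an effectively clonal infection; lower values suggest
    more within-host diversity.
    """
    if len(within_host_freqs) != len(population_freqs):
        raise ValueError("frequency lists must cover the same SNPs")
    h_w = sum(heterozygosity(p) for p in within_host_freqs)
    h_s = sum(heterozygosity(p) for p in population_freqs)
    if h_s == 0:
        raise ValueError("no population diversity at these SNPs")
    return 1.0 - h_w / h_s


pop = [0.5, 0.3, 0.2]      # toy population allele frequencies (invented)
clonal = [0.0, 1.0, 0.0]   # every SNP fixed within the host
mixed = [0.5, 0.3, 0.2]    # within-host frequencies mirror the population
print(within_host_fixation(clonal, pop))  # 1.0
print(within_host_fixation(mixed, pop))   # 0.0
```

    The two toy samples bracket the scale: a single-clone infection has no within-host heterozygosity, while a sample as diverse as the whole local population scores zero.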

  19. Deep Sequencing Analysis of Nucleolar Small RNAs: RNA Isolation and Library Preparation.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    The nucleolus is a subcellular compartment with a key function in ribosome biogenesis. The nucleolus is rich in noncoding RNAs, mostly ribosomal RNAs and small nucleolar RNAs. Surprisingly, several miRNAs have also been detected in the nucleolus, raising the question of whether other small RNA species are present and functional there. We have developed a strategy for stepwise enrichment of nucleolar small RNAs from total nucleolar RNA extracts and subsequent construction of nucleolar small RNA libraries that are suitable for deep sequencing. Our method successfully isolates the small RNA population from total RNAs and monitors RNA quality at each step to ensure that the recovered small RNAs represent the actual small RNA population in the nucleolus and not degradation products of larger RNAs. We have further applied this approach to characterize the distribution of small RNAs in different cellular compartments. PMID:27576723

  20. Deep sequencing reveals low incidence of endogenous LINE-1 retrotransposition in human induced pluripotent stem cells.

    PubMed

    Arokium, Hubert; Kamata, Masakazu; Kim, Sanggu; Kim, Namshin; Liang, Min; Presson, Angela P; Chen, Irvin S

    2014-01-01

    Long interspersed element-1 (LINE-1 or L1) retrotransposition induces insertional mutations that can result in diseases. It was recently shown that the copy number of L1 and other retroelements is stable in induced pluripotent stem cells (iPSCs). However, another study, using an engineered reporter construct over-expressing L1, suggests that reprogramming activates L1 mobility in iPSCs. Given the potential of human iPSCs in therapeutic applications, it is important to clarify whether these cells harbor somatic insertions resulting from endogenous L1 retrotransposition. Here, we verified L1 expression during and after reprogramming as well as potential somatic insertions driven by the most active human endogenous L1 subfamily (L1Hs). Our results indicate that L1 over-expression is initiated during the reprogramming process and is subsequently sustained in isolated clones. To detect potential somatic insertions in iPSCs caused by L1Hs retrotransposition, we used a novel sequencing strategy. In contrast to the conventional sequencing direction, we sequenced from the 3' end of L1Hs into the genomic DNA, enabling direct detection of the polyA tail signature of retrotransposition for verification of true insertions. Deep coverage sequencing thus allowed us to detect seven potential somatic insertions with low read counts from two iPSC clones. Negative PCR amplification in parental cells, the presence of a polyA tail and absence from seven L1 germline insertion databases strongly suggested that these are true somatic insertions in iPSCs. Furthermore, these insertions could not be detected in iPSCs by PCR, likely due to their low abundance. We conclude that L1Hs retrotransposes at low levels in iPSCs and that this activity warrants careful analysis for genotoxic effects.

  1. Deep RNA Sequencing of the Skeletal Muscle Transcriptome in Swimming Fish

    PubMed Central

    Palstra, Arjan P.; Beltran, Sergi; Burgerhout, Erik; Brittijn, Sebastiaan A.; Magnoni, Leonardo J.; Henkel, Christiaan V.; Jansen, Hans J.; van den Thillart, Guido E. E. J. M.; Spaink, Herman P.; Planas, Josep V.

    2013-01-01

    Deep RNA sequencing (RNA-seq) was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss), with the specific objective of identifying expressed genes and quantifying the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10) or swum (n = 10) for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes) was sequenced and resulted in 15–17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides), a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and, specifically, up-regulated genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids. PMID:23308156

  2. Identification of MicroRNAs in Meloidogyne incognita Using Deep Sequencing

    PubMed Central

    Wang, Yunsheng; Mao, Zhenchuan; Yan, Jin; Cheng, Xinyue; Liu, Feng; Xiao, Luo; Dai, Liangying; Luo, Feng; Xie, Bingyan

    2015-01-01

    MicroRNAs play important regulatory roles in eukaryotic lineages. In this paper, we employed deep sequencing technology to sequence and identify microRNAs in M. incognita, an important plant parasitic nematode. We identified 102 M. incognita microRNA genes, which can be grouped into 71 nonredundant miRNAs based on mature sequences. Among the 71 miRNAs, 27 are known miRNAs and 44 are novel miRNAs. We identified seven miRNA clusters in the M. incognita genome. Four of the seven clusters, miR-100/let-7, miR-71-1/miR-2a-1, miR-71-2/miR-2a-2 and miR-279/miR-2b, are conserved in other species. Using RT-PCR, we validated the expression of five M. incognita microRNAs, including three known microRNAs (miR-71, miR-100b and let-7) and two novel microRNAs (NOVEL-1 and NOVEL-2), and all five were detected. For four of these microRNAs, the expression levels obtained using RT-PCR were consistent with those obtained by high-throughput sequencing; let-7 was the exception. We also examined how M. incognita miRNAs are conserved in four other nematode species: C. elegans, A. suum, B. malayi and P. pacificus. We found that four microRNAs, miR-100, miR-92, miR-279 and miR-137, exist only in the genomes of parasitic nematodes and are absent from the genome of the free-living nematode C. elegans. Our research created a unique resource for research on plant parasitic nematodes. The candidate microRNAs could help elucidate the genomic structure, gene regulation, evolutionary processes, and developmental features of plant parasitic nematodes and nematode-plant interaction. PMID:26241472

  3. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. The impact of sequencing depth varied markedly between callers; for some callers, increased sequencing depth greatly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  4. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. The impact of sequencing depth varied markedly between callers; for some callers, increased sequencing depth greatly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
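
    The "highly variable agreement" between callers can be quantified by treating each caller's output as a set of variant keys and comparing the sets pairwise. The following is a minimal Python sketch, assuming calls have already been normalized to (chromosome, position, ref, alt) tuples. The Jaccard index is one common agreement measure, not necessarily the one used in the study, and the caller names and call sets below are invented toy data.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Shared calls divided by all distinct calls; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(calls: Dict[str, Set[Variant]]) -> Dict[Tuple[str, str], float]:
    """Jaccard index for every pair of callers (names sorted for stable keys)."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


# Hypothetical callers and toy coordinates (not real results).
toy = {
    "callerA": {("chr1", 100, "C", "T"), ("chr2", 250, "A", "G")},
    "callerB": {("chr1", 100, "C", "T")},
    "callerC": {("chr3", 75, "G", "A")},
}
for pair, score in pairwise_agreement(toy).items():
    print(pair, round(score, 2))
```

    Normalizing calls to a common key before comparison is the important design choice: the same indel can be reported at different positions or with different allele representations by different tools, which would otherwise depress the apparent agreement.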

  5. Uncovering microRNA-mediated response to SO2 stress in Arabidopsis thaliana by deep sequencing.

    PubMed

    Li, Lihong; Xue, Meizhao; Yi, Huilan

    2016-10-01

    Sulfur dioxide (SO2) is a major air pollutant and has significant impacts on plants. MicroRNAs (miRNAs) are a class of gene expression regulators that play important roles in responses to environmental stresses. In this study, deep sequencing was used for genome-wide identification of miRNAs and their expression profiles in response to SO2 stress in Arabidopsis thaliana shoots. A total of 27 conserved miRNAs and 5 novel miRNAs were found to be differentially expressed under SO2 stress. qRT-PCR analysis showed a mostly negative correlation between miRNA accumulation and target gene mRNA abundance, suggesting regulatory roles of these miRNAs during SO2 exposure. The target genes of SO2-responsive miRNAs encode transcription factors and proteins that regulate auxin signaling and stress responses, and miRNA-mediated suppression of these genes could improve plant resistance to SO2 stress. Promoter sequence analysis of genes encoding SO2-responsive miRNAs showed that stress-responsive and phytohormone-related cis-regulatory elements occurred frequently, providing additional evidence of the involvement of miRNAs in adaptation to SO2 stress. This study represents a comprehensive expression profiling of SO2-responsive miRNAs in Arabidopsis and broadens our perspective on the ubiquitous regulatory roles of miRNAs under stress conditions.

  6. Engineering and analysis of peptide-recognition domain specificities by phage display and deep sequencing.

    PubMed

    McLaughlin, Megan E; Sidhu, Sachdev S

    2013-01-01

    Protein interaction networks depend in part on the specific recognition of unstructured peptides by folded domains. Understanding how members of a domain family use a similar fold to recognize different peptide sequences selectively is a fundamental question. One way to advance our understanding of peptide recognition is to apply an existing model of peptide recognition for a particular domain toward engineering synthetic domain variants with desired properties. Successes, failures, and unintended outcomes can help refine the model and can illuminate more general principles of peptide recognition. Using the PDZ domain fold as an example, we describe methods for (1) structure-based combinatorial library design and directed evolution of domain variants and (2) specificity profiling of large repertoires of synthetic variants using multiplexed deep sequencing. Peptide-binding preferences for hundreds of variants can be decoded in parallel, enabling comparisons between different library designs and selection pressures. The tremendous depth of coverage of the binding peptide profiles also permits robust computational analysis. This approach to studying peptide recognition can be applied to other domains and to a variety of structural and functional models by tailoring the combinatorial library design and selection pressures accordingly.
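
    Peptide-binding preferences decoded from deep-sequenced selections are commonly summarized as a position-specific frequency matrix over the aligned binding peptides. The following is a minimal Python sketch, assuming equal-length peptide sequences that have already been translated from reads and filtered. The peptides are invented for illustration, and this is not the authors' analysis pipeline.

```python
from collections import Counter
from typing import Dict, List


def position_frequency_matrix(peptides: List[str]) -> List[Dict[str, float]]:
    """Per-position amino-acid frequencies for a set of equal-length peptides."""
    if not peptides:
        return []
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    matrix = []
    for i in range(length):
        column = Counter(p[i] for p in peptides)  # residue counts at position i
        total = sum(column.values())
        matrix.append({aa: n / total for aa, n in sorted(column.items())})
    return matrix


# Invented peptide sequences (illustration only).
reads = ["ETWV", "ESWV", "ETLV", "RTWV"]
for pos, freqs in enumerate(position_frequency_matrix(reads), start=1):
    print(pos, freqs)
```

    With real selections, each domain variant yields thousands of peptide reads, so one matrix per variant can be computed in parallel and compared across library designs or selection pressures.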

  7. Deep sequencing of Trichomonas vaginalis during the early infection of vaginal epithelial cells and amoeboid transition.

    PubMed

    Gould, Sven B; Woehle, Christian; Kusdian, Gary; Landan, Giddy; Tachezy, Jan; Zimorski, Verena; Martin, William F

    2013-08-01

    The human pathogen Trichomonas vaginalis has the largest protozoan genome known, potentially encoding approximately 60,000 proteins. To what degree these genes are expressed is not well known and only a few key transcription factors and promoter domains have been identified. To shed light on the expression capacity of the parasite and transcriptional regulation during phase transitions, we deep sequenced the transcriptomes of the protozoan during two environmental stimuli of the early infection process: exposure to oxygen and contact with vaginal epithelial cells. Eleven 3' fragment libraries from different time points after exposure to oxygen only and in combination with human tissue were sequenced, generating more than 150 million reads which mapped onto 33,157 protein coding genes in total and a core set of more than 20,000 genes represented within all libraries. The data uncover gene family expression regulation in this parasite and give evidence for a concentrated response to the individual stimuli. Oxygen stress primarily reveals the parasite's strategies to deal with oxygen radicals. The exposure of oxygen-adapted parasites to human epithelial cells primarily induces cytoskeletal rearrangement and proliferation, reflecting the rapid morphological transition from spindle shaped flagellates to tissue-feeding and actively dividing amoeboids.

  8. Transcript analysis of a goat mesenteric lymph node by deep next-generation sequencing.

    PubMed

    E, G X; Zhao, Y J; Na, R S; Huang, Y F

    2016-01-01

    Deep RNA sequencing (RNA-seq) provides a practical and inexpensive alternative for exploring genomic data in non-model organisms. The functional annotation of non-model mammalian genomes, such as that of goats, is still poor compared to that of humans and mice. In the current study, we performed a whole transcriptome analysis of an intestinal mucous membrane lymph node to comprehensively characterize the transcript catalogue of this tissue in a goat. Using an Illumina HiSeq 4000 sequencing platform, 9.692 GB of raw reads were acquired. A total of 57,526 lymph transcripts were obtained, and 42.67% of these were mapped to known transcriptional units. A comparison of the mRNA expression of the mesenteric lymph nodes during the juvenile and post-adolescent stages revealed 8949 transcripts that were differentially expressed, including 6174 known genes. In addition, we functionally classified these transcripts using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms. A total of 6174 known genes were assigned to 64 GO terms, and 3782 genes were assigned to 303 KEGG pathways, including some related to immunity. Our results reveal the complex transcriptome profile of the lymph node and suggest that the immune system is immature in the mesenteric lymph nodes of juvenile goats. PMID:27173308

  10. estMOI: estimating multiplicity of infection using parasite deep sequencing data

    PubMed Central

    Assefa, Samuel A.; Preston, Mark D.; Campino, Susana; Ocholla, Harold; Sutherland, Colin J.; Clark, Taane G.

    2014-01-01

    Summary: Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallel sequencing technologies means that MOI can be detected genome-wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission. Availability and implementation: estMOI is freely available from http://pathogenseq.lshtm.ac.uk. Contact: samuel.assefa@lshtm.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24443379

  11. estMOI: estimating multiplicity of infection using parasite deep sequencing data.

    PubMed

    Assefa, Samuel A; Preston, Mark D; Campino, Susana; Ocholla, Harold; Sutherland, Colin J; Clark, Taane G

    2014-05-01

    Individuals living in endemic areas generally harbour multiple parasite strains. Multiplicity of infection (MOI) can be an indicator of immune status and transmission intensity. It has a potentially confounding effect on a number of population genetic analyses, which often assume isolates are clonal. Polymerase chain reaction-based approaches to estimate MOI can lack sensitivity. For example, in the human malaria parasite Plasmodium falciparum, genotyping of the merozoite surface protein (MSP1/2) genes is a standard method for assessing MOI, despite the apparent problem of underestimation. The availability of deep coverage data from massively parallel sequencing technologies means that MOI can be detected genome-wide by considering the abundance of heterozygous genotypes. Here, we present a method to estimate MOI, which considers unique combinations of polymorphisms from sequence reads. The method is implemented within the estMOI software. When applied to clinical P. falciparum isolates from three continents, we find that multiple infections are common, especially in regions with high transmission. PMID:24443379
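
    The core idea described above, counting distinct combinations of alleles that co-occur on individual reads, can be illustrated with a toy haplotype count. The following is a minimal Python sketch, assuming each read has already been reduced to its alleles at a fixed window of nearby SNPs. The function name, the minimum-support filter and the max-over-windows rule are simplifying assumptions for illustration, not estMOI's actual algorithm or parameters.

```python
from collections import Counter
from typing import Dict, List, Tuple

Haplotype = Tuple[str, ...]  # alleles observed on one read across a SNP window


def estimate_moi(
    reads_by_window: Dict[str, List[Haplotype]],
    min_support: int = 3,  # illustrative error filter (assumption)
) -> int:
    """Toy MOI estimate: the largest number of well-supported distinct
    haplotypes seen in any SNP window.

    Allele combinations seen on fewer than `min_support` reads are treated as
    likely sequencing errors. Any sample with data is assumed to contain at
    least one strain.
    """
    best = 1 if reads_by_window else 0
    for haplotypes in reads_by_window.values():
        counts = Counter(haplotypes)
        supported = sum(1 for n in counts.values() if n >= min_support)
        best = max(best, supported)
    return best


# Invented read-level allele combinations for two hypothetical SNP windows.
windows = {
    "w1": [("A", "T")] * 10 + [("G", "T")] * 6 + [("G", "C")] * 1,
    "w2": [("C", "C", "A")] * 8 + [("T", "C", "A")] * 5 + [("T", "G", "A")] * 4,
}
print(estimate_moi(windows))  # 3
```

    Because a single read links the alleles at neighbouring SNPs, distinct combinations on reads reveal co-infecting strains even where per-SNP allele frequencies alone would be ambiguous; the support threshold trades sensitivity for robustness to sequencing errors.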

  12. Deep sequencing of the MHC region in the Chinese population contributes to studies of complex disease.

    PubMed

    Zhou, Fusheng; Cao, Hongzhi; Zuo, Xianbo; Zhang, Tao; Zhang, Xiaoguang; Liu, Xiaomin; Xu, Ricong; Chen, Gang; Zhang, Yuanwei; Zheng, Xiaodong; Jin, Xin; Gao, Jinping; Mei, Junpu; Sheng, Yujun; Li, Qibin; Liang, Bo; Shen, Juan; Shen, Changbing; Jiang, Hui; Zhu, Caihong; Fan, Xing; Xu, Fengping; Yue, Min; Yin, Xianyong; Ye, Chen; Zhang, Cuicui; Liu, Xiao; Yu, Liang; Wu, Jinghua; Chen, Mengyun; Zhuang, Xuehan; Tang, Lili; Shao, Haojing; Wu, Longmao; Li, Jian; Xu, Yu; Zhang, Yijie; Zhao, Suli; Wang, Yu; Li, Ge; Xu, Hanshi; Zeng, Lei; Wang, Jianan; Bai, Mingzhou; Chen, Yanling; Chen, Wei; Kang, Tian; Wu, Yanyan; Xu, Xun; Zhu, Zhengwei; Cui, Yong; Wang, Zaixing; Yang, Chunjun; Wang, Peiguang; Xiang, Leihong; Chen, Xiang; Zhang, Anping; Gao, Xinghua; Zhang, Furen; Xu, Jinhua; Zheng, Min; Zheng, Jie; Zhang, Jianzhong; Yu, Xueqing; Li, Yingrui; Yang, Sen; Yang, Huanming; Wang, Jian; Liu, Jianjun; Hammarström, Lennart; Sun, Liangdan; Wang, Jun; Zhang, Xuejun

    2016-07-01

    The human major histocompatibility complex (MHC) region has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint the causal variants for these associations because of the extreme complexity of the region. We thus sequenced the entire 5-Mb MHC region in 20,635 individuals of Han Chinese ancestry (10,689 controls and 9,946 patients with psoriasis) and constructed a Han-MHC database that includes both variants and HLA gene typing results of high accuracy. We further identified multiple independent new susceptibility loci in HLA-C, HLA-B, HLA-DPB1 and BTNL2 and an intergenic variant, rs118179173, associated with psoriasis and confirmed the well-established risk allele HLA-C*06:02. We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance understanding of the pathogenesis of these disorders. PMID:27213287

  14. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    PubMed Central

    Chateigner, Aurélien; Bézier, Annie; Labrousse, Carole; Jiolle, Davy; Barbe, Valérie; Herniou, Elisabeth A.

    2015-01-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation, though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221× average genome coverage of our 133,926 bp long consensus, we could detect low-frequency mutations (down to 0.025%). K-means clustering was used to classify the mutations into four categories according to their frequency in the population. We found 60 high-frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential. PMID:26198241
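
    Grouping mutations by their population frequency with k-means can be illustrated with a one-dimensional Lloyd's algorithm in plain Python. This is a sketch, not the authors' code; the function name `kmeans_1d`, the deterministic quantile-based initialization, and the example frequencies in the usage note are assumptions chosen for illustration.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalar values (e.g. mutation frequencies); returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    # Deterministic initialization: k evenly spaced order statistics of the data.
    if k == 1:
        centers = [xs[len(xs) // 2]]
    else:
        centers = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members (unchanged if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # assignments are stable, so the algorithm has converged
        centers = new_centers
    return centers, labels
```

    For example, `kmeans_1d([0.0003, 0.0004, 0.0005, 0.01, 0.012, 0.2, 0.22, 0.5, 0.52], k=4)` separates the very rare, rare, intermediate, and high-frequency variants into labels `[0, 0, 0, 1, 1, 2, 2, 3, 3]`. In practice, frequencies spanning several orders of magnitude are often log-transformed before clustering; whether that was done here is not stated in the abstract.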

  15. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer
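
    The targeted-alignment idea, building synthetic exon-exon junction sequences between neighbouring genes so that short reads crossing a readthrough junction can be aligned to them, might be sketched as follows. The `Gene` record, the `junction_library` helper, the same-strand assumption, and the rule of taking `read_len - 1` bases from each side of a junction are illustrative assumptions, not the authors' method; only the 200,000 bp distance limit and the 33-nt read length come from the abstract.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass
class Gene:
    """Hypothetical gene record; both genes are assumed to lie on the same strand."""
    name: str
    start: int          # genomic start coordinate
    end: int            # genomic end coordinate
    exons: List[str]    # exon sequences in transcription order


def junction_library(upstream: Gene, downstream: Gene,
                     read_len: int = 33, max_gap: int = 200_000) -> Dict[str, str]:
    """Build artificial upstream-exon/downstream-exon junction sequences.

    Each junction joins the last (read_len - 1) bases of an upstream exon to the
    first (read_len - 1) bases of a downstream exon, so a read of length read_len
    with at least one base on each side of the junction fits inside it
    (exons shorter than the flank contribute their full length).
    """
    if read_len < 2:
        raise ValueError("read_len must be at least 2")
    if downstream.start - upstream.end > max_gap:
        return {}  # genes too far apart to be treated as a readthrough candidate
    flank = read_len - 1
    library = {}
    for (i, up), (j, down) in product(enumerate(upstream.exons, 1),
                                      enumerate(downstream.exons, 1)):
        library[f"{upstream.name}_e{i}-{downstream.name}_e{j}"] = up[-flank:] + down[:flank]
    return library
```

    Keys follow the `e4-e2`-style exon naming used in the abstract (e.g. `A_e1-B_e1`); every upstream exon is paired with every downstream exon, which also covers junctions that skip exons.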

  16. Reconstructing the Dynamics of HIV Evolution within Hosts from Serial Deep Sequence Data

    PubMed Central

    Poon, Art F. Y.; Swenson, Luke C.; Bunnik, Evelien M.; Edo-Matas, Diana; Schuitemaker, Hanneke; van 't Wout, Angélique B.; Harrigan, P. Richard

    2012-01-01

    At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a ‘fitness valley’ separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant. PMID:23133358

  17. Reconstructing the dynamics of HIV evolution within hosts from serial deep sequence data.

    PubMed

    Poon, Art F Y; Swenson, Luke C; Bunnik, Evelien M; Edo-Matas, Diana; Schuitemaker, Hanneke; van 't Wout, Angélique B; Harrigan, P Richard

    2012-01-01

    At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a 'fitness valley' separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant.
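
    Using known sampling dates to estimate an evolutionary rate can be illustrated with root-to-tip regression. This is a simpler technique than the phylogenetic molecular-clock pipeline described above: each sequence's divergence from the tree root is regressed on its sampling day, the slope gives the rate, and the x-intercept gives an estimated root date. The function name and the example numbers are hypothetical.

```python
from typing import List, Tuple


def root_to_tip_rate(days: List[float], divergence: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of divergence (substitutions/site) on sampling day.

    Returns (rate, root_day): rate is substitutions/site/day (the slope) and
    root_day is where the fitted line reaches zero divergence (the x-intercept).
    """
    n = len(days)
    if n < 2 or n != len(divergence):
        raise ValueError("need at least two paired observations")
    mean_x = sum(days) / n
    mean_y = sum(divergence) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, divergence))
    if sxx == 0:
        raise ValueError("sampling days must not all be equal")
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_day = -intercept / rate if rate != 0 else float("nan")
    return rate, root_day
```

    With samples on days 0, 100, 200 and 300 at divergences 0.01, 0.02, 0.03 and 0.04, the fit gives a rate of 1e-4 substitutions/site/day and places the root about 100 days before the first sample.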

  18. Transcriptome-Wide Identification of Hfq-Associated RNAs in Brucella suis by Deep Sequencing

    PubMed Central

    Saadeh, Bashir; Caswell, Clayton C.; Berta, Philippe; Wattam, Alice Rebecca; Roop, R. Martin

    2015-01-01

    ABSTRACT Recent breakthroughs in next-generation sequencing technologies have led to the identification of small noncoding RNAs (sRNAs) as an important new class of regulatory molecules. In prokaryotes, sRNAs are often bound to the chaperone protein Hfq, which allows them to interact with their partner mRNA(s). We screened the genome of the zoonotic and human pathogen Brucella suis 1330 for the presence of this class of RNAs. We designed a coimmunoprecipitation strategy that relies on the use of Hfq as a bait to enrich the sample with sRNAs and potentially their target mRNAs. By deep sequencing analysis of the Hfq-bound transcripts, we identified a number of mRNAs and 33 sRNA candidates associated with Hfq. The expression of 10 sRNAs in the early stationary growth phase was experimentally confirmed by Northern blotting and/or reverse transcriptase PCR. IMPORTANCE Brucella organisms are facultative intracellular pathogens that use stealth strategies to avoid host defenses. Adaptation to the host environment requires tight control of gene expression. Recently, small noncoding RNAs (sRNAs) and the sRNA chaperone Hfq have been shown to play a role in the fine-tuning of gene expression. Here we have used RNA sequencing to identify RNAs associated with the B. suis Hfq protein. We have identified a novel list of 33 sRNAs and 62 Hfq-associated mRNAs for future studies aiming to understand the intracellular lifestyle of this pathogen. PMID:26553849

  19. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    PubMed

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole-genome sequencing on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  20. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    PubMed

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole-genome sequencing on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000. PMID:27437183

  1. Draft Genome Sequence of Alcanivorax sp. Strain KX64203 Isolated from Deep-Sea Sediments of Iheya North, Okinawa Trough.

    PubMed

    Zhang, Huan; Liu, Rui; Wang, Mengqiang; Wang, Hao; Gao, Qiang; Hou, Zhanhui; Gao, Dahai; Wang, Lingling

    2016-01-01

    This report describes the draft genome sequence of Alcanivorax sp. strain KX64203, isolated from deep-sea sediment samples. The reads generated by an Ion Torrent PGM were assembled into contigs, with a total size of 4.76 Mb. The data will improve our understanding of the strain's function in alkane degradation. PMID:27563046

  2. Draft Genome Sequence of Alcanivorax sp. Strain KX64203 Isolated from Deep-Sea Sediments of Iheya North, Okinawa Trough

    PubMed Central

    Liu, Rui; Wang, Mengqiang; Wang, Hao; Gao, Qiang; Hou, Zhanhui; Gao, Dahai

    2016-01-01

    This report describes the draft genome sequence of Alcanivorax sp. strain KX64203, isolated from deep-sea sediment samples. The reads generated by an Ion Torrent PGM were assembled into contigs, with a total size of 4.76 Mb. The data will improve our understanding of the strain’s function in alkane degradation. PMID:27563046

  3. A filtering method to generate high quality short reads using illumina paired-end technology.

    PubMed

    Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

    2013-01-01

    Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, a lack of consensus between very similar sequences in metagenomic studies can, and often does, represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.
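
    The principle can be sketched as a simple filter: the two mates of a short-insert pair read the same bases from opposite strands, so they should agree wherever they overlap. This is not the illumina-utils API; the function names, the assumption that the overlap length is known in advance from a tightly constrained insert size, and the zero-mismatch default are illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(read1: str, read2: str, overlap: int) -> int:
    """Count disagreements where the 3' end of read1 overlaps reverse-complemented read2."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    tail_of_r1 = read1[-overlap:]
    head_of_r2 = reverse_complement(read2)[:overlap]
    return sum(1 for a, b in zip(tail_of_r1, head_of_r2) if a != b)


def keep_pair(read1: str, read2: str, overlap: int, max_mismatches: int = 0) -> bool:
    """Keep a pair only if its mates agree (within tolerance) across the overlap."""
    return overlap_mismatches(read1, read2, overlap) <= max_mismatches


if __name__ == "__main__":
    # Fragment ACGTACGGTTCA (12 bp) read with 8 bp mates -> 4 bp overlap.
    r1, r2 = "ACGTACGG", "TGAACCGT"
    print(keep_pair(r1, r2, overlap=4))          # True
    print(keep_pair("ACGTACGA", r2, overlap=4))  # False: one mismatch in the overlap
```

    A disagreement in the overlap does not say which mate is wrong, so the sketch discards the whole pair; that matches the goal of removing error-prone reads rather than correcting them.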

  4. High-Resolution Hepatitis C Virus Subtyping Using NS5B Deep Sequencing and Phylogeny, an Alternative to Current Methods

    PubMed Central

    Gregori, Josep; Rodríguez-Frias, Francisco; Buti, Maria; Madejon, Antonio; Perez-del-Pulgar, Sofia; Garcia-Cehic, Damir; Casillas, Rosario; Blasi, Maria; Homs, Maria; Tabernero, David; Alvarez-Tejado, Miguel; Muñoz, Jose Manuel; Cubero, Maria; Caballero, Andrea; delCampo, Jose Antonio; Domingo, Esteban; Belmonte, Irene; Nieto, Leonardo; Lens, Sabela; Muñoz-de-Rueda, Paloma; Sanz-Cameno, Paloma; Sauleda, Silvia; Bes, Marta; Gomez, Jordi; Briones, Carlos; Perales, Celia; Sheldon, Julie; Castells, Lluis; Viladomiu, Lluis; Salmeron, Javier; Ruiz-Extremera, Angela; Quiles-Pérez, Rosa; Moreno-Otero, Ricardo; López-Rodríguez, Rosario; Allende, Helena; Romero-Gómez, Manuel; Guardia, Jaume; Esteban, Rafael; Garcia-Samaniego, Javier; Forns, Xavier

    2014-01-01

    Hepatitis C virus (HCV) is classified into seven major genotypes and 67 subtypes. Recent studies have shown that in HCV genotype 1-infected patients, response rates to regimens containing direct-acting antivirals (DAAs) are subtype-dependent. Currently available genotyping methods have limited subtyping accuracy. We have evaluated the performance of a deep-sequencing-based HCV subtyping assay, developed for the 454/GS-Junior platform, in comparison with those of two commercial assays (Versant HCV genotype 2.0 and Abbott Real-time HCV Genotype II) and using direct NS5B sequencing as a gold standard (direct sequencing), in 114 clinical specimens previously tested by a first-generation hybridization assay (82 genotype 1 and 32 with uninterpretable results). Phylogenetic analysis of deep-sequencing reads matched subtype 1 calling by population Sanger sequencing (69% 1b, 31% 1a) in 81 specimens and identified a mixed-subtype infection (1b/3a/1a) in one sample. Similarly, among the 32 previously indeterminate specimens, identical genotype and subtype results were obtained by direct and deep sequencing in all but four samples with dual infection. In contrast, both Versant HCV Genotype 2.0 and Abbott Real-time HCV Genotype II failed subtype 1 calling in 13 (16%) samples each and were unable to identify the HCV genotype and/or subtype in more than half of the non-genotype 1 samples. We concluded that deep sequencing is more efficient for HCV subtyping than currently available methods, allows qualitative identification of mixed infections, and may be more helpful with respect to informing treatment strategies with new DAA-containing regimens across all HCV subtypes. PMID:25378574

  5. Next-Generation Analysis of Deep Sequencing Data: Bringing Light into the Black Box of SELEX Experiments.

    PubMed

    Blank, Michael

    2016-01-01

    In silico analysis of next-generation sequencing data (NGS; also termed deep sequencing) derived from in vitro selection experiments enables the analysis of the SELEX procedure (Systematic Evolution of Ligands by EXponential enrichment) in an unprecedented depth and improves the identification of aptamers. Besides quality control and optimization of starting libraries, advanced screening strategies for difficult targets or early identification of rare but high quality aptamers which are otherwise lost in the in vitro selection experiments become possible. The high information content of sequence data obtained from selection experiments is furthermore useful for subsequent lead optimization.
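
    One common way to use round-by-round sequence data from a SELEX experiment is to compare each sequence's frequency between two selection rounds and rank candidates by fold enrichment, which can flag sequences that are still rare but rising quickly. The sketch below is a generic illustration, not the method described in this chapter; the function name and the pseudocount smoothing are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def rank_by_enrichment(earlier_round: List[str], later_round: List[str],
                       pseudocount: float = 1.0) -> List[Tuple[str, float]]:
    """Rank sequences by frequency in the later round relative to the earlier one.

    A pseudocount is added to every sequence in both rounds so that sequences
    absent from one round still get a finite, non-zero ratio.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    early, late = Counter(earlier_round), Counter(later_round)
    seqs = set(early) | set(late)
    early_total = len(earlier_round) + pseudocount * len(seqs)
    late_total = len(later_round) + pseudocount * len(seqs)
    scores = {
        s: ((late[s] + pseudocount) / late_total) / ((early[s] + pseudocount) / early_total)
        for s in seqs
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

    For instance, a sequence that grows from 5 of 10 reads to 8 of 10 reads between rounds scores 1.5 with the default pseudocount, while one that drops from 5 to 2 scores 0.5. Ranking by ratio rather than raw count is what lets a low-abundance but fast-rising aptamer surface early.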

  6. Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

    DOE PAGES

    Rosen, Gail L.; Polikar, Robi; Caseiro, Diamantino A.; Essinger, Steven D.; Sokhansanj, Bahrad A.

    2011-01-01

    High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data include sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
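
    The general idea of pairing a composition-based classifier with a novelty detector can be sketched as nearest-centroid classification on k-mer frequency profiles, with a similarity threshold below which a read is reported as "unknown". This is a deliberately simple stand-in, not PhymmBL or the authors' detector; the function names, the cosine similarity measure, `k = 3`, and the 0.8 threshold are all assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, List


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers in seq (k-mers containing N are ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a, norm_b = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(read: str, references: Dict[str, List[float]],
             threshold: float = 0.8, k: int = 3) -> str:
    """Assign read to the most similar reference profile, or 'unknown' if none is close enough."""
    profile = kmer_profile(read, k)
    best_taxon, best_score = "unknown", 0.0
    for taxon, ref_profile in references.items():
        score = cosine(profile, ref_profile)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= threshold else "unknown"
```

    The threshold is what turns a closed-set classifier into one that can say "not in the database"; choosing it trades missed novel taxa against known taxa wrongly rejected, which is exactly the evaluation the abstract describes.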

  7. Improved Sequence Learning with Subthalamic Nucleus Deep Brain Stimulation: Evidence for Treatment-Specific Network Modulation

    PubMed Central

    Mure, Hideo; Tang, Chris C.; Argyelan, Miklos; Ghilardi, Maria-Felice; Kaplitt, Michael G.; Dhawan, Vijay; Eidelberg, David

    2015-01-01

    We used a network approach to study the effects of anti-parkinsonian treatment on motor sequence learning in humans. Eight Parkinson’s disease (PD) patients with bilateral subthalamic nucleus (STN) deep brain stimulation underwent H2 15O positron emission tomography (PET) imaging to measure regional cerebral blood flow (rCBF) while they performed kinematically matched sequence learning and movement tasks at baseline and during stimulation. Network analysis revealed a significant learning-related spatial covariance pattern characterized by consistent increases in subject expression during stimulation (p = 0.008, permutation test). The network was associated with increased activity in the lateral cerebellum, dorsal premotor cortex, and parahippocampal gyrus, with covarying reductions in the supplementary motor area (SMA) and orbitofrontal cortex. Stimulation-mediated increases in network activity correlated with concurrent improvement in learning performance (p < 0.02). To determine whether similar changes occurred during dopaminergic pharmacotherapy, we studied the subjects during an intravenous levodopa infusion titrated to achieve a motor response equivalent to stimulation. Despite consistent improvement in motor ratings during infusion, levodopa did not alter learning performance or network activity. Analysis of learning-related rCBF in network regions revealed improvement in baseline abnormalities with STN stimulation but not levodopa. These effects were most pronounced in the SMA. In this region, a consistent rCBF response to stimulation was observed across subjects and trials (p = 0.01), although the levodopa response was not significant. These findings link the cognitive treatment response in PD to changes in the activity of a specific cerebello-premotor cortical network. Selective modulation of overactive SMA–STN projection pathways may underlie the improvement in learning found with stimulation. PMID:22357863

  8. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-01-01

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depocenters and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  9. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-12-31

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depocenters and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  10. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus.

    PubMed

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-05-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results, validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent), will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system. PMID:20360389

  11. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus

    PubMed Central

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-01-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic “virion factory,” we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5′ and 3′ extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus “virophage.” These results, validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent), will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system. PMID:20360389
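
    Checking which genes carry the AAAATTGA early promoter element upstream of their start can be sketched as a motif scan over a fixed window before each start coordinate. This is an illustrative sketch only: the function name, the 150 bp default window, the plus-strand-only scan, and the input format (0-based start coordinates keyed by gene name) are assumptions, not details from the study.

```python
import re
from typing import Dict, List


def upstream_motif_hits(genome: str, gene_starts: Dict[str, int],
                        motif: str = "AAAATTGA", window: int = 150) -> Dict[str, List[int]]:
    """For each gene, list how far upstream of its start each motif occurrence begins.

    Only the plus strand is scanned; minus-strand genes would need the reverse
    complement of the genome and mirrored coordinates.
    """
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile(f"(?={re.escape(motif)})")
    hits: Dict[str, List[int]] = {}
    for gene, start in gene_starts.items():
        lo = max(0, start - window)
        region = genome[lo:start]
        hits[gene] = [start - (lo + m.start()) for m in pattern.finditer(region)]
    return hits
```

    For example, in the toy sequence `"CCCCAAAATTGAGGATGCCCCCCCCCCATG"` with genes starting at positions 14 and 27 and `window=10`, the first gene has the motif beginning 10 bases upstream and the second has none. Counting how many genes in each temporal class carry a hit is one way such a promoter association could be tabulated.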

  12. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of a Southern tomato virus (STV) isolate from tomato plants in a seed production field in Bangladesh was obtained for the first time using next-generation sequencing. The identified isolate, STV_BD-13, shares a high degree of sequence identity (99%) with several known STV isol...

  13. Deep sequencing reveals unique small RNA repertoire that is regulated during head regeneration in Hydra magnipapillata

    PubMed Central

    Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda

    2013-01-01

    Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression patterns during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Fewer than 50% are conserved across the two different strains of Hydra vulgaris tested in this study, indicating that hydra miRNAs are highly diverse, in contrast to bilaterian miRNAs. We also identified siRNAs that derive from precursors with a perfect stem–loop structure and arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions of the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping–pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating the gene expression essential for hydra regeneration. PMID:23166307

  14. Metagenomes obtained by 'deep sequencing' - what do they tell about the enhanced biological phosphorus removal communities?

    PubMed

    Albertsen, Mads; Saunders, Aaron M; Nielsen, Kåre L; Nielsen, Per H

    2013-01-01

    Metagenomics enables studies of the genomic potential of complex microbial communities by sequencing bulk genomic DNA directly from the environment. Knowledge of the genetic potential of a community can be used to formulate and test ecological hypotheses about stability and performance. In this study, deep metagenomics and fluorescence in situ hybridization (FISH) were used to study a full-scale wastewater treatment plant with enhanced biological phosphorus removal (EBPR), and the results were compared to an existing EBPR metagenome. EBPR is a widely used process that relies on a complex community of microorganisms to function properly. Insight into community- and species-level stability and dynamics is valuable for knowledge-driven optimization of the EBPR process. The metagenomes of the EBPR communities were distinct from metagenomes of communities from a wide range of other environments, which could be attributed to selection pressures of the EBPR process. The metabolic potential of one of the key microorganisms in the EBPR process, Accumulibacter, was investigated in more detail in the two plants, revealing a potential importance of phage predation for the dynamics of Accumulibacter populations. The results demonstrate that metagenomics can be used as a powerful tool for system-wide characterization of the EBPR community as well as for a deeper understanding of the function of specific community members. Furthermore, we discuss and illustrate some of the general pitfalls in metagenomics and stress the need for additional, DNA extraction-independent information in metagenome studies.

  15. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    PubMed

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O'Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A; Turner, Daniel J; Ruano-Rubio, Valentin; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C; Ferdig, Michael T; Amambua-Ngwa, Alfred; Conway, David J; Takala-Harrison, Shannon; Plowe, Christopher V; Rayner, Julian C; Rockett, Kirk A; Clark, Taane G; Newbold, Chris I; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P

    2012-07-19

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.
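    The abstract does not give the formula for its within-host diversity metric. As a minimal sketch, assuming an F-statistic-style index analogous to Wright's inbreeding coefficient, 1 − H_w/H_s, where H_w is the expected heterozygosity computed from allele frequencies within one infection and H_s is the expected heterozygosity from local population allele frequencies at the same SNPs, the calculation could look like the code below. The function names and the choice to pool (sum) heterozygosity over SNPs are hypothetical illustrations, not the authors' published procedure.

```python
from typing import Sequence


def allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Non-reference allele frequency at one SNP, estimated from read counts."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("SNP has no read coverage")
    return alt_reads / total


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def within_host_index(host_freqs: Sequence[float], pop_freqs: Sequence[float]) -> float:
    """Hypothetical F_ws-style index 1 - H_w / H_s, pooled over aligned SNPs.

    Near 1: the infection is dominated by a single (clonal) genotype.
    Near 0: the infection is about as diverse as the local population.
    """
    if len(host_freqs) != len(pop_freqs):
        raise ValueError("host and population SNP lists must be aligned")
    h_w = sum(expected_heterozygosity(p) for p in host_freqs)
    h_s = sum(expected_heterozygosity(p) for p in pop_freqs)
    if h_s == 0:
        raise ValueError("no variable SNPs in the population")
    return 1.0 - h_w / h_s


# Toy example: three SNPs with made-up population frequencies and a clonal infection
# whose reads all carry a single allele at each SNP.
population = [0.3, 0.5, 0.2]
clonal_host = [allele_frequency(40, 0), allele_frequency(0, 35), allele_frequency(50, 0)]
print(within_host_index(clonal_host, population))  # 1.0
```

    Pooling heterozygosity before taking the ratio keeps SNPs that are nearly fixed in the population from producing unstable per-SNP ratios; a per-SNP or frequency-binned estimate would be an equally plausible design.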

  16. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads

    PubMed Central

    Kang, Houxiang; Zhu, Dan; Lin, Runmao; Opiyo, Stephen Obol; Jiang, Ning; Shiu, Shin-Han; Wang, Guo-Liang

    2016-01-01

    Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in the rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family, MoTE-1, experimentally; all MoTE-1 family members are located at different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that maize harbors more polymorphic DNA transposons than polymorphic retrotransposons, even though retrotransposons occupy the largest fraction of the genome. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm. PMID:27098848

  17. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads.

    PubMed

    Kang, Houxiang; Zhu, Dan; Lin, Runmao; Opiyo, Stephen Obol; Jiang, Ning; Shiu, Shin-Han; Wang, Guo-Liang

    2016-06-01

    Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in the rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family, MoTE-1, experimentally; all MoTE-1 family members are located at different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that maize harbors more polymorphic DNA transposons than polymorphic retrotransposons, even though retrotransposons occupy the largest fraction of the genome. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm. PMID:27098848

  18. Microbial Dark Matter: Unusual intervening sequences in 16S rRNA genes of candidate phyla from the deep subsurface

    SciTech Connect

    Jarett, Jessica; Stepanauskas, Ramunas; Kieft, Thomas; Onstott, Tullis; Woyke, Tanja

    2014-03-17

    The Microbial Dark Matter project has sequenced genomes from over 200 single cells from candidate phyla, greatly expanding our knowledge of the ecology, inferred metabolism, and evolution of these widely distributed, yet poorly understood lineages. The second phase of this project aims to sequence an additional 800 single cells from known as well as potentially novel candidate phyla derived from a variety of environments. In order to identify whole-genome-amplified single cells, screening based on phylogenetic placement of 16S rRNA gene sequences is being conducted. Briefly, derived 16S rRNA gene sequences are aligned to a custom version of the Greengenes reference database and added to a reference tree in ARB using parsimony. In multiple samples from deep subsurface habitats, but not from other habitats, a large number of sequences proved difficult to align and therefore to place in the tree. Based on comparisons to reference sequences and structural alignments using SSU-ALIGN, many of these “difficult” sequences appear to originate from candidate phyla and contain intervening sequences (IVSs) within the 16S rRNA genes. These IVSs are short (39–79 nt) and do not appear to be self-splicing or to contain open reading frames. IVSs were found in the loop regions of stem-loop structures in several different taxonomic groups. Phylogenetic placement of sequences is strongly affected by IVSs; two out of three groups investigated were classified as different phyla after their removal. Based on data from samples screened in this project, IVSs appear to be more common in microbes occurring in deep subsurface habitats, although the reasons for this remain elusive.

  19. Acyclic Identification of Aptamers for Human alpha-Thrombin Using Over-Represented Libraries and Deep Sequencing

    PubMed Central

    Kupakuwana, Gillian V.; Crill, James E.; McPike, Mark P.; Borer, Philip N.

    2011-01-01

    Background Aptamers are oligonucleotides that bind proteins and other targets with high affinity and selectivity. Twenty years ago, elements of natural selection were adapted to in vitro selection in order to distinguish aptamers among randomized sequence libraries. The primary bottleneck in traditional aptamer discovery is the need for multiple cycles of in vitro evolution. Methodology/Principal Findings We show that over-representation of sequences in aptamer libraries and deep sequencing enables acyclic identification of aptamers. We demonstrated this by isolating a known family of aptamers for human α-thrombin. Aptamers were found within a library containing an average of 56,000 copies of each possible randomized 15mer segment. The high-affinity sequences were counted many times above the background in 2–6 million reads. Clustering analysis of sequences with more than 10 counts distinguished two sequence motifs with candidates at high abundance. Motif I contained the previously observed consensus 15mer, Thb1 (46,000 counts), and related variants with mostly G/T substitutions; secondary analysis showed that affinity for thrombin correlated with abundance (Kd = 12 nM for Thb1). The signal-to-noise ratio for this experiment was roughly 10,000∶1 for Thb1. Motif II was unrelated to Thb1, with the leading candidate (29,000 counts) being a novel aptamer against hexose sugars in the storage and elution buffers for Concanavalin A (Kd = 0.5 µM for α-methyl-mannoside); ConA was used to immobilize α-thrombin. Conclusions/Significance Over-representation together with deep sequencing can dramatically shorten the discovery process, distinguish aptamers having a wide range of affinity for the target, allow an exhaustive search of the sequence space within a simplified library, reduce the quantity of the target required, eliminate cycling artifacts, and should allow multiplexing of sequencing experiments and targets. PMID:21625587
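    The counting step described above, tallying how often each randomized segment appears among millions of reads and keeping sequences seen more than 10 times, can be sketched as follows. This assumes the reads have already been trimmed to the 15-nt randomized segment; the function name, the `min_count` parameter and the toy sequences are hypothetical, and the subsequent motif clustering performed by the authors is not reproduced.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def abundant_segments(segments: Iterable[str], min_count: int = 10) -> List[Tuple[str, int]]:
    """Return (sequence, count) pairs seen more than `min_count` times, most abundant first."""
    # Normalize case so the same segment is not split across two keys.
    counts = Counter(seg.upper() for seg in segments)
    kept = [(seq, n) for seq, n in counts.items() if n > min_count]
    # Descending count, then alphabetical, for a reproducible ranking.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy data (made-up 15mers): two enriched segments among low-abundance background.
reads = ["GGTTAGTGTGGTTAG"] * 25 + ["ACGTTGCAACGTTGC"] * 12 + ["TTTTCCCCAAAAGGG"] * 3
print(abundant_segments(reads))
# [('GGTTAGTGTGGTTAG', 25), ('ACGTTGCAACGTTGC', 12)]
```

    A hash-based counter keeps memory proportional to the number of distinct segments rather than the number of reads, which matters when a run yields several million reads.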

  20. MicroRNA Discovery and Analysis of Pinewood Nematode Bursaphelenchus xylophilus by Deep Sequencing

    PubMed Central

    Huang, Qi-Xing; Cheng, Xin-Yue; Mao, Zhen-Chuan; Wang, Yun-Sheng; Zhao, Li-Lin; Yan, Xia; Ferris, Virginia R.; Xu, Ru-Mei; Xie, Bing-Yan

    2010-01-01

    Background MicroRNAs (miRNAs) are considered important post-transcriptional regulators of growth, development, behavior and stress response in animals and plants. The pinewood nematode, Bursaphelenchus xylophilus, is an important invasive plant-parasitic nematode in Asia. Comprehensive knowledge of the nematode's miRNAs is necessary for further in-depth study of their roles in the ecological adaptation of this invasive species. Methods and Findings Five small RNA libraries were constructed and sequenced by Illumina/Solexa deep-sequencing technology. A total of 810 miRNA candidates (49 conserved and 761 novel) were predicted by a computational pipeline, of which 57 miRNAs (20 conserved and 37 novel) encoded by 53 miRNA precursors were identified by experimental methods. Ten novel miRNAs were considered to be species-specific miRNAs of B. xylophilus. Comparison of miRNA expression profiles across the five small RNA libraries showed that many miRNAs had markedly different expression levels in the third-stage dispersal juvenile and under cold stress. Most of the miRNAs were clearly down-regulated in the dispersal stage. However, differences among the three geographic libraries were not prominent. A total of 979 genes were predicted to be targets of these authentic miRNAs. Among them, seven heat shock protein genes were targeted by 14 miRNAs, and six FMRFamide-like neuropeptide genes were targeted by 17 miRNAs. Real-time quantitative polymerase chain reaction was used to quantify the mRNA expression levels of target genes. Conclusions Based on the negative correlation between the expression profiles of miRNAs and the mRNA expression profiles of their target genes (hsp, flp) in nematodes under cold stress compared with normal conditions, we suggest that miRNAs might participate in ecological adaptation and behavior regulation of the nematode. This is

  1. Deep sequencing reveals the complete genome and evidence for transcriptional activity of the first virus-like sequences identified in Aristotelia chilensis (Maqui Berry).

    PubMed

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F; Alzate, Juan F; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-04-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, 7122 bp in size, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially Petunia vein clearing virus (PVCV), genus Petuvirus. ORF1 encodes a movement protein (MP); ORF2 encodes reverse transcriptase (RT) and ribonuclease H (RNase H) domains; and ORF3 showed no amino acid sequence similarity to any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance to Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  2. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, 7122 bp in size, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially Petunia vein clearing virus (PVCV), genus Petuvirus. ORF1 encodes a movement protein (MP); ORF2 encodes reverse transcriptase (RT) and ribonuclease H (RNase H) domains; and ORF3 showed no amino acid sequence similarity to any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance to Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  3. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of patients with metastatic renal cell carcinoma (RCC) receive tyrosine kinase inhibitors (TKI) as first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10⁻⁹ to 3 × 10⁻³). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR-1307-3p, miR-155-5p, and miR-221-3p (P = 4.6 × 10⁻³, 6.5 × 10⁻³, and 3.4 × 10⁻², respectively). Furthermore, a two-miRNA classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75, 95% CI, 0.64–0.85; P = 1.3 × 10⁻⁴) with better predictive value than commonly used clinicopathological risk factors. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10⁻⁸ and 7.8 × 10⁻⁷ for top hits, respectively), 7 of which overlapped with those associated with early progressive disease. In conclusion, to our knowledge this is the first comprehensive miRNome study demonstrating a predictive value of miRNAs for TKI response, and it provides a new set of relevant markers that can help rationalize metastatic RCC treatment. PMID:27699216
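    The AUC reported for the two-miRNA classifier is the area under the ROC curve, which equals the probability that a randomly chosen patient who progressed receives a higher classifier score than a randomly chosen patient who did not, with ties counted as one half. A minimal sketch of that rank-based calculation follows; the scores are invented toy values, and neither the study's classifier nor its confidence-interval method is reproduced here.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based ROC AUC: fraction of (positive, negative) pairs ordered correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): higher means "predicted to progress".
progressed = [0.6, 0.4]
did_not_progress = [0.5, 0.3]
print(roc_auc(progressed, did_not_progress))  # 0.75
```

    The pairwise loop is O(n·m), which is fine for a cohort of tens of patients; for large datasets a sort-based rank computation gives the same value more efficiently.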

  4. Deep sequencing reveals a novel closterovirus associated with wild rose leaf rosette disease.

    PubMed

    He, Yan; Yang, Zuokun; Hong, Ni; Wang, Guoping; Ning, Guogui; Xu, Wenxing

    2015-06-01

    An unusual virus-like symptom, a rosette of dense small leaves on branches of wild roses (Rosa multiflora Thunb.), designated 'wild rose leaf rosette disease' (WRLRD), was observed in China. To investigate the presumed causal virus, a wild rose sample affected by WRLRD was subjected to deep sequencing of small interfering RNAs (siRNAs) for a complete survey of the infecting viruses and viroids. Assembly of the siRNAs led to the reconstruction of the complete genomes of three known viruses, namely Apple stem grooving virus (ASGV), Blackberry chlorotic ringspot virus (BCRV) and Prunus necrotic ringspot virus (PNRSV), and of a novel virus provisionally named 'rose leaf rosette-associated virus' (RLRaV). Phylogenetic analysis clearly placed RLRaV alongside members of the genus Closterovirus, family Closteroviridae. The RLRaV RNA genome (17,653 nucleotides) contains 13 open reading frames (ORFs); apart from ORF1 and the quintuple gene block, most of them showed no significant similarity to known viral proteins but instead had detectable identity to fungal or bacterial proteins. Additional novel molecular features indicate that RLRaV may be the most complex virus among the known members of the genus. To our knowledge, this is the first report of WRLRD and its associated closterovirus, as well as of two ilarviruses and one capillovirus, infecting wild roses. Our findings provide novel information about the closterovirus and the aetiology of this rose disease, which should facilitate its control. More importantly, the novel features of RLRaV help to clarify the molecular and evolutionary features of closteroviruses.

  5. Identifying Conserved and Novel MicroRNAs in Developing Seeds of Brassica napus Using Deep Sequencing

    PubMed Central

    Körbes, Ana Paula; Machado, Ronei Dorneles; Guzman, Frank; Almerão, Mauricio Pereira; de Oliveira, Luiz Felipe Valter; Loss-Morais, Guilherme; Turchetto-Zolet, Andreia Carina; Cagliari, Alexandro; dos Santos Maraschin, Felipe; Margis-Pinheiro, Marcia; Margis, Rogerio

    2012-01-01

    MicroRNAs (miRNAs) are important post-transcriptional regulators of plant development and seed formation. In Brassica napus, an important edible oil crop, valuable lipids are synthesized and stored in specific seed tissues during embryogenesis. The miRNA transcriptome of B. napus is currently poorly characterized, especially at different seed developmental stages. This work aims to describe the miRNAome of developing seeds of B. napus by identifying plant-conserved and novel miRNAs and comparing miRNA abundance in mature versus developing seeds. Members of 59 miRNA families were detected through a computational analysis of a large number of reads obtained from deep sequencing of two small RNA and two RNA-seq libraries of (i) pooled immature developing stages and (ii) mature B. napus seeds. Among these miRNA families, 17 are currently known to exist in B. napus; an additional 29 families not reported in B. napus but conserved in other plant species were identified by alignment with known plant mature miRNAs. Assembled mRNA-seq contigs allowed a search for putative new precursors and led to the identification of 13 novel miRNA families. Comparison of miRNA populations between libraries revealed that several miRNAs and isomiRNAs differ in abundance between developing stages and mature seeds. The predicted miRNA target genes encode a broad range of proteins related to seed development and energy storage. This work presents a comparative study of the miRNA transcriptome of mature and developing B. napus seeds and provides a basis for future research on individual miRNAs and their functions in embryogenesis, seed maturation and lipid accumulation in B. napus. PMID:23226347

  6. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of patients with metastatic renal cell carcinoma (RCC) receive tyrosine kinase inhibitors (TKI) as first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10⁻⁹ to 3 × 10⁻³). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR-1307-3p, miR-155-5p, and miR-221-3p (P = 4.6 × 10⁻³, 6.5 × 10⁻³, and 3.4 × 10⁻², respectively). Furthermore, a two-miRNA classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75, 95% CI, 0.64–0.85; P = 1.3 × 10⁻⁴) with better predictive value than commonly used clinicopathological risk factors. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10⁻⁸ and 7.8 × 10⁻⁷ for top hits, respectively), 7 of which overlapped with those associated with early progressive disease. In conclusion, to our knowledge this is the first comprehensive miRNome study demonstrating a predictive value of miRNAs for TKI response, and it provides a new set of relevant markers that can help rationalize metastatic RCC treatment.

  7. Unique gene program of rat small resistance mesenteric arteries as revealed by deep RNA sequencing

    PubMed Central

    Reho, John J; Shetty, Amol; Dippold, Rachael P; Mahurkar, Anup; Fisher, Steven A

    2015-01-01

    Deep sequencing of RNA samples from rat small mesenteric arteries (MA) and aorta (AO) identified common and unique features of their gene programs. ∼5% of mRNAs were quantitatively differentially expressed in MA versus AO. Unique transcriptional control in MA smooth muscle is suggested by the selective or enriched expression of transcription factors Nkx2-3, HAND2, and Tcf21 (Capsulin). Enrichment in AO of PPAR transcription factors and their target genes of mitochondrial function, lipid metabolism, and oxidative phosphorylation is consistent with slow (oxidative) tonic smooth muscle. In contrast MA was enriched in contractile and calcium channel mRNAs suggestive of components of fast (glycolytic) phasic smooth muscle. Myosin phosphatase regulatory subunit paralogs Mypt1 and p85 were expressed at similar levels, while smooth muscle MLCK was the only such kinase expressed, suggesting functional redundancy of the former but not the latter in accordance with mouse knockout studies. With regard to vaso-regulatory signals, purinergic receptors P2rx1 and P2rx5 were reciprocally expressed in MA versus AO, while the olfactory receptor Olr59 was enriched in MA. Alox15, which generates the EDHF HPETE, was enriched in MA while eNOS was equally expressed, consistent with the greater role of EDHF in the smaller arteries. mRNAs that were not expressed at a level consistent with impugned function include skeletal myogenic factors, IKK2, nonmuscle myosin, and Gnb3. This screening analysis of gene expression in the small mesenteric resistance arteries suggests testable hypotheses regarding unique aspects of small artery function in the regional control of blood flow. PMID:26156969

  8. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  9. Sequence stratigraphy and sedimentology of a shelf-margin lowstand wedge in the deep Wilcox flexure trend of south Texas

    SciTech Connect

    Snedden, J.W.; Cooke, J.C.; Johnson, R.K.; Conrad, K.T.

    1991-03-01

    An integrated sedimentologic and biostratigraphic study of 15 wells and over 1400 ft (430 m) of core facilitated establishment of a sequence stratigraphic framework for the deep Wilcox Group of south Texas. This analysis also revealed the presence of a dip-restricted, sand-prone sediment wedge that produces hydrocarbons in growth-fault structures. A sequence stratigraphic framework for the Wilcox was constructed via the use of faunal-increase markers, thin intervals present in well cuttings characterized by rises in the relative abundance of planktonic foraminifera. These marine flooding horizons can be utilized to subdivide the Wilcox Group into four depositional sequences termed P(aleogene)-8, P-7, P-4, and P-3, in descending order. Identification of standard sequence-bounding unconformities is hampered by the poor seismic expression of the Wilcox and the structural complexity of the area.

  10. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands.
PMID
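
    One common way to rank fragments from a selection followed by deep sequencing is to compare each fragment's read frequency after selection with its frequency in the starting library. The Python sketch below assumes read counts have already been tallied per fragment; the function name, pseudocount and fragment IDs are hypothetical and are not taken from the study.

```python
def enrichment_scores(pre_counts, post_counts, pseudocount=1.0):
    """Ratio of each fragment's frequency after selection to its frequency before.

    pre_counts / post_counts map a (hypothetical) fragment ID to its read count
    in the starting and selected libraries. The pseudocount keeps fragments
    that were not observed before selection from causing a division by zero.
    """
    fragments = set(pre_counts) | set(post_counts)
    pre_total = sum(pre_counts.values()) + pseudocount * len(fragments)
    post_total = sum(post_counts.values()) + pseudocount * len(fragments)
    scores = {}
    for frag in fragments:
        pre_freq = (pre_counts.get(frag, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(frag, 0) + pseudocount) / post_total
        scores[frag] = post_freq / pre_freq
    return scores


# Toy counts: frag_3 is rare in the starting library but dominates after selection.
pre = {"frag_1": 5000, "frag_2": 4990, "frag_3": 10}
post = {"frag_1": 100, "frag_2": 90, "frag_3": 9810}
scores = enrichment_scores(pre, post)
print(max(scores, key=scores.get))
```

    Working with frequencies rather than raw counts keeps the comparison meaningful when the two libraries are sequenced to different depths, which is what lets a fragment that is rare in the starting library stand out.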

  11. The mitochondrial genome sequence of a deep-sea, hydrothermal vent limpet, Lepetodrilus nux, presents a novel vetigastropod gene arrangement.

    PubMed

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Nakamura, Masako; Watanabe, Hiromi; Satoh, Noriyuki; Mitarai, Satoshi

    2016-08-01

    While mitochondrial (mt) genomes are used extensively for comparative and evolutionary genomics, few mt genomes of deep-sea species, including hydrothermal vent species, have been determined. The genus Lepetodrilus is a major deep-sea gastropod taxon that occurs in various deep-sea ecosystems. Using next-generation sequencing, we determined the nearly complete mitochondrial genome sequence of Lepetodrilus nux, which inhabits hydrothermal vents in the Okinawa Trough. The total length of the mitochondrial genome is 16,353 bp, excluding the repeat region. It contains 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and a control region, typical of most metazoan genomes. Compared with other vetigastropod mt genome sequences, L. nux employs a novel mt gene arrangement. Other novel arrangements have been identified in the vetigastropod Fissurella volcano and in Chrysomallon squamiferum, a neomphaline gastropod; however, all three gene arrangements are different, and Bayesian inference suggests that each lineage diverged independently. Our findings suggest that vetigastropod mt gene arrangements are more diverse than previously realized. PMID:27102631

  13. Homology-independent discovery of replicating pathogenic circular RNAs by deep sequencing and a new computational algorithm.

    PubMed

    Wu, Qingfa; Wang, Ying; Cao, Mengji; Pantaleo, Vitantonio; Burgyan, Joszef; Li, Wan-Xiang; Ding, Shou-Wei

    2012-03-01

    A common challenge in pathogen discovery by deep sequencing approaches is to recognize viral or subviral pathogens in samples of diseased tissue that share no significant homology with a known pathogen. Here we report a homology-independent approach for discovering viroids, a distinct class of free circular RNA subviral pathogens that encode no protein and are known to infect plants only. Our approach involves analyzing the sequences of the total small RNAs of the infected plants obtained by deep sequencing with a unique computational algorithm, progressive filtering of overlapping small RNAs (PFOR). Viroid infection triggers production of viroid-derived overlapping siRNAs that cover the entire genome with high densities. PFOR retains viroid-specific siRNAs for genome assembly by progressively eliminating nonoverlapping small RNAs and those that overlap but cannot be assembled into a direct repeat RNA, which is synthesized from circular or multimeric repeated-sequence templates during viroid replication. We show that viroids from the two known families are readily identified and their full-length sequences assembled by PFOR from small RNAs sequenced from infected plants. PFOR analysis of a grapevine library further identified a viroid-like circular RNA 375 nt long that shared no significant sequence homology with known molecules and encoded active hammerhead ribozymes in RNAs of both plus and minus polarities, which presumably self-cleave to release monomer from multimeric replicative intermediates. A potential application of the homology-independent approach for viroid discovery in plant and animal species where RNA replication triggers the biogenesis of siRNAs is discussed. PMID:22345560
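
    The first PFOR step, discarding small RNAs that lack overlapping partners, can be illustrated with a toy Python sketch. This is not the published PFOR code: it assumes error-free reads, treats an exact suffix-prefix match of at least `min_overlap` nt as an overlap, and omits the later step that tests whether surviving reads assemble into a direct repeat. The names `has_overlap`, `progressive_filter` and `tile` are hypothetical.

```python
def has_overlap(left, right, min_len):
    """True if a proper suffix of `left` (>= min_len nt) equals a prefix of `right`."""
    for k in range(min_len, min(len(left), len(right))):
        if left[-k:] == right[:k]:
            return True
    return False


def progressive_filter(reads, min_overlap):
    """Repeatedly drop reads lacking an overlapping partner on either side.

    Reads tiling a circular template always have neighbours on both sides and
    survive; reads from a linear template lose their terminal reads each round.
    """
    kept = list(reads)
    while True:
        survivors = [
            read for i, read in enumerate(kept)
            if any(has_overlap(other, read, min_overlap)  # partner on the 5' side
                   for j, other in enumerate(kept) if j != i)
            and any(has_overlap(read, other, min_overlap)  # partner on the 3' side
                    for j, other in enumerate(kept) if j != i)
        ]
        if len(survivors) == len(kept):  # nothing removed: filtering has converged
            return survivors
        kept = survivors


def tile(seq, read_len, step, circular):
    """Hypothetical helper: cut error-free, evenly spaced reads from a template."""
    if circular:
        doubled = seq + seq  # lets windows wrap around the circle
        return [doubled[i:i + read_len] for i in range(0, len(seq), step)]
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]


circular = tile("AACACCCAAACCACAACCCACACA", 8, 3, circular=True)
linear = tile("GGGGGTGTGTTTTT", 8, 3, circular=False)
print(sorted(progressive_filter(circular + linear, 5)) == sorted(circular))
```

    In this toy input the circular and linear templates use disjoint bases, so no accidental cross-overlaps occur; real small RNA libraries contain repeats and sequencing errors that a practical implementation must tolerate.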

  14. Increasing the Scale of Deep Sequencing Data Analysis with BioHDF

    SciTech Connect

    Smith, Todd

    2010-06-03

    Todd Smith of Geospiza discusses how BioHDF systems can be used with next generation DNA sequencing technologies on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  15. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  16. Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation.

    PubMed

    Lomakin, Yakov A; Zakharova, Maria Yu; Stepanov, Alexey V; Dronina, Maria A; Smirnov, Ivan V; Bobik, Tatyana V; Pyrkov, Andrey Yu; Tikunova, Nina V; Sharanova, Svetlana N; Boitsov, Vitali M; Vyazmin, Sergey Yu; Kabilov, Marsel R; Tupikin, Alexey E; Krasnov, Alexey N; Bykova, Nadezda A; Medvedeva, Yulia A; Fridman, Marina V; Favorov, Alexander V; Ponomarenko, Natalia A; Dubina, Michael V; Boyko, Alexey N; Vlassov, Valentin V; Belogurov, Alexey A; Gabibov, Alexander G

    2014-12-01

    The mechanisms triggering most autoimmune diseases are still obscure. Autoreactive B cells play a crucial role in the development of such pathologies and, in particular, in the production of autoantibodies of different specificities. The combination of deep-sequencing technology with functional studies of antibodies selected from highly representative immunoglobulin combinatorial libraries may provide unique information on specific features in the repertoires of autoreactive B cells. Here, we have analyzed cross-combinations of the variable regions of human immunoglobulins against myelin basic protein (MBP) previously selected from a multiple sclerosis (MS)-related scFv phage-display library. In addition, we have performed deep sequencing of the sublibraries of scFvs against MBP, Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1), and myelin oligodendrocyte glycoprotein (MOG). Bioinformatics analysis of sequencing data and surface plasmon resonance (SPR) studies have shown that it is mainly the variable fragments of antibody heavy chains that determine both the affinity of antibodies to the parent autoantigen and their cross-reactivity. It is suggested that LMP1-cross-reactive anti-myelin autoantibodies contain heavy chains encoded by certain germline gene segments, which may be a hallmark of the EBV-specific B cell subpopulation involved in MS triggering.

  17. HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST

    PubMed Central

    Yin, Li; Yao, Jin; Chang, Kaifen; Gardner, Brent P.; Yu, Fahong; Giuliano, Anna R.; Goodenow, Maureen M.

    2016-01-01

    Multiple-type human papillomavirus (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexed deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems. PMID:26821041

  18. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. PMID:24852243
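
    Why read depth matters for mosaic levels as low as 0.4% can be illustrated with a simple binomial model of read sampling. In this Python sketch the function name and the requirement of at least five variant-supporting reads are illustrative assumptions, not parameters from the study, and sequencing error (which in practice sets the floor for low-level calls) is ignored.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) when each read carries the
    variant independently with probability `allele_fraction` (binomial model)."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Chance of seeing >= 5 variant reads at a 0.4% allele fraction for several depths.
for depth in (100, 1_000, 10_000):
    print(depth, round(detection_probability(depth, 0.004, 5), 3))
```

    At 0.4% a depth of 1,000 reads yields only about four variant reads on average, so a call requiring five is missed more often than not, whereas at 10,000 reads detection becomes nearly certain under this model.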

  19. HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST.

    PubMed

    Yin, Li; Yao, Jin; Chang, Kaifen; Gardner, Brent P; Yu, Fahong; Giuliano, Anna R; Goodenow, Maureen M

    2016-01-25

    Multiple-type human papillomavirus (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexed deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems.

  20. Ultra-deep sequencing leads to earlier and more sensitive detection of the tyrosine kinase inhibitor resistance mutation T315I in chronic myeloid leukemia

    PubMed Central

    Baer, Constance; Kern, Wolfgang; Koch, Sarah; Nadarajah, Niroshan; Schindela, Sonja; Meggendorfer, Manja; Haferlach, Claudia; Haferlach, Torsten

    2016-01-01

    Chronic myeloid leukemia cells acquire resistance to tyrosine kinase inhibitors through mutations in the ABL1 kinase domain. The T315I mutation mediates resistance to imatinib, dasatinib, nilotinib and bosutinib, whereas sensitivity to ponatinib remains. Mutation detection by conventional Sanger sequencing requires the mutated subclone to reach a load of 10%–20%. We studied the development of the T315I mutation by ultra-deep sequencing on the 454 XL+ platform (Roche) in comparison with Sanger sequencing. By ultra-deep sequencing, mutations were detected at loads of 1%–2%. We selected 40 patients who had failed first-line to third-line treatment (imatinib, dasatinib, nilotinib) and had high loads of the T315I mutation detected by Sanger sequencing. We confirmed T315I mutations by ultra-deep sequencing and investigated the mutation dynamics by backtracking earlier samples. In 20 of 40 patients, we identified the T315I mutation a median of three months before Sanger sequencing detection limits were reached. To exclude sporadic low-percentage mutation development without subsequent mutation outgrowth, we selected 42 patients without resistance mutations detected by Sanger sequencing but with loss of major molecular response. In this group, no mutation was detected by ultra-deep sequencing. Additional non-T315I resistance mutations were found in 20 of 40 patients. Only 15% had two mutations per cell; the other cases showed multiple independently mutated clones, and the T315I clone demonstrated a rapid outgrowth. In conclusion, T315I mutations could be detected earlier by ultra-deep sequencing than by Sanger sequencing in a selected group of cases. Earlier mutation detection by ultra-deep sequencing might allow treatment to be changed before clonal increase of cells with the T315I mutation. PMID:27102501

  1. Simultaneous Identification of DNA and RNA Viruses Present in Pig Faeces Using Process-Controlled Deep Sequencing

    PubMed Central

    Sachsenröder, Jana; Twardziok, Sven; Hammerl, Jens A.; Janczyk, Pawel; Wrede, Paul; Hertwig, Stefan; Johne, Reimar

    2012-01-01

    Background Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Little information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. Results The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. Conclusion The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases.
The implemented process control serves as quality control, ensures comparability of the

  2. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    SciTech Connect

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.; Lu, V.; Podila, G. K.; Collart, F. R.; Biosciences Division; Univ. of Alabama

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. 
The method described here can be applied to improve gene structural annotation in other species, provided that there

  3. Using Deep RNA Sequencing for the Structural Annotation of the Laccaria Bicolor Mycorrhizal Transcriptome

    PubMed Central

    Larsen, Peter E.; Trivedi, Geetika; Sreedasyam, Avinash; Lu, Vincent; Podila, Gopi K.; Collart, Frank R.

    2010-01-01

    Background Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. Methodology We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. Conclusions 69% of expressed mycorrhizal JGI “best” gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural

  4. Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

    PubMed Central

    Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

    2016-01-01

    ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral

  5. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

    PubMed Central

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L.; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C.; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier

    2015-01-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598
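
    Counting positions that carry minor variants, as in the 22 clinical samples above, can be sketched as a scan over per-position base counts. The frequency and depth cut-offs below are hypothetical placeholders rather than the study's criteria, and a real pipeline would also model sequencing error and strand bias.

```python
def minor_variant_positions(base_counts, min_freq=0.03, min_depth=100):
    """Return positions whose second most common base reaches `min_freq`.

    base_counts: list with one dict per genome position, mapping base -> read count.
    The default thresholds are illustrative assumptions only.
    """
    flagged = []
    for pos, counts in enumerate(base_counts):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge a minority population
        ranked = sorted(counts.values(), reverse=True)
        minor = ranked[1] if len(ranked) > 1 else 0
        if minor / depth >= min_freq:
            flagged.append(pos)
    return flagged


pileup = [
    {"A": 990, "G": 10},   # 1% minor base: below the cut-off
    {"C": 900, "T": 100},  # 10% minor base: flagged
    {"G": 50, "A": 10},    # only 60 reads: skipped
    {"T": 1000},           # no minor base
]
print(minor_variant_positions(pileup))
```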

  6. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71.

    PubMed

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier

    2015-04-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  7. Genome Sequence of the Deep-Sea Bacterium Idiomarina abyssalis KMM 227T

    PubMed Central

    Rheaume, Bruce A.; Mithoefer, Scott

    2015-01-01

    Idiomarina abyssalis KMM 227T is an aerobic flagellar gammaproteobacterium found at a depth of 4,000 to 5,000 m below sea level in the Pacific Ocean. This paper presents a draft genome sequence for I. abyssalis KMM 227T, with a predicted composition of 2,684,812 bp (47.15% G+C content) and 2,611 genes, of which 2,508 were predicted coding sequences. PMID:26514763
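
    G+C content, reported above as 47.15%, is the fraction of G and C bases in the sequence. A minimal Python sketch, assuming ambiguous bases such as N are excluded from the denominator (a convention that varies between tools); the function name is hypothetical.

```python
def gc_content(seq):
    """Percent G+C among the A/C/G/T bases of `seq` (case-insensitive).

    Ambiguous symbols such as N are ignored in both numerator and denominator.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(gc_content("ATGCGCNNAT"))
```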

  9. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    PubMed

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on the Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences was 698 Mb for F. x ananassa and ∼200 Mb for each of the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb, with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study proved successful in dissecting the genome of octoploid F. x ananassa and appears promising for the analysis of other polyploid plant species. PMID:24282021

  10. Insights into deep-sea sediment fungal communities from the East Indian Ocean using targeted environmental sequencing combined with traditional cultivation.

    PubMed

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments.

  11. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  13. Characterization of microRNAs and their targets in wild barley (Hordeum vulgare subsp. spontaneum) using deep sequencing.

    PubMed

    Deng, Pingchuan; Bian, Jianxin; Yue, Hong; Feng, Kewei; Wang, Mengxing; Du, Xianghong; Weining, Song; Nie, Xiaojun

    2016-05-01

    MicroRNAs (miRNAs) are a class of small, endogenous RNAs that play a negative regulatory role in various developmental and metabolic processes of plants. Wild barley (Hordeum vulgare subsp. spontaneum), the progenitor of cultivated barley (Hordeum vulgare subsp. vulgare), has served as a valuable germplasm resource for barley genetic improvement. To survey miRNAs in wild barley, we sequenced a small RNA library prepared from wild barley using Illumina deep sequencing technology. A total of 70 known miRNAs and 18 putative novel miRNAs were identified. Sequence analysis revealed that all of the miRNAs identified in wild barley contained the highly conserved hairpin sequences found in barley cultivars. Target predictions showed that 12 of 52 miRNA families were predicted to target transcription factors, including 8 miRNA families highly conserved in plants and 4 wheat-barley conserved miRNA families. In addition to transcription factors, other predicted target genes were involved in diverse physiological and metabolic processes and in stress defense. Our study reports the first large-scale investigation of small RNAs in wild barley, which will provide essential information for understanding the regulatory role of miRNAs in wild barley and also shed light on future practical utilization of miRNAs for barley improvement.

  14. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    PubMed Central

    Ding, Qian; Li, Jingjuan; Wang, Fengde; Zhang, Yihui; Li, Huayin; Zhang, Jiannong; Gao, Jianwei

    2015-01-01

    Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing on a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs longer than 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. High-quality amplicons were successfully generated for nineteen EST-SSR loci (79.2%), of which seventeen (89.5%) showed polymorphism among twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage. PMID:26504770

  15. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis).

    PubMed

    Ding, Qian; Li, Jingjuan; Wang, Fengde; Zhang, Yihui; Li, Huayin; Zhang, Jiannong; Gao, Jianwei

    2015-01-01

    Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing on a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs longer than 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. High-quality amplicons were successfully generated for nineteen EST-SSR loci (79.2%), of which seventeen (89.5%) showed polymorphism among twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage.

  16. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression

    PubMed Central

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-01-01

    The spectrum of somatic mutations in the most aggressive forms of neuroblastoma has not been completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months of diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We identified 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes in clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in driving neuroblastoma progression. PMID:27009842

  17. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression.

    PubMed

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-04-19

    The spectrum of somatic mutations in the most aggressive forms of neuroblastoma has not been completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months of diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We identified 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes in clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in driving neuroblastoma progression. PMID:27009842

  18. Identification of a new enamovirus associated with citrus vein enation disease by deep sequencing of small RNAs.

    PubMed

    Vives, Mari Carmen; Velázquez, Karelia; Pina, José Antonio; Moreno, Pedro; Guerri, José; Navarro, Luis

    2013-10-01

    To identify the causal agent of citrus vein enation disease, we examined by deep sequencing (Solexa-Illumina) the small RNA (sRNA) fraction from infected and healthy Etrog citron plants. Our results showed that virus-derived sRNAs (vsRNAs): (i) represent about 14.21% of the total sRNA population, (ii) are predominantly of 21 and 24 nucleotides, with a biased distribution of their 5' nucleotide and a clear prevalence of those of (+) polarity, and (iii) derive from the entire viral genome, although a prominent hotspot is present at a 5'-proximal region. Contigs assembled from vsRNAs showed similarity with luteovirus sequences, particularly with Pea enation mosaic virus, the type member of the genus Enamovirus. The genomic RNA (gRNA) sequence of a new virus, provisionally named Citrus vein enation virus (CVEV), was completed and characterized. The CVEV gRNA was found to be single-stranded and positive-sense, with a size of 5,983 nucleotides and five open reading frames. Phylogenetic comparisons based on amino acid signatures of the RNA polymerase and the coat protein clearly classify CVEV within the genus Enamovirus. Dot-blot hybridization and reverse transcription-polymerase chain reaction tests were developed to detect CVEV in plants affected by vein enation disease. CVEV detection by these methods has already been adopted for use in the Spanish citrus quarantine, sanitation, and certification programs.

  19. Deep sequencing of microRNA precursors reveals extensive 3′ end modification

    PubMed Central

    Newman, Martin A.; Mani, Vidya; Hammond, Scott M.

    2011-01-01

    MicroRNAs (miRNAs) are small, noncoding RNAs that post-transcriptionally regulate gene expression. An emerging mechanism to control miRNA production is the addition of an oligo-uridine tail to the 3′ end of the precursor miRNA. This has been demonstrated for the Let-7 family of miRNAs in embryonic cells. Additionally, nontemplated nucleotides have been found on mature miRNA species, though in most cases it is not known if nucleotide addition occurs at the precursor step or at the mature miRNA. To examine the diversity of nucleotide addition we have developed a high-throughput sequencing method specific for miRNA precursors. Here we report that nontemplated addition is a widespread phenomenon occurring in many miRNA families. As previously reported, Let-7 family members are oligo-uridylated in embryonic cells in a Lin28-dependent manner. However, we find that the fraction of uridylated precursors increases with differentiation, independent of Lin28, and is highest in adult mouse tissues, exceeding 30% of all sequence reads for some Let-7 family members. A similar fraction of sequence reads are modified for many other miRNA families. Mono-uridylation is most common, with cytidine and adenosine modification less frequent but occurring above the expected error rate for Illumina sequencing. Nucleotide addition in cell lines is associated with 3′ end degradation, in contrast to adult tissues, where modification occurs predominantly on full-length precursors. This work provides an unprecedented view of the complexity of 3′ modification and trimming of miRNA precursors. PMID:21849429

  20. Deep sequencing of microRNA precursors reveals extensive 3' end modification.

    PubMed

    Newman, Martin A; Mani, Vidya; Hammond, Scott M

    2011-10-01

    MicroRNAs (miRNAs) are small, noncoding RNAs that post-transcriptionally regulate gene expression. An emerging mechanism to control miRNA production is the addition of an oligo-uridine tail to the 3' end of the precursor miRNA. This has been demonstrated for the Let-7 family of miRNAs in embryonic cells. Additionally, nontemplated nucleotides have been found on mature miRNA species, though in most cases it is not known if nucleotide addition occurs at the precursor step or at the mature miRNA. To examine the diversity of nucleotide addition we have developed a high-throughput sequencing method specific for miRNA precursors. Here we report that nontemplated addition is a widespread phenomenon occurring in many miRNA families. As previously reported, Let-7 family members are oligo-uridylated in embryonic cells in a Lin28-dependent manner. However, we find that the fraction of uridylated precursors increases with differentiation, independent of Lin28, and is highest in adult mouse tissues, exceeding 30% of all sequence reads for some Let-7 family members. A similar fraction of sequence reads are modified for many other miRNA families. Mono-uridylation is most common, with cytidine and adenosine modification less frequent but occurring above the expected error rate for Illumina sequencing. Nucleotide addition in cell lines is associated with 3' end degradation, in contrast to adult tissues, where modification occurs predominantly on full-length precursors. This work provides an unprecedented view of the complexity of 3' modification and trimming of miRNA precursors.

  1. MiRNA Expression Profile for the Human Gastric Antrum Region Using Ultra-Deep Sequencing

    PubMed Central

    Hamoy, Igor G.; Darnet, Sylvain; Burbano, Rommel; Khayat, André; Gonçalves, André Nicolau; Alencar, Dayse O.; Cruz, Aline; Magalhães, Leandro; Araújo Jr., Wilson; Silva, Artur; Santos, Sidney; Demachki, Samia; Assumpção, Paulo; Ribeiro-dos-Santos, Ândrea

    2014-01-01

    Background MicroRNAs are small non-coding nucleotide sequences that regulate gene expression. These structures are fundamental to several biological processes, including cell proliferation, development, differentiation and apoptosis. Identifying the expression profile of microRNAs in healthy human gastric antrum mucosa may help elucidate the miRNA regulatory mechanisms of the human stomach. Methodology/Principal Findings A small RNA library of stomach antrum tissue was sequenced using high-throughput SOLiD sequencing technology. The total read count for the gastric mucosa antrum region was greater than 618,000. After filtering and alignment with miRBase, 148 mature miRNAs were identified in the gastric antrum tissue, totaling 3,181 quality reads; 63.5% (2,021) of the reads were concentrated in the eight most highly expressed miRNAs (hsa-mir-145, hsa-mir-29a, hsa-mir-29c, hsa-mir-21, hsa-mir-451a, hsa-mir-192, hsa-mir-191 and hsa-mir-148a). RT-PCR validated the expression profiles of seven of these highly expressed miRNAs and confirmed the sequencing results obtained using the SOLiD platform. Conclusions/Significance In comparison with other tissues, the antrum’s expression profile was unique with respect to the most highly expressed miRNAs, suggesting that this expression profile is specific to stomach antrum tissue. The current study provides a starting point for a more comprehensive understanding of the role of miRNAs in the regulation of the molecular processes of the human stomach. PMID:24647245

  2. High-throughput, high-fidelity HLA genotyping with deep sequencing.

    PubMed

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F; Levinson, Douglas; Fernandez-Viña, Marcelo A; Davis, Ronald W; Davis, Mark M; Mindrinos, Michael

    2012-05-29

    Human leukocyte antigen (HLA) genes are the most polymorphic genes in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution, cost-effective methodology to type HLA genes by sequencing, which combines the advantages of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples on an Illumina MiSeq instrument with a 5-day turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes.

  3. High-throughput, high-fidelity HLA genotyping with deep sequencing

    PubMed Central

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F.; Levinson, Douglas; Fernandez-Viña, Marcelo A.; Davis, Ronald W.; Davis, Mark M.; Mindrinos, Michael

    2012-01-01

    Human leukocyte antigen (HLA) genes are the most polymorphic genes in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution, cost-effective methodology to type HLA genes by sequencing, which combines the advantages of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples on an Illumina MiSeq instrument with a 5-day turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes. PMID:22589303

  4. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-08-17

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.

  5. High diversity of picornaviruses in rats from different continents revealed by deep sequencing

    PubMed Central

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission. PMID:27530749

  6. Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs

    PubMed Central

    Wu, Qingfa; Luo, Yingjun; Lu, Rui; Lau, Nelson; Lai, Eric C.; Li, Wan-Xiang; Ding, Shou-Wei

    2010-01-01

    In response to infection, invertebrates process replicating viral RNA genomes into siRNAs of discrete sizes to guide virus clearance by RNA interference. Here, we show that viral siRNAs sequenced from fruit fly, mosquito, and nematode cells all overlapped in sequence, suggesting the possibility of using siRNAs for viral genome assembly and virus discovery. To test this idea, we examined contigs assembled from published small RNA libraries and discovered five previously undescribed viruses from cultured Drosophila cells and adult mosquitoes, including three with a positive-strand RNA genome and two with a dsRNA genome. Notably, four of the identified viruses exhibited only low sequence similarities to known viruses, such that none could be assigned to an existing virus genus. We also report detection of virus-derived PIWI-interacting RNAs (piRNAs) in Drosophila melanogaster that have not been previously described in any other host species and demonstrate viral genome assembly from viral piRNAs in the absence of viral siRNAs. Thus, this study provides a powerful culture-independent approach for virus discovery in invertebrates by assembling viral genomes directly from host immune response products without prior virus enrichment or amplification. We propose that invertebrate viruses discovered by this approach may include previously undescribed human and vertebrate viral pathogens that are transmitted by arthropod vectors. PMID:20080648

  7. Deep sequencing uncovers protistan plankton diversity in the Portuguese Ria Formosa solar saltern ponds.

    PubMed

    Filker, Sabine; Gimmler, Anna; Dunthorn, Micah; Mahé, Frédéric; Stoeck, Thorsten

    2015-03-01

    We used high-throughput sequencing to unravel the genetic diversity of protistan (including fungal) plankton in hypersaline ponds of the Ria Formosa solar saltern works in Portugal. From three ponds of different salinity (4, 12 and 38 %), we obtained ca. 105,000 amplicons (V4 region of the SSU rDNA). The genetic diversity we found was higher than what has been described from solar saltern ponds thus far by microscopy or molecular studies. The obtained operational taxonomic units (OTUs) could be assigned to 14 high-rank taxonomic groups and, by BLAST searches, to 120 eukaryotic families. The novelty of this genetic diversity was extremely high, with 27 % of all OTUs having a sequence divergence of more than 10 % from deposited sequences of described taxa. The highest degree of novelty was found at the intermediate salinity of 12 %, within the ciliates, which are traditionally considered the best-known and best-described taxon group within the kingdom Protista. Further substantial novelty was detected within the stramenopiles and the chlorophytes. Analyses of community structures suggest a transition boundary for protistan plankton between 4 and 12 % salinity, pointing to different haloadaptation strategies in individual evolutionary lineages as a result of environmental filtering. Our study highlights the gaps in our knowledge not only of protistan and fungal plankton diversity in hypersaline environments, but also of their ecology and their strategies for coping with these environmental conditions, and it underscores the need for targeted future research to fill these gaps.

  8. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission. PMID:27530749

  9. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low

  10. Discovering novel microRNAs and age-related nonlinear changes in rat brains using deep sequencing.

    PubMed

    Yin, Lanxuan; Sun, Yubai; Wu, Jinfeng; Yan, Siyu; Deng, Zhenglu; Wang, Jun; Liao, Shenke; Yin, Dazhong; Li, Guolin

    2015-02-01

    Elucidating the molecular mechanisms of brain aging remains a significant challenge for biogerontologists. The discovery of gene regulation by microRNAs (miRNAs) has added a new dimension for examining this process; however, the full complement of miRNAs involved in brain aging is still not known. In this study, miRNA profiles of young, adult, and old rats were obtained to evaluate molecular changes during aging. High-throughput deep sequencing revealed 547 known and 171 candidate novel miRNAs that were differentially expressed among groups. Unexpectedly, miRNA expression did not decline progressively with advancing age; moreover, genes targeted by age-associated miRNAs were predicted to be involved in biological processes linked to aging and neurodegenerative diseases. These findings provide novel insight into the molecular mechanisms underlying brain aging and a resource for future studies on age-related brain disorders.

  11. Deep sequencing of mRNA in CD24(-) and CD24(+) mammary carcinoma Mvt1 cell line.

    PubMed

    Rostoker, Ran; Jayaprakash, Anitha D; Sachidanandam, Ravi; LeRoith, Derek

    2015-09-01

    CD24 is an anchored cell surface marker that is highly expressed in cancer cells (Lee et al., 2009) and its expression is associated with poorer outcome of cancer patients (Kristiansen et al., 2003). Phenotype comparison between two subpopulations derived from the Mvt1 cell line, CD24(-) cells (with no CD24 cell surface expression) and the CD24(+) cells, identified high tumorigenic capacity for the CD24(+) cells. In order to reveal the transcripts that support the CD24(+) aggressive and invasive phenotype we compared the gene profiles of these two subpopulations. mRNA profiles of CD24(-) and CD24(+) cells were generated by deep sequencing, in triplicate, using an Illumina HiSeq 2500. Here we provide a detailed description of the mRNA-seq analysis from our recent study (Rostoker et al., 2015). The mRNA-seq data have been deposited in the NCBI GEO database (accession number GSE68746).

  12. Gene expression profiling of Sinapis alba leaves under drought stress and rewatering growth conditions with Illumina deep sequencing.

    PubMed

    Dong, Cai-Hua; Li, Chen; Yan, Xiao-Hong; Huang, Shun-Mou; Huang, Jin-Yong; Wang, Li-Jun; Guo, Rui-Xing; Lu, Guang-Yuan; Zhang, Xue-Kun; Fang, Xiao-Ping; Wei, Wen-Hui

    2012-05-01

    Sinapis alba has many desirable agronomic traits including tolerance to drought. In this investigation, we performed the genome-wide transcriptional profiling of S. alba leaves under drought stress and rewatering growth conditions in an attempt to identify candidate genes involved in drought tolerance, using the Illumina deep sequencing technology. The comparative analysis revealed numerous changes in gene expression level attributable to the drought stress, which resulted in the down-regulation of 309 genes and the up-regulation of 248 genes. Gene ontology analysis revealed that the differentially expressed genes were mainly involved in cell division and catalytic and metabolic processes. Our results provide useful information for further analyses of the drought stress tolerance in Sinapis, and will facilitate molecular breeding for Brassica crop plants. PMID:22207172

  13. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network.

    PubMed

    Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong

    2014-10-30

    Because of the nearly constant distance between two neighbouring Cα atoms, the local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and the dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as representatives of the structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ, with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9 Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and in template-based as well as template-free structure prediction. The deep neural network learning technique is available as an online server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org.
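    The θ and τ definitions above reduce to elementary vector geometry on consecutive Cα coordinates. The following is a minimal sketch, not code from SPIDER: the helper names (`theta`, `tau`) are hypothetical, and the atan2-based sign convention for τ (positive for clockwise rotation when viewed along the central Cα(i)-Cα(i+1) bond, as in the usual IUPAC torsion convention) is an assumption.

    ```python
    import math

    Vec3 = tuple[float, float, float]


    def _sub(a: Vec3, b: Vec3) -> Vec3:
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


    def _dot(a: Vec3, b: Vec3) -> float:
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


    def _cross(a: Vec3, b: Vec3) -> Vec3:
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])


    def _norm(a: Vec3) -> float:
        return math.sqrt(_dot(a, a))


    def theta(ca_prev: Vec3, ca_i: Vec3, ca_next: Vec3) -> float:
        """Angle at Ca(i) in the triple Ca(i-1)-Ca(i)-Ca(i+1), in degrees."""
        u = _sub(ca_prev, ca_i)
        v = _sub(ca_next, ca_i)
        cos_t = _dot(u, v) / (_norm(u) * _norm(v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


    def tau(ca_prev: Vec3, ca_i: Vec3, ca_next: Vec3, ca_next2: Vec3) -> float:
        """Dihedral of Ca(i-1..i+2) about the Ca(i)-Ca(i+1) bond, in degrees.

        Assumption: IUPAC-style sign convention, result in [-180, 180].
        """
        b1 = _sub(ca_i, ca_prev)
        b2 = _sub(ca_next, ca_i)   # the central bond
        b3 = _sub(ca_next2, ca_next)
        n1 = _cross(b1, b2)        # normal of plane Ca(i-1), Ca(i), Ca(i+1)
        n2 = _cross(b2, b3)        # normal of plane Ca(i), Ca(i+1), Ca(i+2)
        x = _dot(n1, n2)
        y = _norm(b2) * _dot(b1, n2)
        return math.degrees(math.atan2(y, x))


    if __name__ == "__main__":
        # Toy coordinates (not real protein data): a right-angle bend
        # followed by a 90-degree twist about the central bond.
        pts = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
        print(round(theta(*pts[:3]), 6), round(tau(*pts), 6))  # 90.0 90.0
    ```

    Because the Cα-Cα distance is nearly constant (about 3.8 Å for trans peptide bonds), that distance together with θ and τ is enough to rebuild a Cα trace up to a rigid-body motion, which is why these two angles serve as a compact description of local backbone structure.
    
    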

  14. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming

    PubMed Central

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A.

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina HiSeq 2500 instrument. Plastomes were assembled using a combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNA. PMID:26656830

  15. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    PubMed

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using a combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data, especially from chloroplast and mitochondrial DNA. PMID:26656830

  16. Exploring the Gastrointestinal “Nemabiome”: Deep Amplicon Sequencing to Quantify the Species Composition of Parasitic Nematode Communities

    PubMed Central

    Avramenko, Russell W.; Redman, Elizabeth M.; Lewis, Roy; Yazwinski, Thomas A.; Wasmuth, James D.; Gilleard, John S.

    2015-01-01

    Parasitic helminth infections have a considerable impact on global human health as well as animal welfare and production. Although co-infection with multiple parasite species within a host is common, there is a dearth of tools with which to study the composition of these complex parasite communities. Helminth species vary in their pathogenicity, epidemiology and drug sensitivity and the interactions that occur between co-infecting species and their hosts are poorly understood. We describe the first application of deep amplicon sequencing to study parasitic nematode communities as well as introduce the concept of the gastro-intestinal “nemabiome”. The approach is analogous to 16S rDNA deep sequencing used to explore microbial communities, but utilizes the nematode ITS-2 rDNA locus instead. Gastro-intestinal parasites of cattle were used to develop the concept, as this host has many well-defined gastro-intestinal nematode species that commonly occur as complex co-infections. Further, the availability of pure mono-parasite populations from experimentally infected cattle allowed us to prepare mock parasite communities to determine, and correct for, species representation biases in the sequence data. We demonstrate that, once these biases have been corrected, accurate relative quantitation of gastro-intestinal parasitic nematode communities in cattle fecal samples can be achieved. We have validated the accuracy of the method applied to field-samples by comparing the results of detailed morphological examination of L3 larvae populations with those of the sequencing assay. The results illustrate the insights that can be gained into the species composition of parasite communities, using grazing cattle in the mid-west USA as an example. However, both the technical approach and the concept of the ‘nemabiome’ have a wide range of potential applications in human and veterinary medicine. These include investigations of host-parasite and parasite-parasite interactions

  17. Deep transcriptome sequencing of Pecten maximus hemocytes: a genomic resource for bivalve immunology.

    PubMed

    Pauletto, Marianna; Milan, Massimo; Moreira, Rebeca; Novoa, Beatriz; Figueras, Antonio; Babbucci, Massimiliano; Patarnello, Tomaso; Bargelloni, Luca

    2014-03-01

    Pecten maximus, the king scallop, is a bivalve species with important commercial value for both fisheries and aquaculture, traditionally consumed in several European countries. Major problems in larval rearing, however, still limit hatchery-based seed production. High mortalities during early larval stages, likely related to bacterial pathogens, represent the most relevant bottleneck. To address this issue, understanding host defense mechanisms against microbes is extremely important. In this study, next-generation RNA sequencing was carried out on scallop hemocytes. To enrich for immune-related transcripts, cDNA libraries from hemocytes challenged in vivo with inactivated Vibrio anguillarum and in vitro with pathogen-associated molecular patterns, as well as from unchallenged controls, were sequenced, yielding 216,444,674 sequence reads. De novo assembly of the scallop hemocyte transcriptome yielded 73,732 contigs (31% annotated). A total of 934 contigs encoded proteins with a known immune function, grouped into several functional categories. Particular attention was paid to Toll-like receptors (TLRs), a family of pattern recognition receptors (PRRs) involved in non-self recognition. Through mining the scallop hemocyte transcriptome, at least four TLRs could be identified. The organization of canonical TLR domains demonstrated that single-cysteine-cluster and multiple-cysteine-cluster TLRs co-exist in this species. In addition, preliminary data on their mRNA levels following bacterial challenge suggested that different members of this family could exhibit opposite responses to pathogenic stimuli. Finally, a global analysis of differential expression comparing gene-expression levels in in vitro and in vivo stimulated hemocytes against controls provided evidence of a large set of transcripts involved in the great scallop immune response.

  18. Identification of an NAC Transcription Factor Family by Deep Transcriptome Sequencing in Onion (Allium cepa L.).

    PubMed

    Zheng, Xia; Tang, Shouwei; Zhu, Siyuan; Dai, Qiuzhong; Liu, Touming

    2016-01-01

    Although onion has been used extensively in the past for cytogenetic studies, molecular analysis has been lacking because genetic resources are limited. NAM, ATAF, and CUC (NAC) transcription factors (TFs) are plant-specific proteins, and they play key roles in plant growth, development, and stress tolerance. However, no onion NAC (CepNAC) genes had been identified thus far. In this study, the transcriptome of onion leaves was analyzed by Illumina paired-end sequencing. Approximately 102.9 million clean sequence reads were produced and used for de novo assembly, which generated 117,189 non-redundant transcripts. Of these transcripts, 39,472 were functionally annotated. To mine the CepNAC TFs, the assembled transcripts were searched for CepNAC genes, resulting in the identification of 39 CepNAC genes. These 39 CepNAC proteins were subjected to phylogenetic analysis together with 47 NAC proteins of known function that were previously identified in other species. The results showed that they can be divided into five groups (NAC-I–V). Interestingly, the NAC-IV and NAC-V groups are likely related to secondary wall synthesis and stress response, respectively. The transcriptome analysis generated a substantial number of transcripts, which will greatly aid in identifying important genes and accelerate our understanding of onion growth and development. Moreover, the discovery of 39 CepNAC TFs and the identification of sequence conservation between them and previously published NAC proteins will provide a basis for further characterization and validation of their functions in the future. PMID:27331904

  19. Identification of an NAC Transcription Factor Family by Deep Transcriptome Sequencing in Onion (Allium cepa L.)

    PubMed Central

    Zhu, Siyuan; Dai, Qiuzhong; Liu, Touming

    2016-01-01

    Although onion has been used extensively in the past for cytogenetic studies, molecular analysis has been lacking because genetic resources are limited. NAM, ATAF, and CUC (NAC) transcription factors (TFs) are plant-specific proteins, and they play key roles in plant growth, development, and stress tolerance. However, no onion NAC (CepNAC) genes had been identified thus far. In this study, the transcriptome of onion leaves was analyzed by Illumina paired-end sequencing. Approximately 102.9 million clean sequence reads were produced and used for de novo assembly, which generated 117,189 non-redundant transcripts. Of these transcripts, 39,472 were functionally annotated. To mine the CepNAC TFs, the assembled transcripts were searched for CepNAC genes, resulting in the identification of 39 CepNAC genes. These 39 CepNAC proteins were subjected to phylogenetic analysis together with 47 NAC proteins of known function that were previously identified in other species. The results showed that they can be divided into five groups (NAC-I–V). Interestingly, the NAC-IV and NAC-V groups are likely related to secondary wall synthesis and stress response, respectively. The transcriptome analysis generated a substantial number of transcripts, which will greatly aid in identifying important genes and accelerate our understanding of onion growth and development. Moreover, the discovery of 39 CepNAC TFs and the identification of sequence conservation between them and previously published NAC proteins will provide a basis for further characterization and validation of their functions in the future. PMID:27331904

  20. Deep Sequencing-Based Analysis of the Cymbidium ensifolium Floral Transcriptome

    PubMed Central

    Li, Xiaobai; Luo, Jie; Yan, Tianlian; Xiang, Lin; Jin, Feng; Qin, Dehui; Sun, Chongbo; Xie, Ming

    2013-01-01

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvements in RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned to 25 eukaryotic orthologous groups (KOGs), 41,690 to 58 gene ontology (GO) terms, 9,830 to 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 to 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for the genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium. PMID:24392013

  1. Deep sequencing-based analysis of the Cymbidium ensifolium floral transcriptome.

    PubMed

    Li, Xiaobai; Luo, Jie; Yan, Tianlian; Xiang, Lin; Jin, Feng; Qin, Dehui; Sun, Chongbo; Xie, Ming

    2013-01-01

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvements in RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned to 25 eukaryotic orthologous groups (KOGs), 41,690 to 58 gene ontology (GO) terms, 9,830 to 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 to 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for the genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium.

  2. Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns

    PubMed Central

    Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when

  3. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    PubMed

    Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of mitochondrial DNA and the Y chromosome. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and a higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP and two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  4. Sequence-of-events-driven automation of the deep space network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  5. Sequence-of-Events-Driven Automation of the Deep Space Network

    NASA Astrophysics Data System (ADS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1995-10-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  6. Deep sequencing of subseafloor eukaryotic rRNA reveals active Fungi across marine subsurface provinces.

    PubMed

    Orsi, William; Biddle, Jennifer F; Edgcomb, Virginia

    2013-01-01

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction fragment length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable in the marine subsurface than previously considered.

  7. Deep Sequencing of Subseafloor Eukaryotic rRNA Reveals Active Fungi across Marine Subsurface Provinces

    PubMed Central

    Orsi, William; Biddle, Jennifer F.; Edgcomb, Virginia

    2013-01-01

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction fragment length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable in the marine subsurface than previously considered. PMID:23418556

  8. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques

    PubMed Central

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been found to play important roles in many biological processes, there has been an ever-increasing need to investigate the entire ncRNAome in greater, unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing unprecedented investigation of ncRNAs at a time when regulatory pathways and novel functions involving ncRNAs are also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  9. Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts

    PubMed Central

    Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael

    2010-01-01

    Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250 000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described ‘sponge-specific’ clusters that were detected in this study, 48% were found exclusively in adults and larvae – implying vertical transmission of these groups. The remaining taxa, including ‘Poribacteria’, were also found at very low abundance among the 135 000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. PMID:21966903

  10. Deep sequencing uncovers numerous small RNAs on all four replicons of the plant pathogen Agrobacterium tumefaciens.

    PubMed

    Wilms, Ina; Overlöper, Aaron; Nowrousian, Minou; Sharma, Cynthia M; Narberhaus, Franz

    2012-04-01

    Agrobacterium species are capable of interkingdom gene transfer between bacteria and plants. The genome of Agrobacterium tumefaciens consists of a circular and a linear chromosome, the At-plasmid and the Ti-plasmid, which harbors bacterial virulence genes required for tumor formation in plants. Little is known about promoter sequences and the small RNA (sRNA) repertoire of this and other α-proteobacteria. We used a differential RNA sequencing (dRNA-seq) approach to map transcriptional start sites of 388 annotated genes and operons. In addition, a total of 228 sRNAs were identified across all four Agrobacterium replicons. Twenty-two of these were confirmed by independent RNA gel blot analysis, and several sRNAs were differentially expressed in response to growth media, growth phase, temperature or pH. One sRNA from the Ti-plasmid was massively induced under virulence conditions. The presence of 76 cis-antisense sRNAs, two of them on the reverse strand of virulence genes, suggests considerable antisense transcription in Agrobacterium. The information gained from this study provides a valuable reservoir for an in-depth understanding of sRNA-mediated regulation of the complex physiology and infection process of Agrobacterium.

  11. Identification of conservative microRNAs in Saanen dairy goat testis through deep sequencing.

    PubMed

    Wu, J; Zhu, H; Song, W; Li, M; Liu, C; Li, N; Tang, F; Mu, H; Liao, M; Li, X; Guan, W; Li, X; Hua, J

    2014-02-01

    MicroRNAs (miRNAs) are small non-coding RNA molecules that function as important regulators of gene expression by targeting messenger RNAs for post-transcriptional endonucleolytic cleavage or translational inhibition. In this study, small RNA libraries were constructed from adult dairy goat testicular tissue and sequenced using Illumina high-throughput sequencing technology. By BLAST comparison against cow and sheep miRNAs in miRBase 19.0, 373 conserved miRNAs were identified in dairy goat testis and 91 novel paired-miRNAs were found. Expression of miRNAs in the dairy goat testis (miR-10b, miR-126-3p, miR-126-5p, miR-34c, miR-449b and miR-1468) was confirmed by qRT-PCR. In addition, 128 conserved miRNAs were found by comparing the miRNA expression profiles in dairy goat testis with those in cow and mouse; all of these might be involved in dairy goat testis development and meiosis. This study reveals the first miRNA profile related to the biology of the testis in the dairy goat. The characterization of these miRNAs could contribute to a better understanding of the molecular mechanisms of reproductive physiology and development in the dairy goat.

  12. Computational Approaches for the Analysis of ncRNA through Deep Sequencing Techniques.

    PubMed

    Veneziano, Dario; Nigita, Giovanni; Ferro, Alfredo

    2015-01-01

    The majority of the human transcriptome is defined as non-coding RNA (ncRNA), since only a small fraction of human DNA encodes proteins, as reported by the ENCODE project. Several distinct classes of ncRNAs, such as transfer RNA, microRNA, and long non-coding RNA, have been classified, each with its own three-dimensional folding and specific function. As ncRNAs are highly abundant in living organisms and have been found to play important roles in many biological processes, there has been an ever-increasing need to investigate the entire ncRNAome in greater, unbiased detail. Recently, the advent of next-generation sequencing (NGS) technologies has substantially increased the throughput of transcriptome studies, allowing unprecedented investigation of ncRNAs at a time when regulatory pathways and novel functions involving ncRNAs are also emerging. The huge amount of transcript data produced by NGS has progressively required the development and implementation of suitable bioinformatics workflows, complemented by knowledge-based approaches, to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological conditions, such as cancer. In this mini-review, we present and discuss current bioinformatics advances in the development of such computational approaches to analyze and classify the ncRNA component of human transcriptome sequence data obtained from NGS technologies. PMID:26090362

  13. Functional characterization of a monoclonal antibody epitope using a lambda phage display-deep sequencing platform

    PubMed Central

    Domina, Maria; Lanza Cariccio, Veronica; Benfatto, Salvatore; Venza, Mario; Venza, Isabella; Borgogni, Erica; Castellino, Flora; Midiri, Angelina; Galbo, Roberta; Romeo, Letizia; Biondo, Carmelo; Masignani, Vega; Teti, Giuseppe; Felici, Franco; Beninati, Concetta

    2016-01-01

    We have recently described a method, named PROFILER, for the identification of antigenic regions preferentially targeted by polyclonal antibody responses after vaccination. To test the ability of the technique to provide insights into the functional properties of monoclonal antibody (mAb) epitopes, here we used a well-characterized epitope of meningococcal factor H binding protein (fHbp), which is recognized by mAb 12C1. An fHbp library, engineered on a lambda phage vector enabling surface expression of polypeptides of widely different lengths, was subjected to massively parallel sequencing of the phage inserts after affinity selection with the 12C1 mAb. We detected dozens of unique antibody-selected sequences, the most enriched of which (designated as FrC) could largely recapitulate the ability of fHbp to bind mAb 12C1. Computational analysis of the cumulative enrichment of single amino acids in the antibody-selected fragments identified two overrepresented stretches of residues (H248-K254 and S140-G154), whose presence was subsequently found to be required for binding of FrC to mAb 12C1. Collectively, these results suggest that the PROFILER technology can rapidly and reliably identify, in the context of complex conformational epitopes, discrete “hot spots” with a crucial role in antigen-antibody interactions, thereby providing useful clues for the functional characterization of the epitope. PMID:27530334

  14. Deep Sequencing Insights in Therapeutic shRNA Processing and siRNA Target Cleavage Precision

    PubMed Central

    Denise, Hubert; Moschos, Sterghios A.; Sidders, Benjamin; Burden, Frances; Perkins, Hannah; Carter, Nikki; Stroud, Tim; Kennedy, Michael; Fancy, Sally-Ann; Lapthorn, Cris; Lavender, Helen; Kinloch, Ross; Suhy, David; Corbau, Romu

    2014-01-01

    TT-034 (PF-05095808) is a recombinant adeno-associated virus serotype 8 (AAV8) agent expressing three short hairpin RNA (shRNA) pro-drugs that target the hepatitis C virus (HCV) RNA genome. The cytosolic enzyme Dicer cleaves each shRNA into multiple, potentially active small interfering RNA (siRNA) drugs. Using next-generation sequencing (NGS) to identify and characterize active shRNA maturation products, we observed that each TT-034-encoded shRNA could be processed into as many as 95 separate siRNA strands. Few of these appeared active as determined by Sanger 5′ RNA Ligase-Mediated Rapid Amplification of cDNA Ends (5′-RACE) and through synthetic shRNA and siRNA analogue studies. Moreover, NGS scrutiny applied to 5′-RACE products (RACE-seq) suggested that synthetic siRNAs could direct cleavage at not one but up to five separate positions on the targeted RNA, in a sequence-dependent manner. These data support an on-target mechanism of action for TT-034 without cytotoxicity and question the accepted precision of substrate processing by the key RNA interference (RNAi) enzymes Dicer and siRNA-induced silencing complex (siRISC). PMID:24496437

  15. Deep-sequence profiling of miRNAs and their target prediction in Monotropa hypopitys.

    PubMed

    Shchennikova, Anna V; Beletsky, Alexey V; Shulga, Olga A; Mazur, Alexander M; Prokhortchouk, Egor B; Kochieva, Elena Z; Ravin, Nikolay V; Skryabin, Konstantin G

    2016-07-01

    The myco-heterotroph Monotropa hypopitys is a widespread perennial herb used to study symbiotic interactions and the physiological mechanisms underlying the development of non-photosynthetic plants. Here, we performed, for the first time, a transcriptome-wide characterization of the M. hypopitys miRNA profile using high-throughput Illumina sequencing. As a result of small RNA library sequencing and bioinformatic analysis, we identified 55 members belonging to 40 families of known miRNAs and 17 putative novel miRNAs unique to M. hypopitys. Computational screening revealed 206 potential mRNA targets for known miRNAs and 31 potential mRNA targets for novel miRNAs. The predicted target genes were described in Gene Ontology terms and were found to be involved in a broad range of metabolic and regulatory pathways. The identification of novel M. hypopitys-specific miRNAs, some with few target genes and low abundances, suggests their recent evolutionary origin and participation in highly specialized regulatory mechanisms fundamental to the non-photosynthetic biology of M. hypopitys. This global analysis of miRNAs and their potential targets in M. hypopitys provides a framework for further investigation of the role of miRNAs in the evolution and establishment of non-photosynthetic myco-heterotrophs. PMID:27097902

  16. Evolutionary Relations of Hexanchiformes Deep-Sea Sharks Elucidated by Whole Mitochondrial Genome Sequences

    PubMed Central

    Tanaka, Keiko; Tomita, Taketeru; Suzuki, Shingo; Hosomichi, Kazuyoshi; Sano, Kazumi; Doi, Hiroyuki; Kono, Azumi; Inoko, Hidetoshi; Kulski, Jerzy K.; Tanaka, Sho

    2013-01-01

    Hexanchiformes is regarded as a monophyletic taxon, but the morphological and genetic relationships between the five extant species within the order are still uncertain. In this study, we determined the whole mitochondrial DNA (mtDNA) sequences of seven sharks, including representatives of the five Hexanchiformes, one squaliform, and one carcharhiniform, and inferred the phylogenetic relationships among those species and 12 other Chondrichthyes (cartilaginous fishes) species for which the complete mitogenome is available. The monophyly of Hexanchiformes and its close relation with all other Squaliformes sharks were strongly supported by likelihood and Bayesian phylogenetic analysis of 13,749 aligned nucleotides of 13 protein coding genes and two rRNA genes that were derived from the whole mtDNA sequences of the 19 species. The phylogeny suggested that Hexanchiformes is in the superorder Squalomorphi, Chlamydoselachus anguineus (frilled shark) is the sister species to all other Hexanchiformes, and the relations within Hexanchiformes are well resolved as Chlamydoselachus, (Notorynchus, (Heptranchias, (Hexanchus griseus, H. nakamurai))). Based on our phylogeny, we discuss evolutionary scenarios of the jaw suspension mechanism and gill slit numbers, which are significant features in the sharks. PMID:24089661

  17. Identification of small non-coding RNAs in the planarian Dugesia japonica via deep sequencing.

    PubMed

    Qin, Yun-Fei; Zhao, Jin-Mei; Bao, Zhen-Xia; Zhu, Zhao-Yu; Mai, Jia; Huang, Yi-Bo; Li, Jian-Biao; Chen, Ge; Lu, Ping; Chen, San-Jun; Su, Lin-Lin; Fang, Hui-Min; Lu, Ji-Ke; Zhang, Yi-Zhe; Zhang, Shou-Tao

    2012-05-01

    The freshwater planarian flatworm possesses an extraordinary ability to regenerate lost body parts after amputation, making it an ideal model organism for regeneration and stem cell biology. Recently, small RNAs have attracted increasing attention and have been studied in many contexts, including regeneration and stem cell biology. In the current study, we report the large-scale cloning and sequencing of sRNAs from intact and regenerating planarians (Dugesia japonica). Sequence analysis shows that sRNAs between 18 nt and 40 nt are mainly microRNAs and piRNAs. In addition, 209 conserved miRNAs and 12 novel miRNAs were identified. In particular, an improved target-screening method based on the negative correlation between miRNA and mRNA expression was adopted to improve the accuracy of target prediction. As with miRNAs, a diverse population of piRNAs and their changes between the two samples are also reported. The present study is the first to report on the important role of sRNAs during regeneration of the planarian Dugesia japonica. PMID:22425900

  18. Deep Sequencing Insights in Therapeutic shRNA Processing and siRNA Target Cleavage Precision.

    PubMed

    Denise, Hubert; Moschos, Sterghios A; Sidders, Benjamin; Burden, Frances; Perkins, Hannah; Carter, Nikki; Stroud, Tim; Kennedy, Michael; Fancy, Sally-Ann; Lapthorn, Cris; Lavender, Helen; Kinloch, Ross; Suhy, David; Corbau, Romu

    2014-01-01

    TT-034 (PF-05095808) is a recombinant adeno-associated virus serotype 8 (AAV8) agent expressing three short hairpin RNA (shRNA) pro-drugs that target the hepatitis C virus (HCV) RNA genome. The cytosolic enzyme Dicer cleaves each shRNA into multiple, potentially active small interfering RNA (siRNA) drugs. Using next-generation sequencing (NGS) to identify and characterize active shRNA maturation products, we observed that each TT-034-encoded shRNA could be processed into as many as 95 separate siRNA strands. Few of these appeared active as determined by Sanger 5' RNA Ligase-Mediated Rapid Amplification of cDNA Ends (5'-RACE) and through synthetic shRNA and siRNA analogue studies. Moreover, NGS scrutiny applied to 5'-RACE products (RACE-seq) suggested that synthetic siRNAs could direct cleavage at not one but up to five separate positions on the targeted RNA, in a sequence-dependent manner. These data support an on-target mechanism of action for TT-034 without cytotoxicity and question the accepted precision of substrate processing by the key RNA interference (RNAi) enzymes Dicer and siRNA-induced silencing complex (siRISC). Molecular Therapy-Nucleic Acids (2014) 3, e145; doi:10.1038/mtna.2013.73; published online 4 February 2014.

  19. Focused Evolution of HIV-1 Neutralizing Antibodies Revealed by Structures and Deep Sequencing

    SciTech Connect

    Wu, Xueling; Zhou, Tongqing; Zhu, Jiang; Zhang, Baoshan; Georgiev, Ivelin; Wang, Charlene; Chen, Xuejun; Longo, Nancy S.; Louder, Mark; McKee, Krisha; O’Dell, Sijy; Perfetto, Stephen; Schmidt, Stephen D.; Shi, Wei; Wu, Lan; Yang, Yongping; Yang, Zhi-Yong; Yang, Zhongjia; Zhang, Zhenhai; Bonsignori, Mattia; Crump, John A.; Kapiga, Saidi H.; Sam, Noel E.; Haynes, Barton F.; Simek, Melissa; Burton, Dennis R.; Koff, Wayne C.; Doria-Rose, Nicole A.; Connors, Mark; Mullikin, James C.; Nabel, Gary J.; Roederer, Mario; Shapiro, Lawrence; Kwong, Peter D.; Mascola, John R.

    2013-03-04

    Antibody VRC01 is a human immunoglobulin that neutralizes about 90% of HIV-1 isolates. To understand how such broadly neutralizing antibodies develop, we used x-ray crystallography and 454 pyrosequencing to characterize additional VRC01-like antibodies from HIV-1-infected individuals. Crystal structures revealed a convergent mode of binding for diverse antibodies to the same CD4-binding-site epitope. A functional genomics analysis of expressed heavy and light chains revealed common pathways of antibody heavy-chain maturation, confined to the IGHV1-2*02 lineage, involving dozens of somatic changes, and capable of pairing with different light chains. Broadly neutralizing HIV-1 immunity associated with VRC01-like antibodies thus involves the evolution of antibodies to a highly affinity-matured state required to recognize an invariant viral structure, with lineages defined from thousands of sequences providing a genetic roadmap of their development.

  20. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

    PubMed

    Rohland, Nadin; Reich, David; Mallick, Swapan; Meyer, Matthias; Green, Richard E; Georgiadis, Nicholas J; Roca, Alfred L; Hofreiter, Michael

    2010-12-21

    To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as divergent in the nuclear genome as, or more divergent than, mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.

  1. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

    PubMed

    Rohland, Nadin; Reich, David; Mallick, Swapan; Meyer, Matthias; Green, Richard E; Georgiadis, Nicholas J; Roca, Alfred L; Hofreiter, Michael

    2010-01-01

    To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as divergent in the nuclear genome as, or more divergent than, mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success. PMID:21203580

  2. Deep sequencing of gastric carcinoma reveals somatic mutations relevant to personalized medicine

    PubMed Central

    2011-01-01

    Background Globally, gastric cancer is the second most common cause of cancer-related death, with the majority of the health burden borne by economically less-developed countries. Methods Here, we report a genetic characterization of 50 gastric adenocarcinoma samples, using Affymetrix SNP arrays and Illumina mRNA expression arrays as well as Illumina sequencing of the coding regions of 384 genes belonging to various pathways known to be altered in other cancers. Results Genetic alterations were observed in the WNT, Hedgehog, cell cycle, DNA damage and epithelial-to-mesenchymal-transition pathways. Conclusions The data suggest that targeted therapies approved or in clinical development for gastric carcinoma would be of benefit to ~22% of the patients studied. In addition, the novel mutations detected here are likely to influence clinical response and suggest new targets for drug discovery. PMID:21781349

  3. The transcriptome of Verticillium dahliae-infected Nicotiana benthamiana determined by deep RNA sequencing.

    PubMed

    Faino, Luigi; de Jonge, Ronnie; Thomma, Bart P H J

    2012-09-01

    Verticillium wilt disease is caused by fungi of the Verticillium genus that occur on a wide range of host plants, including Solanaceous species such as tomato and tobacco. Currently, the well-characterized Ve1 gene of tomato is the only cloned Verticillium wilt resistance gene. During experiments to identify the Verticillium molecule that activates Ve1 resistance in tomato, RNA sequencing (RNA-Seq) of Verticillium-infected Nicotiana benthamiana was performed. Over 99% of the obtained reads were derived from N. benthamiana. Here, we report the assembly and annotation of the N. benthamiana transcriptome. In total, 142,738 transcripts > 100 bp were obtained, amounting to a total transcriptome size of 38.7 Mbp, which is comparable to the Arabidopsis transcriptome. About 30,282 transcripts could be annotated based on homology to Arabidopsis genes. By assembly of the N. benthamiana transcriptome, we provide a catalogue of transcripts of a Solanaceous model plant under pathogen stress.

  4. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

    PubMed Central

    Tennessen, Jacob A.; Bigham, Abigail W.; O'Connor, Timothy D.; Fu, Wenqing; Kenny, Eimear E.; Gravel, Simon; McGee, Sean; Do, Ron; Liu, Xiaoming; Jun, Goo; Kang, Hyun Min; Jordan, Daniel; Leal, Suzanne M.; Gabriel, Stacey; Rieder, Mark J.; Abecasis, Goncalo; Altshuler, David; Nickerson, Deborah A.; Boerwinkle, Eric; Sunyaev, Shamil; Bustamante, Carlos D.; Bamshad, Michael J.; Akey, Joshua M.

    2013-01-01

    As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits. PMID:22604720

  5. Ultra-Deep Sequencing Reveals the microRNA Expression Pattern of the Human Stomach

    PubMed Central

    Ribeiro-dos-Santos, Ândrea; Khayat, André S.; Silva, Artur; Alencar, Dayse O.; Lobato, Jessé; Luz, Larissa; Pinheiro, Daniel G.; Varuzza, Leonardo; Assumpção, Monica; Assumpção, Paulo; Santos, Sidney; Zanette, Dalila L.; Silva, Wilson A.; Burbano, Rommel; Darnet, Sylvain

    2010-01-01

    Background While microRNAs (miRNAs) play important roles in tissue differentiation and in maintaining basal physiology, little is known about the miRNA expression levels in stomach tissue. Alterations in the miRNA profile can lead to cell deregulation, which can induce neoplasia. Methodology/Principal Findings A small RNA library of stomach tissue was sequenced using high-throughput SOLiD sequencing technology. We obtained 261,274 quality reads with perfect matches to the human miRNome, and 42% of known miRNAs were identified. Digital Gene Expression profiling (DGE) was performed based on read abundance and showed that fifteen miRNAs were highly expressed in gastric tissue. Subsequently, the expression of these miRNAs was validated by RT-PCR in 10 healthy individuals, showing a significant correlation of 83.97% (P<0.05). Six miRNAs showed a low variable pattern of expression (miR-29b, miR-29c, miR-19b, miR-31, miR-148a, miR-451) and could be considered part of the expression pattern of the healthy gastric tissue. Conclusions/Significance This study aimed to validate normal miRNA profiles of human gastric tissue to establish a reference profile for healthy individuals. Determining the regulatory processes acting in the stomach will be important in the fight against gastric cancer, which is the second-leading cause of cancer mortality worldwide. PMID:20949028

  6. N-terminal amino acid sequence of the deep-sea tube worm haemoglobin remarkably resembles that of annelid haemoglobin.

    PubMed Central

    Suzuki, T; Takagi, T; Ohta, S

    1988-01-01

    The deep-sea giant tube worm Lamellibrachia, belonging to the phylum Vestimentifera, contains two extracellular haemoglobins, an Mr 3,000,000 haemoglobin and an Mr 440,000 haemoglobin. The former has a hexagonal bilayer structure and consists of six polypeptide chains (AI-VI); a study of its haem content shows that not all of the chains contain haem. The Mr 440,000 haemoglobin consists of four haem-containing chains (BI-IV). We isolated most of the chains by reverse-phase chromatography and determined the amino acid sequences of the 21-45 N-terminal residues. Eight chains (AI-IV and BI-IV) showed significant homology with haem-containing chains of annelid giant haemoglobin. The highest homology was found between Lamellibrachia chain AI and Tylorrhynchus chain I; surprisingly, 18 out of the 20 N-terminal residues are identical. On the other hand, chain AV, with an unusual Mr of 32,000, showed a rather different sequence and is likely to be a non-haem chain which might act as a linker protein in the assembly of the haem-containing chains. From these results, we conclude that the tube worm Mr 3,000,000 haemoglobin is highly homologous with annelid haemoglobin. PMID:3202832

  7. Identification and characterization of novel serum microRNA candidates from deep sequencing in cervical cancer patients

    PubMed Central

    Juan, Li; Tong, Hong-li; Zhang, Pengjun; Guo, Guanghong; Wang, Zi; Wen, Xinyu; Dong, Zhennan; Tian, Ya-ping

    2014-01-01

    Small non-coding microRNAs (miRNAs) are involved in cancer development and progression, and serum profiles of cervical cancer patients may be useful for identifying novel miRNAs. We performed deep sequencing on serum pools of cervical cancer patients and healthy controls with 3 replicates and constructed a small RNA library. We used MIREAP to predict novel miRNAs and, after filtering out pseudo-pre-miRNAs using Triplet-SVM analysis, identified 2 putative novel miRNAs that differed between serum pools of cervical cancer patients and healthy controls. The 2 putative novel miRNAs were validated by real-time PCR and were significantly decreased in cervical cancer patients compared with healthy controls. One novel miRNA had an area under curve (AUC) of 0.921 (95% CI: 0.883, 0.959) with a sensitivity of 85.7% and a specificity of 88.2% when discriminating between cervical cancer patients and healthy controls. Our results suggest that characterizing serum profiles of cervical cancers by Solexa sequencing may be a good method for identifying novel miRNAs and that the validated novel miRNAs described here may be cervical cancer-associated biomarkers. PMID:25182173
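    The sensitivity, specificity and AUC reported above are standard diagnostic-accuracy measures. As a minimal sketch (not the authors' analysis code), the first two are ratios from a 2×2 confusion table, and the AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney formulation). All counts and levels below are hypothetical. Because the novel miRNAs were decreased in patients, a lower serum level has to be mapped to a higher score (here by negating it) before computing the AUC.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positives among all actual cases
    specificity = tn / (tn + fp)  # true negatives among all actual controls
    return sensitivity, specificity


def auc_mann_whitney(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical counts and expression levels, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
case_levels = [0.2, 0.4, 0.5]        # lower miRNA level in cases
control_levels = [0.45, 0.9, 1.1]
# Negate levels so that "lower expression" means "higher cancer score".
auc = auc_mann_whitney([-x for x in case_levels], [-x for x in control_levels])
```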

  8. Deep sequencing and proteomic analysis of the microRNA-induced silencing complex in human red blood cells.

    PubMed

    Azzouzi, Imane; Moest, Hansjoerg; Wollscheid, Bernd; Schmugge, Markus; Eekels, Julia J M; Speer, Oliver

    2015-05-01

    During maturation, erythropoietic cells extrude their nuclei but retain their ability to respond to oxidant stress by tightly regulating protein translation. Several studies have reported microRNA-mediated regulation of translation during terminal stages of erythropoiesis, even after enucleation. In the present study, we performed a detailed examination of the endogenous microRNA machinery in human red blood cells using a combination of deep sequencing analysis of microRNAs and proteomic analysis of the microRNA-induced silencing complex. Among the 197 different microRNAs detected, miR-451a was the most abundant, representing more than 60% of all read sequences. In addition, miR-451a and its known target, 14-3-3ζ mRNA, were bound to the microRNA-induced silencing complex, implying their direct interaction in red blood cells. The proteomic characterization of endogenous Argonaute 2-associated microRNA-induced silencing complex revealed 26 cofactor candidates. Among these cofactors, we identified several RNA-binding proteins, as well as motor proteins and vesicular trafficking proteins. Our results demonstrate that red blood cells contain complex microRNA machinery, which might enable immature red blood cells to control protein translation independent of de novo nuclear information. PMID:25681748

  9. Deciphering KRAS and NRAS mutated clone dynamics in MLL-AF4 paediatric leukaemia by ultra deep sequencing analysis

    PubMed Central

    Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy te; Basso, Giuseppe

    2016-01-01

    To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next-generation amplicon deep sequencing for in-depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations, including those in small sub-clones, were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse almost all originated from clones that were already detectable at diagnosis and survived the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS-related signature at both transcriptional and protein levels and that targeting the RAS pathway could be beneficial for the treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations. PMID:27698462

  10. Deep sequencing and in silico analyses identify MYB-regulated gene networks and signaling pathways in pancreatic cancer

    PubMed Central

    Azim, Shafquat; Zubair, Haseeb; Srivastava, Sanjeev K.; Bhardwaj, Arun; Zubair, Asif; Ahmad, Aamir; Singh, Seema; Khushman, Moh’d.; Singh, Ajay P.

    2016-01-01

    We have recently demonstrated that the transcription factor MYB can modulate several cancer-associated phenotypes in pancreatic cancer. In order to understand the molecular basis of these MYB-associated changes, we conducted deep sequencing of the transcriptome of MYB-overexpressing and -silenced pancreatic cancer cells, followed by in silico pathway analysis. We identified significant modulation of 774 genes upon MYB-silencing (p < 0.05) that were assigned to 25 gene networks by in silico analysis. Further analyses placed genes in our RNA sequencing-generated dataset in several canonical signaling pathways, such as cell-cycle control, DNA-damage and -repair responses, p53 and HIF1α. Importantly, we observed downregulation of the pancreatic adenocarcinoma signaling pathway in MYB-silenced pancreatic cancer cells exhibiting suppression of EGFR and NF-κB. Decreased expression of EGFR and RELA was validated by both qPCR and immunoblotting, and both were shown to be under direct transcriptional control of MYB. These observations were further confirmed in a converse approach wherein MYB was overexpressed ectopically in a MYB-null pancreatic cancer cell line. Our findings thus suggest that MYB potentially regulates growth and genomic stability of pancreatic cancer cells via targeting complex gene networks and signaling pathways. Further in-depth functional studies are warranted to fully understand MYB signaling in pancreatic cancer. PMID:27354262

  11. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing.

    PubMed

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-09-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the roles of copy number variations (CNVs) and polymorphisms of detoxification enzymes have never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing to conduct the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations.
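    CNV screens of captured genes often compare normalized read depth per gene between a resistant and a susceptible population. The sketch below shows only that core idea, a log2 ratio of library-size-normalized coverage; the gene names and counts are hypothetical, and real pipelines (possibly including the one used in this study) also correct for capture efficiency, GC content and replicate variance.

```python
import math


def cnv_log2_ratios(cov_test, cov_ref):
    """log2 ratio of each gene's share of total reads, test vs reference.

    cov_test, cov_ref: dict mapping gene name -> read count on that gene.
    Genes with zero coverage in either sample are skipped.
    """
    total_test = sum(cov_test.values())
    total_ref = sum(cov_ref.values())
    ratios = {}
    for gene, n_test in cov_test.items():
        n_ref = cov_ref.get(gene, 0)
        if n_test == 0 or n_ref == 0:
            continue
        ratios[gene] = math.log2((n_test / total_test) / (n_ref / total_ref))
    return ratios


# Hypothetical counts: geneA is about 2x over-represented in the resistant pool.
resistant = {"geneA": 200, "geneB": 100, "geneC": 100}
susceptible = {"geneA": 100, "geneB": 150, "geneC": 150}
ratios = cnv_log2_ratios(resistant, susceptible)
```

A log2 ratio near 1 is consistent with a duplication in the test population, but the thresholds used to call an amplification are study-specific.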

  12. Deep sequencing of the tobacco mitochondrial transcriptome reveals expressed ORFs and numerous editing sites outside coding regions

    PubMed Central

    2014-01-01

    Background The purpose of this study was to sequence and assemble the tobacco mitochondrial transcriptome and obtain a genomic-level view of steady-state RNA abundance. Plant mitochondrial genomes have a small number of protein coding genes with large and variably sized intergenic spaces. In the tobacco mitogenome these intergenic spaces contain numerous open reading frames (ORFs) with no clear function. Results The assembled transcriptome revealed distinct monocistronic and polycistronic transcripts along with large intergenic spaces with little to no detectable RNA. Eighteen of the 117 ORFs were found to have steady-state RNA amounts above background in both deep-sequencing and qRT-PCR experiments, and ten of those were found to be polysome associated. In addition, the assembled transcriptome enabled a full mitogenome screen of RNA C→U editing sites. Six hundred and thirty-five potential edits were found, with 557 occurring within protein-coding genes, five in tRNA genes, and 73 in non-coding regions. These sites were found in every protein-coding transcript in the tobacco mitogenome. Conclusion These results suggest that a small number of the ORFs within the tobacco mitogenome may produce functional proteins and that RNA editing occurs in coding and non-coding regions of mitochondrial transcripts. PMID:24433288
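    A screen for C→U editing sites compares each genomic C (on the transcript strand) with the bases observed at that position in the RNA reads. A minimal sketch under stated assumptions: the input is hypothetical per-position base counts already oriented to the transcript strand, and the depth and fraction thresholds are illustrative, not those used in the study.

```python
def call_c_to_u_sites(reference, base_counts, min_depth=10, min_fraction=0.1):
    """Return 0-based positions where a reference C is read as T/U in RNA.

    reference: genomic sequence on the transcript strand.
    base_counts: list of dicts, one per position, e.g. {"C": 40, "T": 12}.
    min_depth, min_fraction: illustrative thresholds (assumptions).
    """
    sites = []
    for pos, (ref_base, counts) in enumerate(zip(reference, base_counts)):
        if ref_base != "C":
            continue
        depth = sum(counts.values())
        edited = counts.get("T", 0) + counts.get("U", 0)
        if depth >= min_depth and edited / depth >= min_fraction:
            sites.append(pos)
    return sites


# Hypothetical pileup for a 4-nt stretch: only position 1 passes both filters.
ref = "ACCG"
pileup = [{"A": 30}, {"C": 20, "T": 10}, {"C": 29, "T": 1}, {"G": 25}]
sites = call_c_to_u_sites(ref, pileup)
```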

  13. Deep sequencing and in silico analyses identify MYB-regulated gene networks and signaling pathways in pancreatic cancer.

    PubMed

    Azim, Shafquat; Zubair, Haseeb; Srivastava, Sanjeev K; Bhardwaj, Arun; Zubair, Asif; Ahmad, Aamir; Singh, Seema; Khushman, Moh'd; Singh, Ajay P

    2016-06-29

    We have recently demonstrated that the transcription factor MYB can modulate several cancer-associated phenotypes in pancreatic cancer. In order to understand the molecular basis of these MYB-associated changes, we conducted deep sequencing of the transcriptome of MYB-overexpressing and -silenced pancreatic cancer cells, followed by in silico pathway analysis. We identified significant modulation of 774 genes upon MYB-silencing (p < 0.05) that were assigned to 25 gene networks by in silico analysis. Further analyses placed genes in our RNA sequencing-generated dataset in several canonical signaling pathways, such as cell-cycle control, DNA-damage and -repair responses, p53 and HIF1α. Importantly, we observed downregulation of the pancreatic adenocarcinoma signaling pathway in MYB-silenced pancreatic cancer cells exhibiting suppression of EGFR and NF-κB. Decreased expression of EGFR and RELA was validated by both qPCR and immunoblotting, and both were shown to be under direct transcriptional control of MYB. These observations were further confirmed in a converse approach wherein MYB was overexpressed ectopically in a MYB-null pancreatic cancer cell line. Our findings thus suggest that MYB potentially regulates growth and genomic stability of pancreatic cancer cells via targeting complex gene networks and signaling pathways. Further in-depth functional studies are warranted to fully understand MYB signaling in pancreatic cancer.

  14. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    PubMed Central

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the roles of copy number variations (CNVs) and polymorphisms of detoxification enzymes have never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing to conduct the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  15. Ultra-deep targeted sequencing of advanced oral squamous cell carcinoma identifies a mutation-based prognostic gene signature

    PubMed Central

    Huang, Po-Jung; Huang, Yi; Hsu, An; Tang, Petrus; Chang, Yu-Sun; Chen, Hua-Chien; Yen, Tzu-Chen

    2015-01-01

    Background Patients with advanced oral squamous cell carcinoma (OSCC) have heterogeneous outcomes that limit the implementation of tailored treatment options. Genetic markers for improved prognostic stratification are eagerly awaited. Methods Herein, next-generation sequencing (NGS) was performed in 345 formalin-fixed paraffin-embedded (FFPE) samples obtained from advanced OSCC patients. Genetic mutations in the hotspot regions of 45 cancer-related genes were detected using an ultra-deep (>1000×) sequencing approach. Kaplan-Meier plots and Cox regression analyses were used to investigate the associations between the mutation status and disease-free survival (DFS). Results We identified 1269 non-synonymous mutations in 276 OSCC samples. TP53, PIK3CA, CDKN2A, HRAS and BRAF were the most frequently mutated genes. Mutations in 14 genes were found to predict DFS. A mutation-based signature affecting ten genes (HRAS, BRAF, FGFR3, SMAD4, KIT, PTEN, NOTCH1, AKT1, CTNNB1, and PTPN11) was devised to predict DFS. Two different resampling methods were used to validate the prognostic value of the identified gene signature. Multivariate analysis demonstrated that the presence of a mutated gene signature was an independent predictor of poorer DFS (P = 0.005). Conclusions Genetic variants identified by NGS technology in FFPE samples are clinically useful to predict prognosis in advanced OSCC patients. PMID:25980437
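    Disease-free survival curves such as those above are usually estimated with the Kaplan-Meier product-limit estimator, S(t) = ∏ (1 − d_i/n_i) over event times t_i ≤ t, where d_i is the number of recurrences at t_i and n_i the number of patients still at risk just before t_i. A minimal sketch on hypothetical follow-up data; the Cox regression and resampling validation used in the study are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient.
    events: 1 if the event (e.g. recurrence) occurred at that time, 0 if censored.
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censorings at t both leave the risk set afterwards.
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical DFS data in months (1 = recurrence, 0 = censored).
months = [3, 8, 8, 12, 20, 24]
recurred = [1, 1, 0, 1, 0, 0]
steps = kaplan_meier(months, recurred)
```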

  16. Deep parallel sequencing reveals conserved and novel miRNAs in gill and hepatopancreas of giant freshwater prawn.

    PubMed

    Tan, Tian Tian; Chen, Maoshan; Harikrishna, Jennifer Ann; Khairuddin, Norliana; Mohd Shamsudin, Maizatul Izzah; Zhang, Guojie; Bhassu, Subha

    2013-10-01

    MicroRNAs (miRNAs) are ~20-22-nucleotide, non-protein-coding regulatory RNA genes that post-transcriptionally regulate many protein-coding genes, influencing critical biological and metabolic processes. While the number of known miRNAs is increasing, there is currently no published data for miRNA from giant freshwater prawns, Macrobrachium rosenbergii (M. rosenbergii), a commercially cultured and economically important food species. In this study, we identified novel miRNAs in the gill and hepatopancreas of M. rosenbergii. Through a deep parallel sequencing analysis and an in silico data analysis approach, 327 miRNA families were identified from small RNA libraries with reference to both the de novo transcriptome of M. rosenbergii obtained from RNA-Seq and to miRBase (Release 18.0, November 2012). Based on the identified mature miRNA and recovered precursor sequences that form appropriate hairpin structures, three conserved miRNAs (miR-125, miR-750, miR-993) and 27 novel miRNA candidates encoding messenger-like non-coding RNA were identified. miR-125, miR-750, G-m0002/H-m0009, G-m0005, G-m0008/H-m0016, G-m0011/H-m0027 and G-m0015 were selected for experimental validation with stem-loop quantitative RT-PCR and were found to be consistent with the expression profile of deep sequencing data as evaluated with Pearson's correlation coefficient (r = 0.835178 for miRNA in gill, r = 0.724131 for miRNA in hepatopancreas). Using a combinatorial approach of pathway enrichment analysis and inverse expression relationship of miRNA and mRNA, four co-expressed novel miRNA candidates (G-m0005, G-m0008/H-m0016, G-m0011/H-m0027, and G-m0015) were found to be associated with energy metabolism. In addition, the expression of three novel miRNA candidates (G-m0005, G-m0008/H-m0016, and G-m0011/H-m0027) was also found to be significantly reduced at 9 and 24 h post infection in M. rosenbergii challenged with infectious hypodermal and hematopoietic necrosis virus, suggesting a functional
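    The agreement between deep-sequencing counts and stem-loop qRT-PCR above was summarized with Pearson's correlation coefficient. A minimal sketch with hypothetical values: read counts are often log-transformed first, and if the qRT-PCR values are raw Ct values the sign of r flips, because a lower Ct means higher abundance.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: log2 read counts vs relative qRT-PCR expression.
seq_log2 = [math.log2(c) for c in [120, 900, 3500, 15000]]
qpcr_rel = [0.8, 2.1, 6.0, 19.5]
r = pearson_r(seq_log2, qpcr_rel)
```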

  17. High-Resolution Sequence-Function Mapping of Full-Length Proteins

    PubMed Central

    Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.

    2015-01-01

    Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet the short read lengths of current sequencers make it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
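    The before/after comparison described above is commonly scored as a log2 enrichment ratio of each variant's frequency, expressed relative to wild type. A minimal sketch, assuming simple count tables and a pseudocount for variants that drop out after selection; the paper's own normalization across selections may differ, and the variant labels are hypothetical.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """log2(post-selection frequency / pre-selection frequency) per variant.

    The pseudocount keeps variants with zero reads from producing log(0).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    scores = {}
    for variant, pre in pre_counts.items():
        post = post_counts.get(variant, 0)
        f_pre = (pre + pseudocount) / pre_total
        f_post = (post + pseudocount) / post_total
        scores[variant] = math.log2(f_post / f_pre)
    return scores


def relative_to_wild_type(scores, wild_type="WT"):
    """Subtract the wild-type score so that WT maps to 0 (neutral)."""
    return {v: s - scores[wild_type] for v, s in scores.items()}


# Hypothetical library: "L12P" is depleted relative to wild type.
pre = {"WT": 100, "L12P": 100}
post = {"WT": 200, "L12P": 50}
fitness = relative_to_wild_type(log2_enrichment(pre, post, pseudocount=0.0))
```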

  18. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    PubMed Central

    2012-01-01

    Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite the high demand for roses, rose breeding programs are limited in molecular resources, which can greatly enhance and speed breeding efforts. A better understanding of important genes that contribute to important floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expound upon current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among identified miRNAs, 25 were novel and 242 were conserved miRNAs. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic

  19. Deep Sequencing of HIV-Infected Cells: Insights into Nascent Transcription and Host-Directed Therapy

    PubMed Central

    Peng, Xinxia; Sova, Pavel; Green, Richard R.; Thomas, Matthew J.; Korth, Marcus J.; Proll, Sean; Xu, Jiabao; Cheng, Yanbing; Yi, Kang; Chen, Li; Peng, Zhiyu; Wang, Jun; Palermo, Robert E.

    2014-01-01

    ABSTRACT Polyadenylated mature mRNAs are the focus of standard transcriptome analyses. However, the profiling of nascent transcripts, which often include nonpolyadenylated RNAs, can unveil novel insights into transcriptional regulation. Here, we separately sequenced total RNAs (Total RNAseq) and mRNAs (mRNAseq) from the same HIV-1-infected human CD4+ T cells. We found that many nonpolyadenylated RNAs were differentially expressed upon HIV-1 infection, and we identified 8 times more differentially expressed genes at 12 h postinfection by Total RNAseq than by mRNAseq. These expression changes were also evident by concurrent changes in introns and were recapitulated by later mRNA changes, revealing an unexpectedly significant delay between transcriptional initiation and mature mRNA production early after HIV-1 infection. We computationally derived and validated the underlying regulatory programs, and we predicted drugs capable of reversing these HIV-1-induced expression changes followed by experimental confirmation. Our results show that combined total and mRNA transcriptome analysis is essential for fully capturing the early host response to virus infection and provide a framework for identifying candidate drugs for host-directed therapy against HIV/AIDS. IMPORTANCE In this study, we used mass sequencing to identify genes differentially expressed in CD4+ T cells during HIV-1 infection. To our surprise, we found many differentially expressed genes early after infection by analyzing both newly transcribed unprocessed pre-mRNAs and fully processed mRNAs, but not by analyzing mRNAs alone, indicating a significant delay between transcription initiation and mRNA production early after HIV-1 infection. These results also show that important findings could be missed by the standard practice of analyzing mRNAs alone. We then derived the regulatory mechanisms driving the observed expression changes using integrative computational analyses. Further, we predicted drugs that

  20. Characterization of microRNAs in Taenia saginata of zoonotic significance by Solexa deep sequencing and bioinformatics analysis.

    PubMed

    Ai, L; Xu, M J; Chen, M X; Zhang, Y N; Chen, S H; Guo, J; Cai, Y C; Zhou, X N; Zhu, X Q; Chen, J X

    2012-06-01

    The beef tapeworm Taenia saginata infects human beings with symptoms ranging from nausea and abdominal discomfort to digestive disturbances and intestinal blockage. In the present study, the microRNA (miRNA) expression profile of adult T. saginata was analyzed using Solexa deep sequencing and bioinformatics analysis. A total of 15.8 million reads was obtained by Solexa sequencing, and 13.3 million clean reads (1.73 million unique sequences) were obtained after removing reads shorter than 18 nt. Ten conserved miRNAs corresponding to 607,382 reads were found when matching the reads against known miRNAs of Schistosoma japonicum in the miRBase database. miR-71 had the most abundant expression in T. saginata, followed by miR-219-5p, but some other common miRNAs such as let-7, miR-40, and miR-103 were not identified in T. saginata. Nucleotide bias analysis found that the known miRNAs showed high bias and that uracil was the dominant nucleotide, particularly at the first and 11th positions, which were almost at the beginning and middle of conserved miRNAs. One novel miRNA (Tsa-miR-001) corresponding to ten precursors was identified and confirmed by stem-loop RT-PCR. To our knowledge, this is the first report of miRNA profiles in T. saginata, which will contribute to a better understanding of the complex biology of this zoonotic cestode. The reported data on T. saginata miRNAs should provide valuable references for miRNA studies of closely related zoonotic Taenia cestodes such as Taenia solium and Taenia asiatica.
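    The clean-up step above (discarding reads shorter than 18 nt and collapsing identical reads into unique sequences) and the first-position uracil bias can be sketched as follows. This is a minimal illustration on hypothetical reads, not the pipeline used in the study; DNA-space T is converted to U so that positional bias reads in RNA terms.

```python
from collections import Counter


def clean_and_collapse(reads, min_len=18):
    """Drop reads shorter than min_len; return (n_clean, Counter of unique seqs)."""
    clean = [r.upper().replace("T", "U") for r in reads if len(r) >= min_len]
    return len(clean), Counter(clean)


def base_fraction_at(unique_seqs, position, base="U"):
    """Fraction of unique sequences carrying `base` at 0-based `position`.

    The abstract's "first" and "11th" positions are 0-based positions 0 and 10.
    """
    eligible = [s for s in unique_seqs if len(s) > position]
    if not eligible:
        return 0.0
    return sum(1 for s in eligible if s[position] == base) / len(eligible)


reads = [
    "TGAAAGACATGGGTAGTGA",  # 19 nt, kept
    "TGAAAGACATGGGTAGTGA",  # duplicate, collapsed into one unique sequence
    "AGCTTAGCTAGCTAGCTAG",  # 19 nt, kept
    "TCAGTG",               # 6 nt, discarded
]
n_clean, unique = clean_and_collapse(reads)
first_u = base_fraction_at(unique, 0)
```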

  1. The 2007 Nazko, British Columbia, earthquake sequence: Injection of magma deep in the crust beneath the Anahim volcanic belt

    USGS Publications Warehouse

    Cassidy, J.F.; Balfour, N.; Hickson, C.; Kao, H.; White, Rickie; Caplan-Auerbach, J.; Mazzotti, S.; Rogers, Gary C.; Al-Khoubbi, I.; Bird, A.L.; Esteban, L.; Kelman, M.; Hutchinson, J.; McCormack, D.

    2011-01-01

    On 9 October 2007, an unusual sequence of earthquakes began in central British Columbia about 20 km west of the Nazko cone, the most recent (circa 7200 yr) volcanic center in the Anahim volcanic belt. Within 25 hr, eight earthquakes of magnitude 2.3-2.9 occurred in a region where no earthquakes had previously been recorded. During the next three weeks, more than 800 microearthquakes were located (and many more detected), most at a depth of 25-31 km and within a radius of about 5 km. After about two months, almost all activity ceased. The clear P- and S-wave arrivals indicated that these were high-frequency (volcanic-tectonic) earthquakes and the b value of 1.9 that we calculated is anomalous for crustal earthquakes but consistent with volcanic-related events. Analysis of receiver functions at a station immediately above the seismicity indicated a Moho near 30 km depth. Precise relocation of the seismicity using a double-difference method suggested a horizontal migration at the rate of about 0.5 km/d, with almost all events within the lowermost crust. Neither harmonic tremor nor long-period events were observed; however, some spasmodic bursts were recorded and determined to be colocated with the earthquake hypocenters. These observations are all very similar to a deep earthquake sequence recorded beneath Lake Tahoe, California, in 2003-2004. Based on these remarkable similarities, we interpret the Nazko sequence as an indication of an injection of magma into the lower crust beneath the Anahim volcanic belt. This magma injection fractures rock, producing high-frequency, volcanic-tectonic earthquakes and spasmodic bursts.
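    The b value above is the slope of the Gutenberg-Richter relation log10 N = a − bM. The abstract does not say how it was estimated; one widely used choice is Aki's maximum-likelihood estimator, b = log10(e) / (M̄ − Mc), with Mc replaced by Mc − ΔM/2 when magnitudes are binned at width ΔM. A minimal sketch on a hypothetical catalogue:

```python
import math


def b_value_aki(magnitudes, mc, bin_width=0.0):
    """Aki (1965) maximum-likelihood b value for events with M >= mc.

    bin_width: magnitude binning interval (e.g. 0.1); 0 for continuous data.
    """
    m = [x for x in magnitudes if x >= mc]
    if not m:
        raise ValueError("no events at or above the completeness magnitude")
    mean_m = sum(m) / len(m)
    denom = mean_m - (mc - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the corrected Mc")
    return math.log10(math.e) / denom


# Hypothetical catalogue, completeness magnitude 1.0, 0.1-unit bins.
mags = [1.0, 1.1, 1.0, 1.3, 1.2, 1.0, 1.6, 1.1, 2.3, 1.4]
b = b_value_aki(mags, mc=1.0, bin_width=0.1)
```

Because the estimator depends only on the mean excess magnitude above Mc, a catalogue dominated by small events (as in the swarm described above) yields a high b value.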

  2. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    SciTech Connect

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.

  3. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma.

    PubMed

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing the microbiome profiles of the tongue coat may provide useful insights and potential diagnostic markers for LC patients. Herein, we are the first to investigate the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of the tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao1 indexes. The tongue coat microbiome significantly distinguished LC patients from healthy subjects by principal component analysis. Thirty-eight operational taxonomic units, assigned to 23 different genera, distinguished the tongue coat microbial profiles of LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) revealed significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs showed microbial gene functions related to the categories nickel/iron_transport, amino_acid_transport, energy produced system and metabolism between LC patients and healthy subjects. These findings are the first to identify microbiota dysbiosis of the tongue coat in LC patients and may provide novel, non-invasive potential diagnostic biomarkers of LC. PMID:27605161
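    The Shannon, Simpson and Chao1 indexes cited above are computed from OTU counts per sample. The sketch below is a minimal illustration, not the software used in the study: Shannon uses the natural log, "Simpson" is reported in the Gini-Simpson form 1 − Σp², and Chao1 falls back to its bias-corrected form when there are no doubletons. Tools differ in these conventions (log base; Simpson's D vs 1 − D vs 1/D), so values are only comparable when computed the same way.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over nonzero OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_diversity(counts):
    """Gini-Simpson form, 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 F2), bias-corrected if F2 == 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0


otu_counts = [1, 1, 2, 5]  # hypothetical OTU counts for one sample
richness = chao1(otu_counts)  # 4 observed + 2^2 / (2 * 1) = 6.0
```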

  4. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma

    PubMed Central

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing the microbiome profiles of the tongue coat may provide useful insights and potential diagnostic markers for LC patients. Herein, we are the first to investigate the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of the tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao1 indexes. The tongue coat microbiome significantly distinguished LC patients from healthy subjects by principal component analysis. Thirty-eight operational taxonomic units, assigned to 23 different genera, distinguished the tongue coat microbial profiles of LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) revealed significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs showed microbial gene functions related to the categories nickel/iron_transport, amino_acid_transport, energy produced system and metabolism between LC patients and healthy subjects. These findings are the first to identify microbiota dysbiosis of the tongue coat in LC patients and may provide novel, non-invasive potential diagnostic biomarkers of LC. PMID:27605161

  5. A deep sequencing tool for partitioning clearance rates following antimalarial treatment in polyclonal infections

    PubMed Central

    Mideo, Nicole; Bailey, Jeffrey A.; Hathaway, Nicholas J.; Ngasala, Billy; Saunders, David L.; Lon, Chanthap; Kharabora, Oksana; Jamnik, Andrew; Balasubramanian, Sujata; Björkman, Anders; Mårtensson, Andreas; Meshnick, Steven R.; Read, Andrew F.; Juliano, Jonathan J.

    2016-01-01

    Background and objectives: Current tools struggle to detect drug-resistant malaria parasites when infections contain multiple parasite clones, which is the norm in high transmission settings in Africa. Our aim was to develop and apply an approach for detecting resistance that overcomes the challenges of polyclonal infections without requiring a genetic marker for resistance. Methodology: Clinical samples from patients treated with artemisinin combination therapy were collected from Tanzania and Cambodia. By deeply sequencing a hypervariable locus, we quantified the relative abundance of parasite subpopulations (defined by haplotypes of that locus) within infections and revealed evolutionary dynamics during treatment. Slow clearance is a phenotypic, clinical marker of artemisinin resistance; we analyzed variation in clearance rates within infections by fitting parasite clearance curves to subpopulation data. Results: In Tanzania, we found substantial variation in clearance rates within individual patients. Some parasite subpopulations cleared as slowly as resistant parasites observed in Cambodia. We evaluated possible explanations for these data, including resistance to drugs. Assuming slow clearance was a stable phenotype of subpopulations, simulations predicted that modest increases in their frequency could substantially increase time to cure. Conclusions and implications: By characterizing parasite subpopulations within patients, our method can detect rare, slow clearing parasites in vivo whose phenotypic effects would otherwise be masked. Since our approach can be applied to polyclonal infections even when the genetics underlying resistance are unknown, it could aid in monitoring the emergence of artemisinin resistance. Our application to Tanzanian samples uncovers rare subpopulations with worrying phenotypes for closer examination. PMID:26817485
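    The idea of fitting clearance curves to subpopulation data can be sketched as follows, assuming a simple log-linear decay model: each haplotype's density is the total parasite density multiplied by its read fraction at that time point, and the clearance rate is the least-squares slope of ln(density) against time. The helper names are hypothetical, and the authors' actual fitting procedure (for example, handling of lag phases or detection limits) may differ.

```python
import math

# Sketch assuming log-linear decay per subpopulation; not the authors' exact model.

def subpopulation_densities(total_densities, haplotype_freqs):
    """Hypothetical helper: density of one haplotype = total parasite density x
    its relative abundance (fraction of reads) at each time point."""
    return [d * f for d, f in zip(total_densities, haplotype_freqs)]

def clearance_rate(times_h, densities):
    """Least-squares slope k of ln(density) = a - k * t, in 1/hour."""
    pts = [(t, math.log(d)) for t, d in zip(times_h, densities) if d > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive density measurements")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points must differ")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return -sxy / sxx

def half_life(k):
    """Clearance half-life in hours for rate constant k."""
    return math.log(2) / k
```

    Comparing the fitted rates of different haplotypes within one patient is what lets a slow-clearing minority subpopulation stand out even when the total parasite count falls quickly.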

  6. Deep sequencing of the Camellia chekiangoleosa transcriptome revealed candidate genes for anthocyanin biosynthesis.

    PubMed

    Wang, Zhong-Wei; Jiang, Cong; Wen, Qiang; Wang, Na; Tao, Yuan-Yuan; Xu, Li-An

    2014-03-15

    Camellia chekiangoleosa is an important species of the genus Camellia. It provides high-quality edible oil and has great ornamental value. Its large red flowers bloom between February and March. Flower pigmentation is closely related to the accumulation of anthocyanin. Although anthocyanin biosynthesis has been studied extensively in herbaceous plants, little molecular information on the anthocyanin biosynthesis pathway of C. chekiangoleosa is yet available. In the present study, a cDNA library was constructed from the flowers of C. chekiangoleosa to obtain detailed and general data. To explore the transcriptome of C. chekiangoleosa and investigate genes involved in anthocyanin biosynthesis, a 454 GS FLX Titanium platform was used to generate an EST dataset. About 46,279 sequences were obtained, and 24,593 (53.1%) were annotated. Using a BLAST search against AGRIS, 1740 unigenes were found to be homologous to 599 Arabidopsis transcription factor genes. Based on the transcriptome dataset, nine anthocyanin biosynthesis pathway genes (PAL, CHS1, CHS2, CHS3, CHI, F3H, DFR, ANS, and UFGT) were identified and cloned. The spatio-temporal expression patterns of these genes were also analyzed using quantitative real-time polymerase chain reaction. The study results not only enrich the gene resource but also provide valuable information for further studies concerning anthocyanin biosynthesis. PMID:24462969

  7. Interactions between Closely Related Bacterial Strains Are Revealed by Deep Transcriptome Sequencing

    PubMed Central

    González-Torres, Pedro; Pryszcz, Leszek P.; Santos, Fernando; Martínez-García, Manuel

    2015-01-01

    Comparative genomics, metagenomics, and single-cell technologies have shown that populations of microbial species encompass assemblages of closely related strains. This raises the question of whether individual bacterial lineages respond to the presence of their close relatives by modifying their gene expression or, instead, whether assemblages simply act as the arithmetic addition of their individual components. Here, we took advantage of transcriptome sequencing to address this question. For this, we analyzed the transcriptomes of two closely related strains of the extremely halophilic bacterium Salinibacter ruber grown axenically and in coculture. These organisms dominate bacterial assemblages in hypersaline environments worldwide. The strains used here cooccurred in the natural environment and are 100% identical in their 16S rRNA genes, and each strain harbors an accessory genome representing 10% of its complete genome. Overall, transcriptomic patterns from pure cultures were very similar for both strains. Expression was detected along practically the whole genome albeit with some genes at low levels. A subset of genes was very highly expressed in both strains, including genes coding for the light-driven proton pump xanthorhodopsin, genes involved in the stress response, and genes coding for transcriptional regulators. Expression differences between pure cultures affected mainly genes involved in environmental sensing. When the strains were grown in coculture, there was a modest but significant change in their individual transcription patterns compared to those in pure culture. Each strain sensed the presence of the other and responded in a specific manner, which points to fine intraspecific transcriptomic modulation. PMID:26431969

  8. Deep Sequencing and Ecological Characterization of Gut Microbial Communities of Diverse Bumble Bee Species

    PubMed Central

    Lim, Haw Chuan; Chu, Chia-Ching; Seufferheld, Manfredo J.; Cameron, Sydney A.

    2015-01-01

    Gut bacterial communities of bumble bees are correlated with defense against pathogens. Further understanding this host-microbe association is vitally important as bumble bees are currently experiencing global population declines, potentially due in part to emergent diseases. In this study, we used pyrosequencing and community fingerprinting (ARISA) to characterize the gut microbial communities of nine bumble bee species from across the Bombus phylogeny. Overall, we delimited 74 bacterial taxa (operational taxonomic units or OTUs) belonging to Betaproteobacteria, Gammaproteobacteria, Bacilli, Actinobacteria, Flavobacteria and Alphaproteobacteria. Each bacterial community was taxonomically simple, containing an average of 1.9 common (relative abundance per sample > 5%) bacterial OTUs. The most abundant and prevalent (occurring in 92% of the samples) bacterial OTU, based on 16S rRNA sequences, closely matched that of the previously described Betaproteobacteria species Snodgrassella alvi. Bacteria that were first described in bee-related external environments dominated a number of gut bacterial communities, suggesting that they are not strictly dependent on the internal gut environment. The ARISA data showed a correlation between bacterial community structures and the geographic locations where the bees were sampled, suggesting that at least a subset of the bacterial species may be transmitted environmentally. Using light and fluorescent microscopy, we demonstrated that the gut bacteria form a biofilm on the internal epithelial surface of the ileum, corroborating results obtained from Apis mellifera. PMID:25768110

  9. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma.

    PubMed

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing the microbiome profiles of the tongue coat may provide useful insights and potential diagnostic markers for LC patients. Herein, we are the first to investigate the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of the tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao 1 indexes. Principal component analysis showed that the tongue coat microbiome significantly distinguished LC patients from healthy subjects. Tongue coat microbial profiles comprising 38 operational taxonomic units assigned to 23 different genera distinguished LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) revealed significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs showed differences in microbial gene functions between LC patients and healthy subjects in the categories of nickel/iron_transport, amino_acid_transport, energy production system and metabolism. These findings are the first to identify microbiota dysbiosis of the tongue coat in LC patients and may provide novel, non-invasive potential diagnostic biomarkers of LC.

  10. Deep sequencing of virus-infected cells reveals HIV-encoded small RNAs

    PubMed Central

    Schopman, Nick C.T.; Willemsen, Marcel; Liu, Ying Poi; Bradley, Ted; van Kampen, Antoine; Baas, Frank; Berkhout, Ben; Haasnoot, Joost

    2012-01-01

    Small virus-derived interfering RNAs (viRNAs) play an important role in antiviral defence in plants, insects and nematodes by triggering the RNA interference (RNAi) pathway. The role of RNAi as an antiviral defence mechanism in mammalian cells has been obscure due to the lack of viRNA detection. Although viRNAs from different mammalian viruses have recently been identified, their functions and possible impact on viral replication remain unknown. To identify viRNAs derived from HIV-1, we used the extremely sensitive SOLiD™ 3 Plus System to analyse viRNA accumulation in HIV-1-infected T lymphocytes. We detected numerous small RNAs that correspond to the HIV-1 RNA genome. The majority of these sequences have a positive polarity (98.1%) and could be derived from miRNAs encoded by structured segments of the HIV-1 RNA genome (vmiRNAs). A small portion of the viRNAs is of negative polarity and most of them are encoded within the 3′-UTR, which may represent viral siRNAs (vsiRNAs). The identified vsiRNAs can potently repress HIV-1 production, whereas suppression of the vsiRNAs by antagomirs stimulates virus production. These results suggest that HIV-1 triggers the production of vsiRNAs and vmiRNAs to modulate cellular and/or viral gene expression. PMID:21911362

  11. Deep sequencing of virus-infected cells reveals HIV-encoded small RNAs.

    PubMed

    Schopman, Nick C T; Willemsen, Marcel; Liu, Ying Poi; Bradley, Ted; van Kampen, Antoine; Baas, Frank; Berkhout, Ben; Haasnoot, Joost

    2012-01-01

    Small virus-derived interfering RNAs (viRNAs) play an important role in antiviral defence in plants, insects and nematodes by triggering the RNA interference (RNAi) pathway. The role of RNAi as an antiviral defence mechanism in mammalian cells has been obscure due to the lack of viRNA detection. Although viRNAs from different mammalian viruses have recently been identified, their functions and possible impact on viral replication remain unknown. To identify viRNAs derived from HIV-1, we used the extremely sensitive SOLiD™ 3 Plus System to analyse viRNA accumulation in HIV-1-infected T lymphocytes. We detected numerous small RNAs that correspond to the HIV-1 RNA genome. The majority of these sequences have a positive polarity (98.1%) and could be derived from miRNAs encoded by structured segments of the HIV-1 RNA genome (vmiRNAs). A small portion of the viRNAs is of negative polarity and most of them are encoded within the 3'-UTR, which may represent viral siRNAs (vsiRNAs). The identified vsiRNAs can potently repress HIV-1 production, whereas suppression of the vsiRNAs by antagomirs stimulates virus production. These results suggest that HIV-1 triggers the production of vsiRNAs and vmiRNAs to modulate cellular and/or viral gene expression. PMID:21911362

  12. Deep mRNA Sequencing Analysis to Capture the Transcriptome Landscape of Zebrafish Embryos and Larvae

    PubMed Central

    Xie, Shuying; Xu, Yao; Zhu, Genfeng; Wang, Lei; Huang, Jiyue; Ma, Hong; Yao, Jihua

    2013-01-01

    Transcriptome analysis is a powerful tool for obtaining large amounts of genome-scale gene expression profiles. Despite its extensive application to diverse biological problems in the last decade, transcriptomic research on zebrafish embryonic development has been very limited. Several recent studies have made great progress in this direction, yet a large gap still exists, especially regarding the transcriptome dynamics of embryonic stages from early gastrulation onwards. Here, we present a comprehensive RNA-sequencing analysis of the transcriptomes of 9 different stages covering 7 major periods (cleavage, blastula, gastrula, segmentation, pharyngula, hatching and early larval stage) in zebrafish development. We detected the expression of at least 24,065 genes in at least one of the 9 stages. We identified 16,130 genes that were significantly differentially expressed between stages and subsequently classified them into six clusters. Each gene cluster had distinct expression patterns and characteristic functional pathways, providing a framework for understanding the developmental transcriptome dynamics. Over 4000 genes were identified as preferentially expressed in one of the stages, which could be highly relevant to stage-specific developmental and molecular events. Among the 68 transcription factor families active during development, most had enhanced average expression levels and thus might be crucial for embryogenesis, whereas the inactivation of the other families was likely required by the activation of the zygotic genome. We discuss our RNA-seq data together with previous findings about the Wnt signaling pathway and some other genes with known functions, to show how our data could be used to advance our understanding of these developmental functional elements. Our study provides ample information for further study of the molecular and cellular mechanisms underlying vertebrate development.
PMID

  13. Makah Formation; a deep-marginal-basin sequence of late Eocene and Oligocene age in the northwestern Olympic Peninsula, Washington

    USGS Publications Warehouse

    Snavely, P. D.; Niem, A.R.; MacLeod, N.S.; Pearl, J.E.; Rau, W.W.

    1980-01-01

    The Makah Formation of the Twin River Group crops out in a northwest-trending linear belt in the northwesternmost part of the Olympic Peninsula, Wash. This marine sequence consists of 2800 meters of predominantly thin-bedded siltstone and sandstone that encloses six distinctive newly named members--four thick-bedded amalgamated turbidite sandstone members, an olistostromal shallow-water marine sandstone and conglomerate member, and a thin-bedded water-laid tuff member. A local unconformity of submarine origin occurs within the lower part of the Makah Formation except in the central part of the study area, where it forms the contact between the older Hoko River Formation and the Makah. Foraminiferal faunas indicate that the Makah Formation ranges in age from late Eocene (late Narizian) to late Oligocene (Zemorrian) and was deposited in a predominantly lower to middle bathyal environment. The Makah Formation is part of a deep-marginal-basin facies that crops out in the western part of the Olympic Peninsula, in southwesternmost Washington and coastal embayments in northwestern Oregon, and along the central part of the coast of western Vancouver Island. On the basis of limited subsurface data from exploratory wells, correlative deep-marginal-basin deposits underlie the inner continental shelf of Oregon and the continental shelf (Tofino basin) along the southwestern side of Vancouver Island. Directional structures in the Makah Formation indicate that the predominantly lithic arkosic sandstone that forms the turbidite packets was derived from the northwest. A possible source of the clastic material is the dioritic, granitic, and volcanic terranes in the vicinity of the Hesquiat Peninsula and Barkley Sound on the west coast of Vancouver Island. Vertical and lateral variations of turbidite facies suggest that the four packets of sandstone were formed as depositional lobes on an outer submarine fan. 
The thin-bedded strata between the turbidite packets have characteristics of

  14. Eustatic controls on stratification and facies associations in deep-water deposits, Great Valley sequence, Sacramento Valley, California

    SciTech Connect

    Morgan, S.R.; Campion, K.M.

    1987-05-01

    The Great Valley sequence consists of submarine fan deposits that are divided into laterally persistent sandstones and conglomerates separated by thick shaly intervals. The frequency of sandstone-shale successions in the Great Valley closely corresponds to the occurrence of major eustatic falls observed elsewhere in the world during the Upper Jurassic and Cretaceous. This close correspondence between the number of observed fans and sea level cycles has implications for the timing of fan development and facies models of deep-water deposits. On the basis of seismic expression, deep-water deposits from various basins have been divided by Mitchum into a sand-prone lower fan, which has a sharp basal contact, and a younger upper fan, which exhibits downlap onto and over the lower fan. Sand-prone members of the Great Valley (e.g., Venado and Forbes) are sharp-based, fining-upward units that have an aggradational or retrogradational stacking pattern of fan lobes. Massive sandstone, pebbly sandstone, conglomerate, pebbly mudstone, turbidites, and lenticular turbidites compose the fan lithologies. These rocks are typically referred to as inner fan channel or midfan lobes. In contrast, shale-dominated sections with thin-bedded turbidites (e.g., Boxer and Yolo) that have been variously described as basin plain, outer fan, inner fan levee, and slope correspond to the upper fan. Sharp basal fan contacts, textural contrasts between the lower and upper fans, and encasement of sand-prone fans in thick shaly sections indicate that fan development is an episodic rather than a continuous process. Rapid eustatic fall causing stream incision and shelf bypass is a likely mechanism for basin-wide and interbasinal fan development. Lithofacies encountered in fan deposits are related to grain size in the source area; specific lithologies in Great Valley fans (e.g., conglomerate) may be absent in other basins.

  15. Draft Genome Sequences of Two Thiomicrospira Strains Isolated from the Brine-Seawater Interface of Kebrit Deep in the Red Sea

    PubMed Central

    Zhang, Guishan; Fauzi Haroon, Mohamed; Zhang, Ruifu; Hikmawan, Tyas

    2016-01-01

    Two Thiomicrospira strains, WB1 and XS5, were isolated from the Kebrit Deep brine-seawater interface in the Red Sea, Saudi Arabia. Here, we present the draft genome sequences of these gammaproteobacteria, which both produce sulfuric acid from thiosulfate in culture. PMID:26966216

  16. MicroRNA Profiling of Epstein-Barr Virus-Associated NK/T-Cell Lymphomas by Deep Sequencing

    PubMed Central

    Motsch, Natalie; Alles, Julia; Imig, Jochen; Zhu, Jiayun; Barth, Stephanie; Reineke, Tanja; Tinguely, Marianne; Cogliatti, Sergio; Dueck, Anne; Meister, Gunter

    2012-01-01

    The Epstein-Barr virus (EBV) is an oncogenic human Herpes virus involved in the pathogenesis of nasal NK/T-cell lymphoma. EBV encodes microRNAs (miRNAs) and induces changes in the host cellular miRNA profile. MiRNAs are short non-coding RNAs of about 19–25 nt length that regulate gene expression by post-transcriptional mechanisms and are frequently deregulated in human malignancies including cancer. The microRNA profiles of EBV-positive NK/T-cell lymphoma, non-infected T-cell lymphoma and normal thymus were established by deep sequencing of small RNA libraries. The comparison of the EBV-positive NK/T-cell vs. EBV-negative T-cell lymphoma revealed 15 up- and 16 down-regulated miRNAs. In contrast, the majority of miRNAs was repressed in the lymphomas compared to normal tissue. We also identified 10 novel miRNAs from known precursors and two so far unknown miRNAs. The sequencing results were confirmed for selected miRNAs by quantitative Real-Time PCR (qRT-PCR). We show that the proinflammatory cytokine interleukin 1 alpha (IL1A) is a target for miR-142-3p and the oncogenic BCL6 for miR-205. MiR-142-3p is down-regulated in the EBV-positive vs. EBV-negative lymphomas. MiR-205 was undetectable in EBV-negative lymphoma and strongly down-regulated in EBV-positive NK/T-cell lymphoma as compared to thymus. The targets were confirmed by reporter assays and by down-regulation of the proteins by ectopic expression of the cognate miRNAs. Taken together, our findings demonstrate the relevance of deregulated miRNAs for the post-transcriptional gene regulation in nasal NK/T-cell lymphomas. PMID:22870299

  17. Refining transcriptional programs in kidney development by integration of deep RNA-sequencing and array-based spatial profiling

    PubMed Central

    2011-01-01

    Background The developing mouse kidney is currently the best-characterized model of organogenesis at a transcriptional level. Detailed spatial maps have been generated for gene expression profiling combined with systematic in situ screening. These studies, however, fall short of capturing the transcriptional complexity arising from each locus due to the limited scope of microarray-based technology, which is largely based on "gene-centric" models. Results To address this, the polyadenylated RNA and microRNA transcriptomes of the 15.5 dpc mouse kidney were profiled using strand-specific RNA-sequencing (RNA-Seq) to a depth sufficient to complement spatial maps from pre-existing microarray datasets. The transcriptional complexity of RNAs arising from mouse RefSeq loci was catalogued, including 3568 alternatively spliced transcripts and 532 uncharacterized alternate 3' UTRs. Antisense expression was also detected for 60% of RefSeq genes, including uncharacterized non-coding transcripts overlapping the kidney progenitor markers Six2 and Sall1, which were validated by section in situ hybridization. Analysis of genes known to be involved in kidney development, particularly during mesenchymal-to-epithelial transition, showed an enrichment of non-coding antisense transcripts extended along protein-coding RNAs. Conclusion The resulting resource further refines the transcriptomic cartography of kidney organogenesis by integrating deep RNA sequencing data with locus-based information from previously published expression atlases. The added resolution of RNA-Seq has provided the basis for a transition from classical gene-centric models of kidney development towards more accurate and detailed "transcript-centric" representations, which highlights the extent of transcriptional complexity of genes that direct complex development events. PMID:21888672

  18. Deep Sequencing of the Scutellaria baicalensis Georgi Transcriptome Reveals Flavonoid Biosynthetic Profiling and Organ-Specific Gene Expression

    PubMed Central

    Liu, Jinxin; Hou, Jingyi; Jiang, Chao; Li, Geng; Lu, Heng; Meng, Fanyun; Shi, Linchun

    2015-01-01

    Scutellaria baicalensis Georgi has long been used in traditional medicine to treat a wide variety of diseases and has been listed in the Chinese Pharmacopeia, the Japanese Pharmacopeia, the Korean Pharmacopoeia and the European Pharmacopoeia. Flavonoids, especially wogonin, wogonoside, baicalin, and baicalein, are its main functional ingredients with various pharmacological activities. Although pharmacological studies of these flavonoid components have been well conducted, the molecular mechanism of their biosynthesis remains unclear in S. baicalensis. In this study, Illumina/Solexa deep sequencing generated more than 91 million paired-end reads and 49,507 unigenes from S. baicalensis roots, stems, leaves and flowers. More than 70% of unigenes were annotated in at least one of the five public databases and 13,627 unigenes were assigned to 3,810 KEGG genes involved in 579 different pathways. Fifty-four unigenes that encode 12 key enzymes involved in the flavonoid biosynthesis pathway were discovered. One baicalinase gene and three baicalein 7-O-glucuronosyltransferase genes potentially involved in the transformation between baicalin/wogonoside and baicalein/wogonin were identified. Four candidate 6-hydroxylase genes for the formation of baicalin/baicalein and one candidate 8-O-methyltransferase gene for the biosynthesis of wogonoside/wogonin were also recognized. Our results further support the conclusion that, in S. baicalensis, 3,5,7-trihydroxyflavone was the precursor of the four above compounds. In addition, the differential expression models and simple sequence repeats associated with these genes were carefully analyzed. All of these results not only enrich the gene resource but also benefit research into the molecular genetics and functional genomics in S. baicalensis. PMID:26317778

  19. Deep sequencing reveals important roles of microRNAs in response to drought and salinity stress in cotton

    PubMed Central

    Xie, Fuliang; Wang, Qinglian; Sun, Runrun; Zhang, Baohong

    2015-01-01

    Drought and salinity are two major environmental factors adversely affecting plant growth and productivity. However, the underlying regulatory mechanism is unknown. In this study, the potential roles of small regulatory microRNAs (miRNAs) in the cotton response to these stresses were investigated. Using next-generation deep sequencing, a total of 337 miRNAs with precursors were identified, comprising 289 known miRNAs and 48 novel miRNAs. Of these miRNAs, 155 were differentially expressed. Target prediction, Gene Ontology (GO)-based functional classification, and Kyoto Encyclopedia of Genes and Genomes (KEGG)-based functional enrichment show that these miRNAs might play roles in response to salinity and drought stresses by targeting a series of stress-related genes. Degradome sequencing analysis showed that at least 55 predicted target genes were validated as being regulated by 60 miRNAs. CitationRank-based literature mining was employed to determine the importance of genes related to drought and salinity stress. The NAC, MYB, and MAPK families ranked highest in the context of drought and salinity, indicating their important roles for the plant to combat drought and salinity stress. According to target prediction, a series of cotton miRNAs are associated with these top-ranked genes, including miR164, miR172, miR396, miR1520, miR6158, ghr-n24, ghr-n56, and ghr-n59. Interestingly, 163 cotton miRNAs were also identified to target 210 genes that are important in fibre development. These results will contribute to cotton stress-resistant breeding as well as understanding fibre development. PMID:25371507

  20. The transcriptomic response to copper exposure by the gill tissue of Japanese scallops (Mizuhopecten yessoensis) using deep-sequencing technology.

    PubMed

    Meng, Xiaolin; Tian, Xue; Liu, Mei; Nie, Guoxing; Jiang, Keyong; Wang, Baojie; Wang, Lei

    2014-06-01

    The bivalve Mizuhopecten yessoensis has been greatly impacted by marine pollutants in northern China. To elucidate the toxicological mechanism of copper exposure on the immune system, we investigated differentially expressed genes (DEGs) and transcript abundance in M. yessoensis gill tissue using the deep-sequencing platform Illumina HiSeq™ 2000. In total, 1312 and 2237 genes were identified as significantly up- or down-regulated, respectively. In addition, significant enrichment analysis identified 9 GO terms and 38 pathways involved in the response to copper exposure. The analysis of immune-related transcripts revealed a complex repertoire of innate recognition receptors, including toll-like receptors, NOD-like receptors and RIG-like receptors. Downstream pathway effectors, such as apoptotic, lysosomal and C-type lectin transcripts, were also analyzed. These results will provide a resource for subsequent gene expression studies regarding heavy metal exposure and the identification of copper-sensitive biomarkers to monitor the aquaculture of M. yessoensis.

  1. Identification of MicroRNAs and transcript targets in Camelina sativa by deep sequencing and computational methods

    DOE PAGES

    Poudel, Saroj; Aryal, Niranjan; Lu, Chaofu; Wang, Tai

    2015-03-31

    Camelina sativa is an annual oilseed crop that is under intensive development for renewable resources of biofuels and industrial oils. MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play key roles in diverse plant biological processes. Here, we conducted deep sequencing on small RNA libraries prepared from camelina leaves, flower buds and two stages of developing seeds corresponding to initial and peak storage products accumulation. Computational analyses identified 207 known miRNAs belonging to 63 families, as well as 5 novel miRNAs. These miRNAs, especially members of the miRNA families, varied greatly in different tissues and developmental stages. The predicted miRNA target genes are involved in a broad range of physiological functions including lipid metabolism. This report is the first step toward elucidating roles of miRNAs in C. sativa and will provide additional tools to improve this oilseed crop for biofuels and biomaterials.

  2. Longitudinal copy number, whole exome and targeted deep sequencing of 'good risk' IGHV-mutated CLL patients with progressive disease.

    PubMed

    Rose-Zerilli, M J J; Gibson, J; Wang, J; Tapper, W; Davis, Z; Parker, H; Larrayoz, M; McCarthy, H; Walewska, R; Forster, J; Gardiner, A; Steele, A J; Chelala, C; Ennis, S; Collins, A; Oakes, C C; Oscier, D G; Strefford, J C

    2016-06-01

    The biological features of IGHV-M chronic lymphocytic leukemia responsible for disease progression are still poorly understood. We undertook a longitudinal study close to diagnosis, pre-treatment and post relapse in 13 patients presenting with cMBL or Stage A disease and good-risk biomarkers (IGHV-M genes, no del(17p) or del(11q) and low CD38 expression) who nevertheless developed progressive disease, of whom 10 have required therapy. Using cytogenetics, fluorescence in situ hybridisation, genome-wide DNA methylation and copy number analysis together with whole exome, targeted deep- and Sanger sequencing at diagnosis, we identified mutations in established chronic lymphocytic leukemia driver genes in nine patients (69%), non-coding mutations (PAX5 enhancer region) in three patients and genomic complexity in two patients. Branching evolutionary trajectories predominated (n=9/13), revealing intra-tumoural epi- and genetic heterogeneity and sub-clonal competition before therapy. Of the patients subsequently requiring treatment, two had sub-clonal TP53 mutations that would not be detected by standard methodologies, three qualified for the very-low-risk category defined by integrated mutational and cytogenetic analysis and yet had established or putative driver mutations and one patient developed progressive, therapy-refractory disease associated with the emergence of an IGHV-U clone. These data suggest that extended genomic and immunogenetic screening may have clinical utility in patients with apparent good-risk disease. PMID:26847028

  3. Longitudinal copy number, whole exome and targeted deep sequencing of 'good risk' IGHV-mutated CLL patients with progressive disease

    PubMed Central

    Rose-Zerilli, M J J; Gibson, J; Wang, J; Tapper, W; Davis, Z; Parker, H; Larrayoz, M; McCarthy, H; Walewska, R; Forster, J; Gardiner, A; Steele, A J; Chelala, C; Ennis, S; Collins, A; Oakes, C C; Oscier, D G; Strefford, J C

    2016-01-01

    The biological features of IGHV-M chronic lymphocytic leukemia responsible for disease progression are still poorly understood. We undertook a longitudinal study, with samples taken close to diagnosis, before treatment and after relapse, of 13 patients presenting with cMBL or Stage A disease and good-risk biomarkers (IGHV-M genes, no del(17p) or del(11q) and low CD38 expression) who nevertheless developed progressive disease, 10 of whom have required therapy. Using cytogenetics, fluorescence in situ hybridisation, genome-wide DNA methylation and copy number analysis together with whole exome, targeted deep and Sanger sequencing at diagnosis, we identified mutations in established chronic lymphocytic leukemia driver genes in nine patients (69%), non-coding mutations (PAX5 enhancer region) in three patients and genomic complexity in two patients. Branching evolutionary trajectories predominated (n=9/13), revealing intra-tumoural epigenetic and genetic heterogeneity and sub-clonal competition before therapy. Of the patients subsequently requiring treatment, two had sub-clonal TP53 mutations that would not be detected by standard methodologies; three qualified for the very-low-risk category defined by integrated mutational and cytogenetic analysis and yet had established or putative driver mutations; and one patient developed progressive, therapy-refractory disease associated with the emergence of an IGHV-U clone. These data suggest that extended genomic and immunogenetic screening may have clinical utility in patients with apparent good-risk disease. PMID:26847028

  4. Identification of miRNAs associated with sexual maturity in chicken ovary by Illumina small RNA deep sequencing

    PubMed Central

    2013-01-01

    Background MicroRNAs have been suggested to play important roles in the regulation of gene expression in various biological processes. To investigate the function of miRNAs in chicken ovarian development and folliculogenesis, two small RNA libraries constructed from sexually mature (162-day-old) and immature (42-day-old) ovary tissues of Single Comb White Leghorn chickens were sequenced using Illumina small RNA deep sequencing. Results In the present study, 14,545,100 and 14,774,864 clean reads were obtained from sexually mature (162-d) and sexually immature (42-d) ovaries, respectively. In total, 202 known miRNAs were identified, and 93 of them were significantly differentially expressed: 42 miRNAs were up-regulated and 51 miRNAs were down-regulated in the mature ovary compared with the immature ovary. Among the up-regulated miRNAs, gga-miR-1a had the largest fold change (6.405-fold), while gga-miR-375 had the largest fold change (11.345-fold) among the down-regulated miRNAs. The three most abundant miRNAs in the chicken ovary are gga-miR-10a, gga-let-7 and gga-miR-21. Five differentially expressed miRNAs (gga-miR-1a, 21, 26a, 137 and 375) were validated by real-time quantitative RT-PCR (qRT-PCR). Furthermore, the expression patterns of the five miRNAs were analyzed at different developmental stages of the chicken ovary and in follicles of various sizes. Conclusion The present study provides the first miRNA profile of sexually immature and mature chicken ovaries. Some miRNAs, such as gga-miR-1a and gga-miR-21, are expressed differentially between immature and mature chicken ovaries as well as among follicles of different sizes, suggesting an important role in follicular growth or the ovulation mechanism in the chicken. PMID:23705682
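    Because the two libraries yielded slightly different numbers of clean reads, raw miRNA counts are usually scaled to library size before fold changes are compared. The sketch below uses reads per million (RPM) with the clean-read totals reported above; RPM is an assumed normalization (the abstract does not state which one was used), and the miRNA counts are hypothetical.

```python
MATURE_CLEAN_READS = 14_545_100    # 162-day (sexually mature) ovary library
IMMATURE_CLEAN_READS = 14_774_864  # 42-day (sexually immature) ovary library


def reads_per_million(count: int, library_size: int) -> float:
    """Scale a raw read count to reads per million clean reads."""
    return count * 1_000_000 / library_size


def fold_change_mature_vs_immature(count_mature: int, count_immature: int) -> float:
    """Ratio of RPM-normalized expression, mature over immature."""
    rpm_mature = reads_per_million(count_mature, MATURE_CLEAN_READS)
    rpm_immature = reads_per_million(count_immature, IMMATURE_CLEAN_READS)
    return rpm_mature / rpm_immature


# Hypothetical raw counts for a single miRNA in the two libraries.
fc = fold_change_mature_vs_immature(count_mature=12_000, count_immature=2_000)
print(f"fold change (mature/immature): {fc:.3f}")
```

    The raw count ratio here would be exactly 6; normalization raises it slightly because the immature library is the larger of the two.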

  5. Cell and Microvesicle Urine microRNA Deep Sequencing Profiles from Healthy Individuals: Observations with Potential Impact on Biomarker Studies

    PubMed Central

    Ben-Dov, Iddo Z.; Whalen, Veronica M.; Goilav, Beatrice; Max, Klaas E. A.; Tuschl, Thomas

    2016-01-01

    Background Urine is a potential source of biomarkers for diseases of the kidneys and urinary tract. RNA, including microRNA, is present in the urine enclosed in detached cells or in extracellular vesicles (EVs) or bound and protected by extracellular proteins. Detection of cell- and disease-specific microRNA in urine may aid early diagnosis of organ-specific pathology. In this study, we applied barcoded deep sequencing to profile microRNAs in urine of healthy volunteers, and characterized the effects of sex, urine fraction (cells vs. EVs) and repeated voids by the same individuals. Results Compared to urine-cell-derived small RNA libraries, urine-EV-derived libraries were relatively enriched in miRNA and accordingly contained less of other small RNA classes, such as rRNA, tRNA and sn/snoRNA. Unsupervised clustering of specimens by miRNA expression levels showed prominent grouping by specimen type (urine cells or EVs) and by sex, as well as a tendency for repeated (first and second void) samples to cluster closely together. Likewise, miRNA profile correlations between void repeats, and between fraction counterparts (cells and EVs from the same specimen), were distinctly higher than correlations between miRNA profiles overall. Differential miRNA expression by sex was similar in cells and EVs. Conclusions miRNA profiling of both urine EVs and sediment cells can convey biologically important differences between individuals. However, for miRNAs to be useful as urine biomarkers, careful consideration must be given to biofluid fractionation and sex-specific analysis, while the time of voiding appears to be less important. PMID:26785265

  6. Ultra-Deep Sequencing Characterization of HCV Samples with Equivocal Typing Results Determined with a Commercial Assay

    PubMed Central

    Minosse, Claudia; Giombini, Emanuela; Bartolini, Barbara; Capobianchi, Maria R.; Garbuglia, Anna R.

    2016-01-01

    Hepatitis C virus (HCV) is classified into seven phylogenetically distinct genotypes, which are further subdivided into related subtypes. Accurate assignment of genotype/subtype is mandatory in the era of directly acting antivirals. Several molecular methods are available for HCV genotyping; however, a considerable number of samples may yield indeterminate, mixed, or unspecified subtype results, or may even be misclassified. Using NS5B direct sequencing (DS) and ultra-deep pyrosequencing (UDPS), we tested 43 samples that had been classified as genotype 1 unsubtyped (n = 17), mixed infection (n = 17), or indeterminate (n = 9) by the Abbott RealTime HCV Genotype II assay. Genotype 1 was confirmed in 14/17 samples (82%): eight were assigned to subtype 1b and five to subtype 1a by both DS and UDPS, while one was classified as subtype 1e by DS and as a mixed infection (1e + 1a) by UDPS. Three of the seventeen genotype 1 samples were reclassified as genotype 3h by both sequencing approaches. Only one mixed infection was confirmed by UDPS (4d + 1a), while in 88% of cases a single component of the mixture was detected (five genotype 1a, four genotype 1b, two genotype 3a, two genotype 4m, and two genotype 4d); 44% of indeterminate samples were resolved as genotype 2c by both DS and UDPS, and 22% as genotype 3a; of the remaining samples indeterminate by Abbott, one was resolved as genotype 4d, one as genotype 6n, and one was classified as subtype 3a by DS and as a mixed infection (3a + 3h) by UDPS. The concordance between DS and UDPS was 94%, 88%, and 89% for genotype 1, co-infection, and indeterminate results, respectively. UDPS should be considered very useful for resolving ambiguous HCV genotyping results. PMID:27739414
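    The concordance figures are simple proportions of samples for which DS and UDPS gave the same call. The counts used below (16/17, 15/17 and 8/9) are not stated in the abstract; they are inferred as the whole-sample counts that match the reported percentages and group sizes, so treat them as an assumption.

```python
def concordance_pct(concordant: int, total: int) -> float:
    """Percentage of samples for which two methods gave the same call."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * concordant / total


# Inferred (not reported) concordant counts per Abbott result category.
groups = {
    "genotype 1": (16, 17),
    "co-infection": (15, 17),
    "indeterminate": (8, 9),
}

for name, (concordant, total) in groups.items():
    print(f"{name}: {concordance_pct(concordant, total):.0f}%")
```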

  7. RNA Deep Sequencing Reveals Novel Candidate Genes and Polymorphisms in Boar Testis and Liver Tissues with Divergent Androstenone Levels

    PubMed Central

    Gunawan, Asep; Sahadevan, Sudeep; Neuhoff, Christiane; Große-Brinkhaus, Christine; Gad, Ahmed; Frieden, Luc; Tesfaye, Dawit; Tholen, Ernst; Looft, Christian; Uddin, Muhammad Jasim; Schellander, Karl; Cinar, Mehmet Ulas

    2013-01-01

    Boar taint is an unpleasant smell and taste of pork meat that occurs in some entire (non-castrated) male pigs. The main causes of boar taint are two compounds, androstenone (5α-androst-16-en-3-one) and skatole (3-methylindole). Understanding the genetic mechanism of boar taint is crucial for selecting pigs with lower androstenone levels and thus reducing boar taint. The aim of the present study was to investigate transcriptome differences in boar testis and liver tissues with divergent androstenone levels using RNA deep sequencing (RNA-Seq). The total number of reads produced for each testis and liver sample ranged from 13,221,550 to 33,206,723 and from 12,755,487 to 46,050,468, respectively. In testis samples 46 genes were differentially regulated, whereas 25 genes showed differential expression in the liver. Fold change values ranged from −4.68 to 2.90 in testis samples and from −2.86 to 3.89 in liver samples. Differentially regulated genes in high-androstenone testis and liver samples were enriched in metabolic processes such as lipid metabolism, small molecule biochemistry and molecular transport. This study provides evidence on the transcriptome profiles and gene polymorphisms of boars with divergent androstenone levels using RNA-Seq technology. Digital gene expression analysis identified candidate genes in the flavin monooxygenase, cytochrome P450 and hydroxysteroid dehydrogenase families. Moreover, polymorphism and association analysis indicated that mutations in the IRG6, MX1, IFIT2, CYP7A1, FMO5 and KRT18 genes could be potential candidate markers for androstenone levels in boars. Further studies are required to establish the role of these candidate genes before they can be used in genomic selection against boar taint in pig breeding programs. PMID:23696805

  8. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data.

    PubMed

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-01

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been functionally well characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform for decoding the evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs, generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker, which identified 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific and 5193 rodent-specific lncRNAs, as well as 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed the lncFunction web server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions.

  9. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data.

    PubMed

    Zheng, Ling-Ling; Li, Jun-Hao; Wu, Jie; Sun, Wen-Ju; Liu, Shun; Wang, Ze-Lin; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2016-01-01

    Small non-coding RNAs (e.g. miRNAs) and long non-coding RNAs (e.g. lincRNAs and circRNAs) are emerging as key regulators of various cellular processes. However, only a very small fraction of these enigmatic RNAs have been functionally well characterized. In this study, we describe deepBase v2.0 (http://biocenter.sysu.edu.cn/deepBase/), an updated platform for decoding the evolution, expression patterns and functions of diverse ncRNAs across 19 species. deepBase v2.0 has been updated to provide the most comprehensive collection of ncRNA-derived small RNAs, generated from 588 sRNA-Seq datasets. Moreover, we developed a pipeline named lncSeeker, which identified 176 680 high-confidence lncRNAs from 14 species. Temporal and spatial expression patterns of various ncRNAs were profiled. We identified approximately 24 280 primate-specific and 5193 rodent-specific lncRNAs, as well as 55 highly conserved lncRNA orthologs between human and zebrafish. We annotated 14 867 human circRNAs, 1260 of which are orthologous to mouse circRNAs. By combining expression profiles and functional genomic annotations, we developed the lncFunction web server to predict the function of lncRNAs based on protein-lncRNA co-expression networks. This study is expected to provide considerable resources to facilitate future experimental studies and to uncover ncRNA functions. PMID:26590255

  10. Complete Genome Sequence of the Hyperthermophilic Archaeon Pyrococcus sp. Strain ST04, Isolated from a Deep-Sea Hydrothermal Sulfide Chimney on the Juan de Fuca Ridge

    PubMed Central

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F.; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol

    2012-01-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential. PMID:22843576

  11. Complete genome sequence of the hyperthermophilic archaeon Pyrococcus sp. strain ST04, isolated from a deep-sea hydrothermal sulfide chimney on the Juan de Fuca Ridge.

    PubMed

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol; Park, Cheon-Seok

    2012-08-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential.

  12. Complete genome sequence of the hyperthermophilic archaeon Pyrococcus sp. strain ST04, isolated from a deep-sea hydrothermal sulfide chimney on the Juan de Fuca Ridge.

    PubMed

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol; Park, Cheon-Seok

    2012-08-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential. PMID:22843576

  13. Comparative clinical sample preparation of DNA and RNA viral nucleic acids for a commercial deep sequencing system (Illumina MiSeq®).

    PubMed

    Ullmann, Leila Sabrina; de Camargo Tozato, Claudia; Malossi, Camila Dantas; da Cruz, Tais Fukuta; Cavalcante, Raíssa Vasconcelos; Kurissio, Jacqueline Kazue; Cagnini, Didier Quevedo; Rodrigues, Marianna Vaz; Biondo, Alexander Welker; Araujo, João Pessoa

    2015-08-01

    Sequence-independent methods for viral discovery have been widely used for whole genome sequencing of viruses. Protocols for viral enrichment, library preparation and sequencing have become increasingly available at lower cost. However, no study to date has focused on optimizing viral sample preparation for commercial deep sequencing. Accordingly, the aim of the present study was to evaluate an in-house enzymatic protocol for double-stranded DNA (dsDNA) synthesis and to compare a commercially available kit protocol (Nextera XT, Illumina Inc, San Diego, CA, USA), alone and in combination with a library quantitation kit (Kapa, Kapa Biosystems, Wilmington, MA, USA), for deep sequencing (Illumina MiSeq). Two RNA viruses (canine distemper virus and dengue virus) and one ssDNA virus (porcine circovirus type 2) were tested with the optimized protocols. The tested method for dsDNA synthesis gave satisfactory results and may be used in laboratory settings, particularly when the enzymes are already available. Library preparation combining the commercial kits (Nextera XT and Kapa) yielded more reads and greater genome coverage, probably because small fragments are not recovered at the normalization step of Nextera XT. In addition, with Kapa quantitation, libraries can be diluted or concentrated to increase genome coverage.

  14. Complete Genome Sequence of the D-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize D-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to D-amino acid catabolism were identified. PMID:27587825
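    G+C content is the percentage of G and C bases in a sequence. A minimal sketch follows; excluding ambiguous bases such as N from the denominator is a design assumption here, and tools differ on this point.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / acgt


# Toy sequence; 'N' positions are ignored in the denominator.
print(f"{gc_content('ATGCGCnn'):.2f}%")
```

    For a complete genome the same function would be applied to the concatenated chromosome and plasmid sequences read from a FASTA file.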

  15. Complete Genome Sequence of the D-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea

    PubMed Central

    Fu, Yingnan; Wang, Rui

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize D-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to D-amino acid catabolism were identified. PMID:27587825

  16. Complete Genome Sequence of the D-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize D-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to D-amino acid catabolism were identified.

  17. Deep sequencing analysis of the heterogeneity of seed and commercial lots of the bacillus Calmette-Guérin (BCG) tuberculosis vaccine substrain Tokyo-172.

    PubMed

    Wada, Takayuki; Maruyama, Fumito; Iwamoto, Tomotada; Maeda, Shinji; Yamamoto, Taro; Nakagawa, Ichiro; Yamamoto, Saburo; Ohara, Naoya

    2015-12-04

    BCG, the only vaccine available to prevent tuberculosis, was established in the early 20th century by prolonged passaging of a virulent clinical strain of Mycobacterium bovis. BCG Tokyo-172, originally distributed within Japan in 1924, is one of the reference substrains currently used for the vaccine. Recently, this substrain was reported to contain two spontaneously arising, heterogeneous subpopulations (Types I and II). The proportions of the subpopulations changed over time in both distributed seed lots and commercial lots. To maintain the homogeneity of live vaccines, such variations and subpopulation mutations in lots should be minimized and monitored. We incorporated deep sequencing techniques to validate such heterogeneity in lots of the BCG Tokyo-172 substrain without cloning. By bioinformatics analysis, we detected not only the two subpopulations but also two intrinsic variations within these populations. The intrinsic variants could be isolated from the respective lots as colonies cultured on plate media, suggesting that analyses incorporating deep sequencing are powerful and valid tools for detecting mutations in live bacterial vaccine lots. Our data showed that spontaneous mutations in BCG vaccines can be easily monitored by deep sequencing without direct isolation of variants, revealing the complex heterogeneity of BCG Tokyo-172 and its daughter lots currently in use.

  18. An efficient strategy of screening for pathogens in wild-caught ticks and mosquitoes by reusing small RNA deep sequencing data.

    PubMed

    Zhuang, Lu; Zhang, Zhiyi; An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that small RNA (sRNA, 18-30 nt) deep sequencing can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using BLASTN. The BLAST results were then analyzed with in-house Python scripts, and an empirical formula was proposed to identify the putative pathogens. The results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened for in this way, and these were subsequently confirmed experimentally. Notably, the data indicated the presence of a novel Rickettsia sp. in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that reusing sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases.
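    The abstract does not give the empirical formula or the in-house scripts, so the following is only a hedged sketch of a first step such a script might take: reading BLASTN tabular output (the standard 12-column `-outfmt 6` layout), keeping the highest-bitscore hit per read above an assumed identity cut-off, and tallying reads per taxon. The 95% cut-off, the subject IDs and the subject-to-taxon mapping are hypothetical.

```python
from collections import Counter

MIN_IDENTITY = 95.0  # hypothetical percent-identity cut-off, not from the study


def best_hits(lines, min_identity=MIN_IDENTITY):
    """Map each query read to its highest-bitscore subject in -outfmt 6 lines."""
    best = {}  # qseqid -> (bitscore, sseqid)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if pident < min_identity:
            continue
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, sseqid)
    return {query: subject for query, (_, subject) in best.items()}


def reads_per_taxon(lines, subject_to_taxon):
    """Count reads per taxon; subjects missing from the mapping are 'unassigned'."""
    hits = best_hits(lines)
    return Counter(subject_to_taxon.get(s, "unassigned") for s in hits.values())


# Hypothetical BLASTN tabular rows (qseqid sseqid pident ... evalue bitscore).
example = [
    "read1\tsubj1\t100.0\t22\t0\t0\t1\t22\t100\t121\t1e-05\t44.1",
    "read1\tsubj2\t96.0\t22\t1\t0\t1\t22\t5\t26\t1e-03\t38.2",
    "read2\tsubj2\t100.0\t20\t0\t0\t1\t20\t50\t69\t1e-04\t40.1",
    "read3\tsubj3\t90.0\t21\t2\t0\t1\t21\t1\t21\t1e-01\t30.0",
]
taxa = {"subj1": "taxon_A", "subj2": "taxon_B"}  # hypothetical mapping
print(reads_per_taxon(example, taxa))
```

    Counts per taxon like these would only be the raw input to whatever scoring rule decides that a taxon is a putative pathogen; that rule is the part the abstract leaves unspecified.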

  19. Deep sequencing analysis of the heterogeneity of seed and commercial lots of the bacillus Calmette-Guérin (BCG) tuberculosis vaccine substrain Tokyo-172.

    PubMed

    Wada, Takayuki; Maruyama, Fumito; Iwamoto, Tomotada; Maeda, Shinji; Yamamoto, Taro; Nakagawa, Ichiro; Yamamoto, Saburo; Ohara, Naoya

    2015-01-01

    BCG, the only vaccine available to prevent tuberculosis, was established in the early 20th century by prolonged passaging of a virulent clinical strain of Mycobacterium bovis. BCG Tokyo-172, originally distributed within Japan in 1924, is one of the reference substrains currently used for the vaccine. Recently, this substrain was reported to contain two spontaneously arising, heterogeneous subpopulations (Types I and II). The proportions of the subpopulations changed over time in both distributed seed lots and commercial lots. To maintain the homogeneity of live vaccines, such variations and subpopulation mutations in lots should be minimized and monitored. We incorporated deep sequencing techniques to validate such heterogeneity in lots of the BCG Tokyo-172 substrain without cloning. By bioinformatics analysis, we detected not only the two subpopulations but also two intrinsic variations within these populations. The intrinsic variants could be isolated from the respective lots as colonies cultured on plate media, suggesting that analyses incorporating deep sequencing are powerful and valid tools for detecting mutations in live bacterial vaccine lots. Our data showed that spontaneous mutations in BCG vaccines can be easily monitored by deep sequencing without direct isolation of variants, revealing the complex heterogeneity of BCG Tokyo-172 and its daughter lots currently in use. PMID:26635118

  20. De Novo Assembly of the Common Bean Transcriptome Using Short Reads for the Discovery of Drought-Responsive Genes

    PubMed Central

    Wu, Jing; Wang, Lanfen; Li, Long; Wang, Shumin

    2014-01-01

    The common bean (Phaseolus vulgaris L.) is among the most important food legumes, well ahead of most other legumes. The average grain yield of the common bean worldwide is much lower than its potential yield, primarily because of drought in the field. However, the gene network that mediates plant responses to drought stress remains largely unknown in this species. The major goal of our study was to identify, on a large scale, genes involved in drought stress using RNA-seq. First, we assembled 270 million high-quality trimmed reads into a non-redundant set of 62,828 unigenes, representing approximately 49 Mb of unique transcriptome sequence. Of these unigenes, 26,501 (42.2%) had significant similarity to unigenes or predicted proteins from other legumes or sequenced plants. All unigenes were functionally annotated using GO, COG and KEGG pathways. The de novo transcriptome assembly strategy developed here should be useful for other legume transcriptome studies. Second, we identified 10,482 SSRs and 4,099 SNPs in transcripts. This large number of genetic markers provides a resource for gene discovery and the development of functional molecular markers. Finally, we identified differentially expressed genes (DEGs) between terminal drought and optimal irrigation treatments and between two genotypes, Long 22-0579 (drought tolerant) and Naihua (drought sensitive). DEGs were confirmed by quantitative real-time PCR assays, which indicated that these genes are functionally associated with the drought-stress response. These resources will be helpful for basic and applied research on genome analysis and crop drought resistance improvement in the common bean. PMID:25275443

  1. Ultra-Deep Bisulfite Sequencing to Detect Specific DNA Methylation Patterns of Minor Cell Types in Heterogeneous Cell Populations: An Example of the Pituitary Tissue.

    PubMed

    Arai, Yoshikazu; Fukukawa, Hisho; Atozi, Takanori; Matsumoto, Shoma; Hanazono, Yutaka; Nagashima, Hiroshi; Ohgane, Jun

    2016-01-01

    DNA methylation is an epigenetic modification important for cell fate determination and cell type-specific gene expression. Transcriptional regulatory regions of the mammalian genome contain a large number of tissue/cell type-dependent differentially methylated regions (T-DMRs) with DNA methylation patterns crucial for transcription of the corresponding genes. In general, tissues consist of multiple cell types in various proportions, making it difficult to detect T-DMRs of minor cell types in tissues. The present study attempts to detect T-DMRs of minor cell types in tissues by ultra-deep bisulfite sequencing of cell type-restricted genes and to estimate the proportions of minor cell types based on the DNA methylation patterns of sequenced reads. For this purpose, we focused on transcriptionally active hypomethylated alleles (Hypo-alleles), which can be recognized by a high ratio of unmethylated CpGs in each sequenced read (allele). The pituitary gland contains multiple cell types, including five hormone-expressing cell types and stem/progenitor cells, each of which is a minor cell type in the pituitary tissue. By ultra-deep sequencing of more than 100 reads to detect Hypo-alleles in pituitary cell type-specific genes, we identified T-DMRs specific to hormone-expressing cells and stem/progenitor cells and used them to estimate the proportion of each cell type based on the Hypo-allele ratio in pituitary tissue. Thus, the novel Hypo-allele concept enabled us to detect T-DMRs of minor cell types and to estimate their proportions in the tissue by ultra-deep bisulfite sequencing.
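    The Hypo-allele idea can be sketched as a per-read classification followed by a ratio over all reads at a locus. In this minimal Python sketch, the per-read encoding ('U'/'M' per CpG), the 0.8 unmethylated-CpG threshold and the 100-read depth floor are assumptions for illustration; the abstract gives neither the exact threshold nor how the ratio is converted into a cell-type proportion.

```python
HYPO_THRESHOLD = 0.8  # hypothetical: minimum fraction of unmethylated CpGs per read
MIN_DEPTH = 100       # assumed floor, echoing the "more than 100 reads" in the abstract


def is_hypo_allele(cpg_calls: str, threshold: float = HYPO_THRESHOLD) -> bool:
    """cpg_calls has one character per CpG in a read: 'U' unmethylated, 'M' methylated."""
    if not cpg_calls:
        raise ValueError("read covers no CpG sites")
    return cpg_calls.count("U") / len(cpg_calls) >= threshold


def hypo_allele_ratio(reads: list[str], min_depth: int = MIN_DEPTH) -> float:
    """Fraction of reads (alleles) at one locus classified as Hypo-alleles."""
    if len(reads) < min_depth:
        raise ValueError(f"need at least {min_depth} reads, got {len(reads)}")
    return sum(is_hypo_allele(r) for r in reads) / len(reads)


# Toy locus: 120 reads, 30 of them fully unmethylated.
reads = ["UUUUU"] * 30 + ["UUMMM"] * 10 + ["MMMMM"] * 80
print(f"Hypo-allele ratio: {hypo_allele_ratio(reads):.2f}")
```

    Using a per-read threshold rather than averaging methylation across all reads is what lets a rare, fully unmethylated allele stand out against a mostly methylated background.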

  2. Deep Sequencing Analysis of miRNA Expression in Breast Muscle of Fast-Growing and Slow-Growing Broilers

    PubMed Central

    Ouyang, Hongjia; He, Xiaomei; Li, Guihuan; Xu, Haiping; Jia, Xinzheng; Nie, Qinghua; Zhang, Xiquan

    2015-01-01

    Growth performance is an important economic trait in chicken. MicroRNAs (miRNAs) have been shown to play important roles in various biological processes, but their functions in chicken growth are not yet clear. To investigate the function of miRNAs in chicken growth, breast muscle tissues from the two tails of the body-weight distribution (highest and lowest body weight) of Recessive White Rock (WRR) and Xinghua (XH) chickens were subjected to high-throughput small RNA deep sequencing. In total, 921 miRNAs were identified, including 733 known mature miRNAs and 188 novel miRNAs. There were 200, 279, 257 and 297 differentially expressed miRNAs in the WRRh vs. WRRl, WRRh vs. XHh, WRRl vs. XHl, and XHh vs. XHl comparisons, respectively. A total of 22 highly differentially expressed miRNAs (fold change > 2 or < 0.5; p-value < 0.05; q-value < 0.01) that were also abundantly expressed (read counts > 1000) were found in these comparisons. Between the two within-breed analyses (WRRh vs. WRRl and XHh vs. XHl), we found 80 common differentially expressed miRNAs, while 110 miRNAs were common to the two between-breed analyses (WRRh vs. XHh and WRRl vs. XHl). Furthermore, 26 common miRNAs were identified among all four comparisons. Four differentially expressed miRNAs (miR-223, miR-16, miR-205a and miR-222b-5p) were validated by quantitative real-time RT-PCR (qRT-PCR). Regulatory networks of interactions between miRNAs and their targets were constructed using integrative miRNA target prediction and network analysis. Growth hormone receptor (GHR) was confirmed as a target of miR-146b-3p by dual-luciferase assay and qPCR, and miR-34c, miR-223, miR-146b-3p, miR-21 and miR-205a appear to be key growth-related miRNAs in the network. These miRNAs are proposed as candidates for future studies of miRNA-target function in the regulation of chicken growth. PMID:26193261
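    The combined filter for "highly differentially expressed" miRNAs can be written directly from the thresholds quoted in the abstract. In this sketch, the miRNA names and values are hypothetical, and interpreting "read counts > 1000" as the larger of the two groups' counts is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MiRNAComparison:
    name: str
    fold_change: float  # expression ratio, first group over second group
    p_value: float
    q_value: float
    read_count: int     # assumption: the larger of the two groups' read counts


def is_highly_differential(r: MiRNAComparison) -> bool:
    """Apply the fold-change, p/q-value and abundance thresholds from the abstract."""
    changed = r.fold_change > 2 or r.fold_change < 0.5
    significant = r.p_value < 0.05 and r.q_value < 0.01
    abundant = r.read_count > 1000
    return changed and significant and abundant


# Hypothetical results for one comparison (e.g. WRRh vs. WRRl).
results = [
    MiRNAComparison("miR-a", 3.1, 0.001, 0.004, 5200),
    MiRNAComparison("miR-b", 0.3, 0.010, 0.020, 8000),  # fails the q-value cut-off
    MiRNAComparison("miR-c", 1.5, 0.001, 0.002, 9000),  # fold change too small
    MiRNAComparison("miR-d", 0.4, 0.002, 0.005, 400),   # too few reads
]
print([r.name for r in results if is_highly_differential(r)])
```

    Counting miRNAs shared between two comparisons, as in the 80 and 110 common miRNAs above, is then a set intersection of the per-comparison name sets.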

  3. Transposon Mutagenesis Paired with Deep Sequencing of Caulobacter crescentus under Uranium Stress Reveals Genes Essential for Detoxification and Stress Tolerance

    PubMed Central

    Yung, Mimi C.; Park, Dan M.; Overton, K. Wesley; Blow, Matthew J.; Hoover, Cindi A.; Smit, John; Murray, Sean R.; Ricci, Dante P.; Christen, Beat; Bowman, Grant R.

    2015-01-01

    ABSTRACT The ubiquitous aquatic bacterium Caulobacter crescentus is highly resistant to uranium (U) and facilitates U biomineralization and thus holds promise as an agent of U bioremediation. To gain an understanding of how C. crescentus tolerates U, we employed transposon (Tn) mutagenesis paired with deep sequencing (Tn-seq) in a global screen for genomic elements required for U resistance. Of the 3,879 annotated genes in the C. crescentus genome, 37 were found to be specifically associated with fitness under U stress, 15 of which were subsequently tested through mutational analysis. Systematic deletion analysis revealed that mutants lacking outer membrane transporters (rsaFa and rsaFb), a stress-responsive transcription factor (cztR), or a ppGpp synthetase/hydrolase (spoT) exhibited a significantly lower survival rate under U stress. RsaFa and RsaFb, which are homologues of TolC in Escherichia coli, have previously been shown to mediate S-layer export. Transcriptional analysis revealed upregulation of rsaFa and rsaFb by 4- and 10-fold, respectively, in the presence of U. We additionally show that rsaFa mutants accumulated higher levels of U than the wild type, with no significant increase in oxidative stress levels. Our results suggest a function for RsaFa and RsaFb in U efflux and/or maintenance of membrane integrity during U stress. In addition, we present data implicating CztR and SpoT in resistance to U stress. Together, our findings reveal novel gene targets that are key to understanding the molecular mechanisms of U resistance in C. crescentus. IMPORTANCE Caulobacter crescentus is an aerobic bacterium that is highly resistant to uranium (U) and has great potential to be used in U bioremediation, but its mechanisms of U resistance are poorly understood. We conducted a Tn-seq screen to identify genes specifically required for U resistance in C. crescentus. The genes that we identified have previously remained elusive using other omics approaches and thus
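    The abstract does not say how fitness under U stress was scored. One common way to summarize Tn-seq data, shown here only as an assumed illustration and not as the authors' method, is the log2 ratio of a gene's normalized insertion-read frequency under stress versus a control condition; the pseudocount and all counts below are hypothetical.

```python
import math


def log2_fitness(stress_reads: int, control_reads: int,
                 stress_total: int, control_total: int,
                 pseudocount: float = 1.0) -> float:
    """log2 ratio of a gene's insertion-read frequency, stress over control.

    The pseudocount keeps genes with zero reads in one condition finite.
    Strongly negative values mean insertions in the gene were depleted under
    stress, i.e. the gene appears to be needed for fitness in that condition.
    """
    stress_freq = (stress_reads + pseudocount) / stress_total
    control_freq = (control_reads + pseudocount) / control_total
    return math.log2(stress_freq / control_freq)


# Hypothetical gene: 3 insertion reads under U stress vs. 200 in the control,
# with 1,000,000 total insertion reads in each library.
score = log2_fitness(3, 200, 1_000_000, 1_000_000)
print(f"log2 fitness: {score:.2f}")
```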

  4. Identification and characterization of microRNAs by deep-sequencing in Hyalomma anatolicum anatolicum (Acari: Ixodidae) ticks.

    PubMed

    Luo, Jin; Liu, Guang-Yuan; Chen, Ze; Ren, Qiao-Yun; Yin, Hong; Luo, Jian-Xun; Wang, Hui

    2015-06-15

    Hyalomma anatolicum anatolicum (H.a. anatolicum) (Acari: Ixodidae) ticks are globally distributed ectoparasites of veterinary and medical importance. These ticks not only weaken animals by sucking their blood but also transmit different species of parasitic protozoans. Multiple factors influence these parasitic infections, including miRNAs, which are small non-coding regulatory RNA molecules essential for the complex life cycle of parasites. To identify and characterize miRNAs in H.a. anatolicum, we developed an integrative approach combining deep sequencing, bioinformatics and real-time PCR analysis. Here we report the use of this approach to characterize miRNA expression, family distribution and nucleotide characteristics, and to discover novel miRNAs in H.a. anatolicum. The results showed that miR-1-3p, miR-275-3p, and miR-92a were abundantly expressed. There was a strong bias in miRNA family membership and in nucleotide composition at certain positions in H.a. anatolicum miRNAs. Uracil was the dominant nucleotide, particularly at positions 1, 6, 16, and 18, which are located approximately at the beginning, middle, and end of conserved miRNAs. Analysis of the conserved miRNAs indicated that miRNAs in H.a. anatolicum were concentrated along three diverse phylogenetic branches: bilaterians, insects and coelomates. Based on its parasitic life cycle, two possible roles for miRNAs in H.a. anatolicum can be proposed: to maintain a large set of miRNA families from different animals, and/or to preserve stringently conserved seed regions while allowing active changes elsewhere in the miRNAs, mainly in the middle and end regions. These features might help the parasite to complete its complex life cycle in different hosts and adapt more readily to host changes. The present study represents the first large-scale characterization of H.a. anatolicum miRNAs, which could further the understanding of the complex biology of this zoonotic parasite, as well as initiate miRNA studies

  5. Short Reads Phasing to Construct Haplotypes in Genomic Regions That Are Associated with Body Mass Index in Korean Individuals

    PubMed Central

    Lee, Kichan; Han, Seonggyun; Tark, Yeonjeong

    2014-01-01

    Genome-wide association (GWA) studies have found many important genetic variants that affect various traits. Because these studies are useful for investigating untyped but causal variants through linkage disequilibrium (LD), it would be useful to explore the haplotypes of single-nucleotide polymorphisms (SNPs) within the same LD block as significant associations, based on high-density variants from population references. Here, we aimed to build a catalog of haplotypes affecting body mass index (BMI) through an integrative analysis of previously published whole-genome next-generation sequencing (NGS) data from 7 representative Korean individuals and previously known Korean GWA signals. We selected 435 SNPs that were significantly associated with BMI in the GWA analysis and searched 53 LD ranges near those SNPs. Using the NGS data, we phased the haplotypes within these LD ranges. A total of 44 possible haplotype blocks for Korean BMI were cataloged. Although the current results are based on limited data, this study provides new insights that may help to identify important haplotypes for traits and low-frequency variants near significant SNPs. Furthermore, a more comprehensive catalog can be built as larger datasets become available. PMID:25705154

  6. Phylogenetic and genome-wide deep-sequencing analyses of canine parvovirus reveal co-infection with field variants and emergence of a recent recombinant strain.

    PubMed

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity.

  7. Fault segmentation, deep rift earthquakes and crustal rheology: Insights from the 2009 Karonga sequence and seismicity in the Rukwa-Malawi rift zone

    NASA Astrophysics Data System (ADS)

    Fagereng, Å.

    2013-08-01

    The Rukwa-Malawi rift zone has a record of seismic events down to depths in excess of 30 km, deep for a zone of active continental extension. This deep seismicity, as well as the presence of long (~ 100 km) border faults, has previously been explained by the long-term bulk rheology of intact, old, cold, anhydrous strong crust in east Africa, or the presence of mafic material in the lower crust. The Karonga sequence of 2009 showed a style of faulting different from continuous slip along long border faults, and is interpreted as segmented failure of hanging wall faults. Coulomb stress transfer in this sequence is calculated, and found to be consistent with segmented slip on a fault system synthetic to a nearby border fault and restricted to depths < 12 km. The inferred thermal structure of the Malawi rift indicates that slip at depths in excess of 30 km occurs at temperatures greater than the 350-450 °C commonly inferred at the base of the seismogenic zone. Crustal strength calculations indicate that long border faults and deep seismicity require the presence of a weak zone of localized deformation with increased strain rate (or fluid pressure), within a strong lower crust. A hypothesis is proposed where shallow, segmented frictional failure occurs in regions of relatively strong, intact crust (e.g. the Karonga sequence), whereas long border faults and deep earthquakes are representative of zones of weakness within strong crust. This hypothesis, if correct, implies that seismogenic thickness can vary within thick elastic lithosphere, such that localized weak zones of the crust enable nucleation of larger seismic events, whereas strong, intact crust favors smaller, segmented events and a shallower seismogenic zone.

  8. Finding the needle in the haystack: differentiating "identical" twins in paternity testing and forensics by ultra-deep next generation sequencing.

    PubMed

    Weber-Lehmann, Jacqueline; Schilling, Elmar; Gradl, Georg; Richter, Daniel C; Wiehler, Jens; Rolf, Burkhard

    2014-03-01

    Monozygotic (MZ) twins are considered to be genetically identical; therefore, they cannot be differentiated using standard forensic DNA testing. Here we describe how identification of extremely rare mutations by ultra-deep next generation sequencing can solve such cases. We sequenced DNA from sperm samples of two twins and from a blood sample of the child of one twin. Bioinformatics analysis revealed five single nucleotide polymorphisms (SNPs) present in the twin father and the child, but not in the twin uncle. The SNPs were confirmed by classical Sanger sequencing. Our results provide experimental evidence for the hypothesis that rare mutations occur shortly after the human blastocyst has split into two, the origin of twins, and that such mutations are carried into somatic tissue and the germline. The method provides a way to resolve paternity and forensic cases involving monozygotic twins as alleged fathers or originators of DNA traces. PMID:24528578

  9. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1(T)) from a deep-sea hydrothermal vent chimney.

    PubMed

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J; Sikorski, Johannes; Göker, Markus; Detter, John C; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-03-19

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1(T) was the first isolate within the phylum "Thermus-Deinococcus" to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1(T) is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    SciTech Connect

    Copeland, A; Gu, Wei; Yasawong, Montri; Lapidus, Alla L.; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian; Sikorski, Johannes; Goker, Markus; Detter, J. Chris; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum "Thermus-Deinococcus" to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  11. Deep-sequencing method for quantifying background abundances of symbiodinium types: exploring the rare symbiodinium biosphere in reef-building corals.

    PubMed

    Quigley, Kate M; Davies, Sarah W; Kenkel, Carly D; Willis, Bette L; Matz, Mikhail V; Bay, Line K

    2014-01-01

    The capacity of reef-building corals to associate with environmentally-appropriate types of endosymbionts from the dinoflagellate genus Symbiodinium contributes significantly to their success at local scales. Additionally, some corals are able to acclimatize to environmental perturbations by shuffling the relative proportions of different Symbiodinium types hosted. Understanding the dynamics of these symbioses requires a sensitive and quantitative method of Symbiodinium genotyping. Electrophoresis methods, still widely utilized for this purpose, are predominantly qualitative and cannot guarantee detection of a background type below 10% of the total Symbiodinium population. Here, the relative abundances of four Symbiodinium types (A13, C1, C3, and D1) in mixed samples of known composition were quantified using deep sequencing of the internal transcribed spacer of the ribosomal RNA gene (ITS-2) by means of Next Generation Sequencing (NGS) using Roche 454. In samples dominated by each of the four Symbiodinium types tested, background levels of the other three types were detected when present at 5%, 1%, and 0.1% levels, and their relative abundances were quantified with high (A13, C1, D1) to variable (C3) accuracy. The potential of this deep sequencing method for resolving fine-scale genetic diversity within a symbiont type was further demonstrated in a natural symbiosis using ITS-1, and uncovered reef-specific differences in the composition of Symbiodinium microadriaticum in two species of acroporid corals (Acropora digitifera and A. hyacinthus) from Palau. The ability of deep sequencing of the ITS locus (1 and 2) to detect and quantify low-abundance Symbiodinium types, as well as finer-scale diversity below the type level, will enable more robust quantification of local genetic diversity in Symbiodinium populations. This method will help to elucidate the role that background types have in maximizing coral fitness across diverse environments and in response to

  12. Deep-Sequencing Method for Quantifying Background Abundances of Symbiodinium Types: Exploring the Rare Symbiodinium Biosphere in Reef-Building Corals

    PubMed Central

    Quigley, Kate M.; Davies, Sarah W.; Kenkel, Carly D.; Willis, Bette L.; Matz, Mikhail V.; Bay, Line K.

    2014-01-01

    The capacity of reef-building corals to associate with environmentally-appropriate types of endosymbionts from the dinoflagellate genus Symbiodinium contributes significantly to their success at local scales. Additionally, some corals are able to acclimatize to environmental perturbations by shuffling the relative proportions of different Symbiodinium types hosted. Understanding the dynamics of these symbioses requires a sensitive and quantitative method of Symbiodinium genotyping. Electrophoresis methods, still widely utilized for this purpose, are predominantly qualitative and cannot guarantee detection of a background type below 10% of the total Symbiodinium population. Here, the relative abundances of four Symbiodinium types (A13, C1, C3, and D1) in mixed samples of known composition were quantified using deep sequencing of the internal transcribed spacer of the ribosomal RNA gene (ITS-2) by means of Next Generation Sequencing (NGS) using Roche 454. In samples dominated by each of the four Symbiodinium types tested, background levels of the other three types were detected when present at 5%, 1%, and 0.1% levels, and their relative abundances were quantified with high (A13, C1, D1) to variable (C3) accuracy. The potential of this deep sequencing method for resolving fine-scale genetic diversity within a symbiont type was further demonstrated in a natural symbiosis using ITS-1, and uncovered reef-specific differences in the composition of Symbiodinium microadriaticum in two species of acroporid corals (Acropora digitifera and A. hyacinthus) from Palau. The ability of deep sequencing of the ITS locus (1 and 2) to detect and quantify low-abundance Symbiodinium types, as well as finer-scale diversity below the type level, will enable more robust quantification of local genetic diversity in Symbiodinium populations. This method will help to elucidate the role that background types have in maximizing coral fitness across diverse environments and in response to

  13. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome.

    PubMed

    Lin, Ching-Ping; Ko, Chia-Yun; Kuo, Ching-I; Liu, Mao-Sen; Schafleitner, Roland; Chen, Long-Fang Oliver

    2015-01-01

    We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R2=0.9911). In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No U-to-C editing sites were found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum), two monocots (Oryza sativa and Zea mays), two gymnosperms (Pinus taeda and Ginkgo biloba) and one moss (Physcomitrella patens). Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at simple sequence repeat regions with poly-A runs. These results show that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants.

  14. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    PubMed Central

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships yielded questionable groupings with low support values. Analyses of phylogenetic informativeness show a lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  15. Transcriptional Slippage and RNA Editing Increase the Diversity of Transcripts in Chloroplasts: Insight from Deep Sequencing of Vigna radiata Genome and Transcriptome

    PubMed Central

    Kuo, Ching-I; Liu, Mao-Sen; Schafleitner, Roland; Chen, Long-Fang Oliver

    2015-01-01

    We performed deep sequencing of the nuclear and organellar genomes of three mungbean genotypes: Vigna radiata ssp. sublobata TC1966, V. radiata var. radiata NM92 and the recombinant inbred line RIL59 derived from a cross between TC1966 and NM92. Moreover, we performed deep sequencing of the RIL59 transcriptome to investigate transcript variability. The mungbean chloroplast genome has a quadripartite structure including a pair of inverted repeats separated by two single copy regions. A total of 213 simple sequence repeats were identified in the chloroplast genomes of NM92 and RIL59; 78 single nucleotide variants and nine indels were discovered in comparing the chloroplast genomes of TC1966 and NM92. Analysis of the mungbean chloroplast transcriptome revealed mRNAs that were affected by transcriptional slippage and RNA editing. Transcriptional slippage frequency was positively correlated with the length of simple sequence repeats of the mungbean chloroplast genome (R2=0.9911). In total, 41 C-to-U editing sites were found in 23 chloroplast genes and in one intergenic spacer. No U-to-C editing sites were found. A combination of bioinformatics and experimental methods revealed that the plastid-encoded RNA polymerase-transcribed genes psbF and ndhA are affected by transcriptional slippage in mungbean and in main lineages of land plants, including three dicots (Glycine max, Brassica rapa, and Nicotiana tabacum), two monocots (Oryza sativa and Zea mays), two gymnosperms (Pinus taeda and Ginkgo biloba) and one moss (Physcomitrella patens). Transcript analysis of the rps2 gene showed that transcriptional slippage could affect transcripts at simple sequence repeat regions with poly-A runs. These results show that transcriptional slippage together with incomplete RNA editing may cause sequence diversity of transcripts in chloroplasts of land plants. PMID:26076132

  17. Construction of a rationally designed antibody platform for sequencing-assisted selection.

    PubMed

    Larman, H Benjamin; Xu, George Jing; Pavlova, Natalya N; Elledge, Stephen J

    2012-11-01

    Antibody discovery platforms have become an important source of both therapeutic biomolecules and research reagents. Massively parallel DNA sequencing can be used to assist antibody selection by comprehensively monitoring libraries during selection, thus greatly expanding the power of these systems. We have therefore constructed a rationally designed, fully defined single-chain variable fragment (scFv) library and analysis platform optimized for analysis with short-read deep sequencing. Sequence-defined oligonucleotide libraries encoding three complementarity-determining regions (L3 from the light chain, H2 and H3 from the heavy chain) were synthesized on a programmable microarray and combinatorially cloned into a single scFv framework for molecular display. Our unique complementarity-determining region sequence design optimizes for protein binding by utilizing a hidden Markov model that was trained on all antibody-antigen cocrystal structures in the Protein Data Bank. The resultant ~10^12-member library was produced in ribosome-display format, and comprehensively analyzed over four rounds of antigen selections by multiplex paired-end Illumina sequencing. The hidden Markov model scFv library generated multiple binders against an emerging cancer antigen and is the basis for a next-generation antibody production platform. PMID:23064642

  18. Cross-species, amplifiable microsatellite markers for neoverrucid barnacles from deep-sea hydrothermal vents developed using next-generation sequencing.

    PubMed

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Watanabe, Hiromi; Inagaki, Fumio; Satoh, Nori; Mitarai, Satoshi

    2014-08-18

    Barnacles of the genus Neoverruca are abundant near deep-sea hydrothermal vents of the northwestern Pacific Ocean, and are useful for understanding processes of population formation and maintenance of deep-sea vent faunas. Using next-generation sequencing, we isolated 12 polymorphic microsatellite loci from Neoverruca sp., collected in the Okinawa Trough. These microsatellite loci revealed 2-19 alleles per locus. The expected and observed heterozygosities ranged from 0.286 to 1.000 and 0.349 to 0.935, respectively. Cross-species amplification showed that 9 of the 12 loci were successfully amplified for Neoverruca brachylepadoformis in the Mariana Trough. A pairwise FST value calculated using nine loci showed significant genetic differentiation between the two species. Consequently, the microsatellite markers we developed will be useful for further population genetic studies to elucidate genetic diversity, differentiation, classification, and evolutionary processes in the genus Neoverruca.

  19. Detection of the NS3 Q80K polymorphism by Sanger and deep sequencing in hepatitis C virus genotype 1a strains in the UK.

    PubMed

    Beloukas, A; King, S; Childs, K; Papadimitropoulos, A; Hopkins, M; Atkins, M; Agarwal, K; Nelson, M; Geretti, A M

    2015-11-01

    The Q80K polymorphism in the hepatitis C virus (HCV) NS3 enzyme reduces susceptibility to simeprevir and other novel protease inhibitors. The aims of this study were to determine the prevalence of Q80K in treatment-naïve HCV-1a carriers in the North West region (NW) and South East region (SE) of England, investigate the occurrence of Q80K as a minority variant, and characterize viral phylogeny. Plasma samples from subjects who were naïve to anti-HCV therapy were subjected to conventional (Sanger) and deep (Illumina MiSeq, 1% interpretative cut-off) sequencing of NS3. Q80K occurred in 44 of 238 subjects (18.5%, 95% CI 13.6-23.4%), including 19 of 70 (27.1%) in the NW and 25 of 168 (14.9%) in the SE (p = 0.0425), with no difference in HCV RNA load or human immunodeficiency virus (HIV) status. Q80K frequencies in reads of samples subjected to Illumina sequencing were >40% in all cases. Among subjects with Q80K, five of 44 (11.4%) showed one additional major resistance-associated mutation in NS3, detected at frequencies of >10% (V36L and V55A) or <10% (V36M). Phylogenetic analyses identified the two recognized HCV-1a lineages with (clade I) and without (clade II) Q80K. Overall, 148 of 238 (62.2%) sequences occurred within regional or inter-regional clusters, each comprising 3-20 sequences. There was no unique clustering of English sequences relative to strains from continental Europe and North America. In conclusion, Q80K was found at a high prevalence among treatment-naïve HCV-1a carriers in England, and was reliably detected by conventional sequencing, with no increased detection by deep sequencing. English sequences were highly interspersed with sequences from elsewhere in Europe (clade II) and North America (clade I), and their phylogeny was consistent with multiple introductions from different areas. PMID:26232533
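
    The reported prevalence interval for Q80K (44 of 238 subjects, 18.5%, 95% CI 13.6-23.4%) can be reproduced with a normal-approximation (Wald) binomial confidence interval. The abstract does not state which interval method was used, so the sketch below assumes the Wald method; the helper name `wald_ci` is hypothetical and only illustrates the arithmetic.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) for a normal-approximation (Wald) binomial CI.

    Assumption: the interval method is not stated in the source abstract;
    the Wald interval is used here only because it matches the reported bounds.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    return p, p - z * se, p + z * se


# Q80K counts taken from the abstract: 44 of 238 treatment-naive HCV-1a subjects.
p, lo, hi = wald_ci(44, 238)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}-{100 * hi:.1f}%)")
# prints "18.5% (95% CI 13.6-23.4%)"
```

    A Wilson or exact (Clopper-Pearson) interval would give somewhat different bounds, so the agreement with the reported figures suggests, but does not confirm, that a Wald interval was used.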

  20. Ultra-deep sequencing analysis of the hepatitis A virus 5'-untranslated region among cases of the same outbreak from a single source.

    PubMed

    Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2014-01-01

    Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of the HAV 5'-untranslated region (5'UTR) from cases of the same outbreak, which arose from a single source associated with a revolving sushi bar. We determined the reference sequence from an HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions in this region are transition mutations in outbreak cases, that insertions were observed only in non-severe cases, and that these nucleotide substitutions differed from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in the 5'UTR, but no specific mutations associated with severity in these outbreak cases. Surprisingly, HAV strains in this outbreak retained a conserved HAV IRES sequence even at the resolution of UDPS analysis. UDPS analysis of the HAV 5'UTR revealed no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. Ultra-deep sequencing of the full-length HAV genome may be more informative for revealing possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287

  1. Implications of spatial and temporal development of the aftershock sequence for the Mw 8.3 June 9, 1994 Deep Bolivian Earthquake

    NASA Astrophysics Data System (ADS)

    Myers, Stephen C.; Wallace, Terry C.; Beck, Susan L.; Silver, Paul G.; Zandt, George; Vandecar, John; Minaya, Estela

    On June 9, 1994 the Mw 8.3 Bolivia earthquake (636 km depth) occurred in a region that had not experienced significant deep seismicity for at least 30 years. The mainshock and aftershocks were recorded in Bolivia on the BANJO and SEDA broadband seismic arrays and on the San Calixto Network. We used the joint hypocenter determination method to determine the relative locations of the aftershocks. We identified no foreshocks and 89 aftershocks (m > 2.2) for the 20-day period following the mainshock. The frequency of aftershock occurrence decreased rapidly, with only one or two aftershocks per day occurring after day two. The temporal decay of aftershock activity is similar to that of shallow aftershock sequences, but the number of aftershocks is two orders of magnitude smaller. Additionally, an apparently triggered mb ∼6 earthquake occurred just 10 minutes after the mainshock, about 330 km east-southeast of the mainshock, at a depth of 671 km. The aftershock sequence occurred north and east of the mainshock and extends to a depth of 665 km. The aftershocks define a slab striking N68°W and dipping 45°NE. The strike, dip, and location of the aftershock zone are consistent with this seismicity being confined within the downward extension of the subducted Nazca plate. The location and orientation of the aftershock sequence indicate that the subducted Nazca plate bends between the NNW-striking zone of deep seismicity in western Brazil and the N-S-striking zone of seismicity in central Bolivia. A tear in the deep slab is not necessitated by the data. A subset of the aftershock hypocenters cluster along a subhorizontal plane near the depth of the mainshock, favoring a horizontal fault plane. The horizontal dimensions of the mainshock [Beck et al., this issue; Silver et al., 1995] and of the slab defined by the aftershocks are approximately equal, indicating that the mainshock ruptured through the slab.

  2. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir.

    PubMed

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar

    2015-08-27

    The genome of strain GS3372 is the first publicly available genome of an Aeribacillus pallidus strain. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to clarifying the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera.

  3. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    PubMed Central

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.

    2015-01-01

    The genome of strain GS3372 is the first publicly available genome of an Aeribacillus pallidus strain. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to clarifying the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  4. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

    PubMed Central

    Pell, Jason; Hintze, Arend; Canino-Koning, Rosangela; Howe, Adina; Tiedje, James M.; Brown, C. Titus

    2012-01-01

    Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly. PMID:22847406
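
The core idea can be sketched in Python. K-mers are inserted into a Bloom filter, and graph connectivity is then queried implicitly by testing which of a k-mer's eight possible one-base extensions the filter reports as present. This is a minimal sketch, not the paper's implementation. The bit-array size, the number of hash functions, and the salted SHA-256 hashing are illustrative assumptions, and reverse complements (canonical k-mers) are ignored for brevity:

```python
import hashlib


class KmerBloomFilter:
    """Minimal Bloom filter for k-mers; size and hash scheme are illustrative choices."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # Can return True for an absent k-mer (false positive), never False for an inserted one.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))


def add_read(bf: KmerBloomFilter, read: str, k: int) -> None:
    """Insert every k-mer of a read into the filter."""
    for i in range(len(read) - k + 1):
        bf.add(read[i:i + k])


def neighbors(bf: KmerBloomFilter, kmer: str) -> list[str]:
    """K-mers adjacent to `kmer` in the implicit de Bruijn graph that the filter reports as present."""
    candidates = [kmer[1:] + b for b in "ACGT"] + [b + kmer[:-1] for b in "ACGT"]
    return [c for c in candidates if c in bf]


bf = KmerBloomFilter()
add_read(bf, "ACGTACGTTT", 4)
print(neighbors(bf, "CGTA"))  # includes 'GTAC' and 'ACGT'; a false positive could add more
```

Memory cost is `num_bits` divided by the number of distinct k-mers stored. Shrinking the filter toward a few bits per k-mer raises the false-positive rate, and each false positive shows up as a spurious edge in the graph. This is the "albeit inexactly" trade-off the abstract describes.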

  5. Deep Sequencing of ESTs from Nacreous and Prismatic Layer Producing Tissues and a Screen for Novel Shell Formation-Related Genes in the Pearl Oyster

    PubMed Central

    Kinoshita, Shigeharu; Wang, Ning; Inoue, Haruka; Maeyama, Kaoru; Okamoto, Kikuhiko; Nagai, Kiyohito; Kondo, Hidehiro; Hirono, Ikuo; Asakawa, Shuichi; Watabe, Shugo

    2011-01-01

    Background Despite its economic importance, we have a limited understanding of the molecular mechanisms underlying shell formation in pearl oysters, wherein the calcium carbonate crystals, nacre and prism, are formed in a highly controlled manner. We constructed comprehensive expressed gene profiles in the shell-forming tissues of the pearl oyster Pinctada fucata and identified novel shell formation-related gene candidates. Principal Findings We employed the GS FLX 454 system and constructed transcriptome data sets from the pallial mantle and pearl sac, which form the nacreous layer, and from the mantle edge, which forms the prismatic layer in P. fucata. We sequenced 260,477 reads and obtained 29,682 unique sequences. We also screened for novel nacreous and prismatic gene candidates by a combined analysis of sequence and expression data sets, and identified various genes encoding lectins, proteases, protease inhibitors, a lysine-rich matrix protein, and secreted calcium-binding proteins. We also examined the expression of known nacreous and prismatic genes in our EST library and identified novel isoforms with tissue-specific expression. Conclusions We constructed EST data sets from the nacre- and prism-producing tissues in P. fucata and found 29,682 unique sequences containing novel gene candidates for nacreous and prismatic layer formation. This is the first report of deep sequencing of ESTs in the shell-forming tissues of P. fucata, and our data provide a powerful tool for a comprehensive understanding of the molecular mechanisms of molluscan biomineralization. PMID:21731681

  6. Complete Genome Sequence of Hyperthermophilic Piezophilic Archaeon Palaeococcus pacificus DY20341T, Isolated from Deep-Sea Hydrothermal Sediments.

    PubMed

    Zeng, Xiang; Jebbar, Mohamed; Shao, Zongze

    2015-01-01

    We report the genome sequence of Palaeococcus pacificus DY20341(T), isolated from a sediment sample collected from eastern Pacific Ocean hydrothermal fields. This is the first report of a complete genome for a Palaeococcus species. The genome sequence will help to better understand the differentiation, phylogenetic relationships, and evolution of several Thermococcales species.

  7. Draft genome sequence of Thermococcus sp. EP1, a novel hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise.

    PubMed

    Zhou, Meixian; Liu, Qing; Xie, Yunbiao; Dong, Binbin; Chen, Xiaoyao

    2016-04-01

    Thermococcus sp. strain EP1 is a novel anaerobic hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise. It grows optimally at 80 °C and can produce industrial enzymes at high temperature. We report here the draft genome of EP1, which contains 1,819,157 bp with a G+C content of 39.3%. The sequence will provide the genetic basis for a better understanding of adaptation to the hydrothermal environment and for the development of novel thermostable enzymes for industrial applications. PMID:26672397
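
The two summary figures quoted for a draft genome, total length and G+C content, can be computed from an assembly FASTA file as sketched below. This is a minimal Python illustration with hypothetical function names. Whether gap characters (N) count toward the reported length depends on the authors' conventions, which the abstract does not state:

```python
def gc_content(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T); N and other IUPAC codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


def genome_stats(fasta_text: str) -> tuple[int, float]:
    """Total length in bp (all characters, including N) and G+C content across all FASTA records."""
    seq = "".join(line.strip() for line in fasta_text.splitlines() if not line.startswith(">"))
    return len(seq), gc_content(seq)


length, gc = genome_stats(">contig1\nGGCC\n>contig2\nAATTNN\n")
print(length, f"{gc:.1%}")  # prints "10 50.0%"
```

Pooling all contigs before computing G+C weights each contig by its length, so the result matches the genome-wide percentage rather than an average of per-contig percentages.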

  8. BFC: correcting Illumina sequencing errors

    PubMed Central

    2015-01-01

    Summary: BFC is a free, fast and easy-to-use sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm but still maintains a speed comparable to implementations based on greedy methods. In evaluations on real data, BFC appears to correct more errors with fewer overcorrections than existing tools. It performs particularly well at suppressing systematic sequencing errors, which helps to improve the base accuracy of de novo assemblies. Availability and implementation: https://github.com/lh3/bfc Contact: hengli@broadinstitute.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25953801
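
K-mer-based correctors such as BFC build on k-mer counting. K-mers that occur only rarely across a deeply sequenced library ("weak" k-mers) probably contain an error, and in a read with a single error, the erroneous base is the one shared by every weak k-mer. The Python sketch below shows only this general k-mer-spectrum idea. It is not BFC's non-greedy algorithm, and the function names, `k`, and `min_count` are hypothetical choices:

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int) -> list[int]:
    """Start positions of k-mers in `read` seen fewer than `min_count` times in the library."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


def shared_weak_position(read: str, counts: Counter, k: int, min_count: int) -> list[int]:
    """Bases covered by every weak k-mer; under a single-error assumption the error lies here."""
    starts = weak_kmer_starts(read, counts, k, min_count)
    if not starts:
        return []
    covered = set(range(starts[0], starts[0] + k))
    for s in starts[1:]:
        covered &= set(range(s, s + k))
    return sorted(covered)


reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]  # one read carries a T->A error at position 4
counts = count_kmers(reads, 3)
print(shared_weak_position("ACGTAGCA", counts, 3, 2))  # prints [4]
```

Real correctors must also handle errors near read ends (covered by fewer k-mers), multiple errors per read, and genuinely rare sequence. Choosing the correction that restores solid k-mers without greedily committing to the first candidate is where tools like BFC differ.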

  9. Characterization of rainbow trout gonad, brain and gill deep cDNA repertoires using a Roche 454-Titanium sequencing approach.

    PubMed

    Le Cam, Aurélie; Bobe, Julien; Bouchez, Olivier; Cabau, Cédric; Kah, Olivier; Klopp, Christophe; Lareyre, Jean-Jacques; Le Guen, Isabelle; Lluch, Jérôme; Montfort, Jérôme; Moreews, Francois; Nicol, Barbara; Prunet, Patrick; Rescan, Pierre-Yves; Servili, Arianna; Guiguen, Yann

    2012-05-25

    Rainbow trout, Oncorhynchus mykiss, is an important aquaculture species worldwide and, in addition to being of commercial interest, it is also a research model organism of considerable scientific importance. Because no whole-genome sequence is available for this species, transcriptomic analyses have often been hindered. Using next-generation sequencing (NGS) technologies, we sought to fill these informational gaps. Here, using Roche 454-Titanium technology, we provide new tissue-specific cDNA repertoires from several rainbow trout tissues. Non-normalized cDNA libraries were constructed from testis, ovary, brain and gill tissue samples, and these libraries were sequenced in 10 separate half-runs of 454-Titanium. Overall, we produced a total of 3 million quality sequences with an average size of 328 bp, representing more than 1 Gb of expressed sequence information. These sequences were combined with all publicly available rainbow trout sequences, resulting in a total of 242,187 clusters of putative transcript groups and 22,373 singletons. To identify the predominantly expressed genes in different tissues of interest, we developed a Digital Differential Display (DDD) approach, which allowed us to characterize the genes predominantly expressed within each tissue of interest. Some of these genes were already known to be tissue-specific, thereby validating our approach. Many others, however, were novel candidates, demonstrating the usefulness of our strategy and of such tissue-specific resources. This new sequence information, acquired using NGS 454-Titanium technology, substantially enriched our current knowledge of the expressed genes in rainbow trout through the identification of an increased number of tissue-specific sequences. This identification allowed a precise cDNA tissue repertoire to be characterized in several important rainbow trout tissues.
The rainbow trout contig browser can be accessed at the following

  11. Genome Sequence of the Psychrophilic Bacterium Tenacibaculum ovolyticum Strain da5A-8 Isolated from Deep Seawater

    PubMed Central

    Zhai, Zhenyu; Komatsu, Ayumi; Shibayama, Keigo

    2016-01-01

    Some bacterial species of the genus Tenacibaculum, including Tenacibaculum ovolyticum, are known to be fish pathogens in the sea. So far, the only published genome sequence for this genus is that of Tenacibaculum dicentrarchi, which could also be a fish pathogen. Strain da5A-8, whose 16S rRNA gene sequence is 100% identical to that of T. ovolyticum DSM 18103T, was isolated from seawater at a depth of 344 m in Kochi, Japan, and grew optimally at 10 to 20°C. The genome sequence of strain da5A-8 revealed possible virulence genes commonly observed in the genus Tenacibaculum. PMID:27365358

  12. Draft genome sequence of Caminibacter mediatlanticus strain TB-2, an epsilonproteobacterium isolated from a deep-sea hydrothermal vent.

    PubMed

    Giovannelli, Donato; Ferriera, Steven; Johnson, Justin; Kravitz, Saul; Pérez-Rodríguez, Ileana; Ricci, Jessica; O'Brien, Charles; Voordeckers, James W; Bini, Elisabetta; Vetriani, Costantino

    2011-10-15

    Caminibacter mediatlanticus strain TB-2(T) [1] is a thermophilic, anaerobic, chemolithoautotrophic bacterium isolated from the walls of an active deep-sea hydrothermal vent chimney on the Mid-Atlantic Ridge, and is the type strain of the species. C. mediatlanticus is a Gram-negative member of the Epsilonproteobacteria (order Nautiliales) that grows chemolithoautotrophically with H₂ as the energy source and CO₂ as the carbon source. Nitrate or sulfur is used as the terminal electron acceptor, with resulting production of ammonium and hydrogen sulfide, respectively. In view of the widespread distribution, importance and physiological characteristics of thermophilic Epsilonproteobacteria in deep-sea geothermal environments, it is likely that these organisms contribute substantially to both primary productivity and the biogeochemical cycling of carbon, nitrogen and sulfur at hydrothermal vents. Here we report the main features of the genome of C. mediatlanticus strain TB-2(T). PMID:22180817

  13. Transcriptome Analysis of the Mud Crab (Scylla paramamosain) by 454 Deep Sequencing: Assembly, Annotation, and Marker Discovery

    PubMed Central

    Ma, Hongyu; Ma, Chunyan; Li, Shujuan; Jiang, Wei; Li, Xincang; Liu, Yuexing; Ma, Lingbo

    2014-01-01

    In this study, we report the characterization of the first transcriptome of the mud crab (Scylla paramamosain). Pooled cDNAs of four tissue types from twelve wild individuals were sequenced using the Roche 454 FLX platform. The analysis included de novo assembly of transcriptome sequences, functional annotation, and molecular marker discovery. A total of 1,314,101 high-quality reads with an average length of 411 bp were generated by 454 sequencing of a mixed cDNA library. De novo assembly of these 1,314,101 reads produced 76,778 contigs (consisting of 818,154 reads) with 5.4-fold average sequencing coverage. The remaining 495,947 reads were singletons. A total of 78,268 unigenes were identified based on sequence similarity with known proteins (E≤0.00001) in UniProt and non-redundant protein databases. Meanwhile, 44,433 sequences were identified (E≤0.00001) using a BLASTN search against the NCBI nucleotide database. Gene Ontology (GO) analysis indicated that biosynthetic process, cell part, and ion binding were the most abundant terms in the biological process, cellular component, and molecular function categories, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that 4,878 unigenes were distributed in 281 different pathways. In addition, 19,011 microsatellites and 37,063 potential single nucleotide polymorphisms were detected in the transcriptome of S. paramamosain. Finally, thirty polymorphic microsatellite markers were developed and used to assess the genetic diversity of a wild population of S. paramamosain. Existing sequence resources for S. paramamosain are extremely limited. The present study provides a characterization of the transcriptome from multiple tissues and individuals, as well as an assessment of the genetic diversity of a wild population. These sequence resources will facilitate the investigation of population genetic diversity, the development of genetic maps, and the conduct of molecular marker

  14. Late Pleistocene Sea-level and Deep-sea Temperature Changes Constrained by U.S. Mid-Atlantic Margin Sequences

    NASA Astrophysics Data System (ADS)

    Wright, J. D.; Miller, K. G.; Sheridan, R. E.; Cramer, B. S.

    2004-12-01

    We assembled and dated a late Pleistocene (last 130 kyr) sea-level record based on sequence stratigraphy from the U.S. middle Atlantic margin. The timing and magnitude of these sea-level changes are similar to those reported from uplifted coral terraces in New Guinea and Barbados, suggesting that we have established a global record of late Pleistocene sea-level change. Comparison of this eustatic record with benthic foraminiferal oxygen isotope records shows that the deep sea cooled ~2.5 °C between Marine Isotope Chrons (MIC) 5e and 5d (~120-110 ka) and that near-freezing conditions persisted until Termination 1a (14-15 ka). The pattern of deep-sea cooling follows a hysteresis loop between two stable modes of operation. Cold, near-freezing deep-water conditions characterize most of the past 130 kyr. In contrast, two warm intervals (the Holocene/MIC 1 and MIC 5e) resulted from rapid warming during the terminations; rapid cooling followed the peak warmth of 5e, and presumably the same would be beginning today if not for anthropogenic warming.

  15. Deep sequencing identifies circulating mouse miRNAs that are functionally implicated in manifestations of aging and responsive to calorie restriction.

    PubMed

    Dhahbi, Joseph M; Spindler, Stephen R; Atamna, Hani; Yamakawa, Amy; Guerrero, Noel; Boffelli, Dario; Mote, Patricia; Martin, David I K

    2013-02-01

    MicroRNAs (miRNAs) function to modulate gene expression, and through this property they regulate a broad spectrum of cellular processes. They can circulate in blood and thereby mediate cell-to-cell communication. Aging involves changes in many cellular processes that are potentially regulated by miRNAs, and some evidence has implicated circulating miRNAs in the aging process. In order to initiate a comprehensive assessment of the role of circulating miRNAs in aging, we have used deep sequencing to characterize circulating miRNAs in the serum of young mice, old mice, and old mice maintained on calorie restriction (CR). Deep sequencing identifies a set of novel miRNAs, and also accurately measures all known miRNAs present in serum. This analysis demonstrates that the levels of many miRNAs circulating in the mouse are increased with age, and that the increases can be antagonized by CR. The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging including metabolic changes, and the miRNAs themselves have been linked to diseases associated with old age. This finding implicates circulating miRNAs in the aging process, raising questions about their tissues of origin, their cellular targets, and their functional role in metabolic changes that occur with aging.
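
The abstract does not describe its normalization. A common way to compare miRNA abundance across deep-sequenced libraries of different sizes is to scale counts to reads per million (RPM) and report a log2 fold change. The Python sketch below assumes that approach. The miRNA names, counts, and pseudocount are hypothetical, and a real analysis would add a statistical test across biological replicates:

```python
import math


def rpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts to reads per million reads in the library."""
    total = sum(counts.values())
    return {name: c * 1e6 / total for name, c in counts.items()}


def log2_fold_change(
    reference: dict[str, int], comparison: dict[str, int], pseudocount: float = 1.0
) -> dict[str, float]:
    """log2((RPM_comparison + pc) / (RPM_reference + pc)); a miRNA absent from a library counts as 0 RPM."""
    ref, cmp_ = rpm(reference), rpm(comparison)
    return {
        name: math.log2((cmp_.get(name, 0.0) + pseudocount) / (ref.get(name, 0.0) + pseudocount))
        for name in ref.keys() | cmp_.keys()
    }


young = {"miR-x": 100, "miR-y": 900}  # hypothetical counts
old = {"miR-x": 400, "miR-y": 600}
print(log2_fold_change(young, old))  # miR-x close to +2 (about 4-fold up), miR-y negative
```

The pseudocount keeps the ratio finite when a miRNA is absent from one library, at the cost of shrinking fold changes for low-abundance miRNAs.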

  16. Identification of microRNA-like RNAs from Curvularia lunata associated with maize leaf spot by bioinformation analysis and deep sequencing.

    PubMed

    Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei

    2016-04-01

    Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs, 1012 microRNA-like RNAs (milRNAs) similar to known microRNAs and 48 potential novel milRNAs without homologs in other organisms were identified using the miRBase database. We used quantitative PCR to analyze the expression of four of these milRNAs in C. lunata at different developmental stages. The analysis revealed several changes associated with conidial germination and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. Computational analysis predicted a total of 8334 target mRNAs for the 1012 identified milRNAs and 256 target mRNAs for the 48 novel milRNAs. These predicted target mRNAs were further characterized by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanisms.
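
The abstract does not name the target-prediction method. One widely used heuristic scans target sequences for sites perfectly complementary to the miRNA "seed" (nucleotides 2-8). The minimal Python sketch below implements only that seed-matching step. It is not the authors' pipeline, and the example sequences are arbitrary:

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def seed_site(mirna: str) -> str:
    """Reverse complement (RNA alphabet) of miRNA nucleotides 2-8: the 7-nt site the seed pairs with."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, target: str) -> list[int]:
    """0-based start positions in `target` (DNA or RNA) where a perfect seed-complementary site occurs."""
    site = seed_site(mirna)
    target = target.upper().replace("T", "U")
    return [i for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # arbitrary example sequence
print(seed_site(mirna))                             # prints "CUACCUC"
print(find_seed_matches(mirna, "AAACTACCTCAAA"))    # prints [3]
```

Seed matching alone yields many false positives. Prediction tools typically add criteria such as pairing energy, site context, or conservation before reporting targets.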

  17. Rapid genome mapping in nano channel array for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next-generation sequencing (NGS) technologies have enabled high-throughput and low-cost generation of sequence data; however, de novo genome assembly remains a great challenge, particularly for large genomes. NGS short reads are often insufficient to create large contigs that span repeat sequences...

  18. Deep Sequencing of T-Cell Receptor DNA as a biomarker of clonally expanded TILs in breast cancer after immunotherapy

    PubMed Central

    Page, David B.; Yuan, Jianda; Redmond, David; Wen, Y Hanna; Durack, Jeremy C.; Emerson, Ryan; Solomon, Stephen; Dong, Zhiwan; Wong, Phillip; Comstock, Christopher; Diab, Adi; Sung, Janice; Maybody, Majid; Morris, Elizabeth; Brogi, Edi; Morrow, Monica; Sacchini, Virgilio; Elemento, Olivier; Robins, Harlan; Patil, Sujata; Allison, James P.; Wolchok, Jedd D.; Hudis, Clifford; Norton, Larry; McArthur, Heather

    2016-01-01

    In early stage breast cancer, the degree of tumor-infiltrating lymphocytes (TILs) predicts response to chemotherapy and overall survival. Combination immunotherapy with immune checkpoint antibody plus tumor cryoablation can induce lymphocytic infiltrates and improve survival in mice. We used T-cell receptor (TCR) DNA sequencing to evaluate both the effect of cryo-immunotherapy in humans and the feasibility of TCR sequencing in early-stage breast cancer. In a pilot clinical trial, 18 women with early-stage breast cancer were treated preoperatively with cryoablation, single-dose anti-CTLA-4 (ipilimumab), or cryoablation + ipilimumab. TCRs within serially collected peripheral blood and tumor tissue were sequenced. In baseline tumor tissues, T-cell density as measured by TCR sequencing correlated with TIL scores obtained by hematoxylin and eosin (H&E) staining. However, tumors with little or no lymphocytes by H&E contained up to 3.6 × 10⁶ TCR DNA sequences, highlighting the sensitivity of the ImmunoSEQ platform. In this dataset, ipilimumab increased intratumoral T-cell density over time, whereas cryoablation ± ipilimumab diversified and remodeled the intratumoral T-cell clonal repertoire. Compared to monotherapy, cryoablation plus ipilimumab was associated with numerically greater numbers of peripheral blood and intratumoral T-cell clones expanding robustly following therapy. In conclusion, TCR sequencing correlates with H&E lymphocyte scoring and provides additional information on clonal diversity. These findings support further study of the use of TCR sequencing as a biomarker for T cell responses to therapy and for the study of cryo-immunotherapy in early-stage breast cancer. PMID:27587469
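
The abstract reports that therapy "diversified and remodeled" the TCR repertoire but does not name a diversity metric. One common summary is clonality, defined as 1 minus Pielou's evenness (Shannon entropy divided by its maximum, ln of the number of clones). The Python sketch below assumes that definition, and the clone counts are hypothetical:

```python
import math


def shannon_entropy(counts: list[int]) -> float:
    """Shannon entropy (natural log) of clone frequencies; zero counts are ignored."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts: list[int]) -> float:
    """1 minus Pielou's evenness: 0 for a perfectly even repertoire, approaching 1 when one clone dominates."""
    n = sum(1 for c in counts if c > 0)
    if n <= 1:
        return 1.0  # a single observed clone is maximally clonal by convention
    return 1.0 - shannon_entropy(counts) / math.log(n)


print(round(clonality([10, 10, 10, 10]), 3))  # prints 0.0 (even repertoire)
print(round(clonality([100, 1, 1, 1]), 3))    # about 0.88 (dominated by one clone)
```

Because the entropy is normalized by the number of observed clones, clonality is partly insensitive to sequencing depth. Comparisons between samples of very different depth are still usually made after downsampling.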

  19. Ultra-deep Illumina sequencing accurately identifies MHC class IIb alleles and provides evidence for copy number variation in the guppy (Poecilia reticulata).

    PubMed

    Lighten, Jackie; van Oosterhout, Cock; Paterson, Ian G; McMullan, Mark; Bentzen, Paul

    2014-07-01

    We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high-throughput sequencing workflows. We fit observed ultra-deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 bp of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or because of random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using the low-depth allele validation thresholds of previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1-6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations, was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, these observations provide evidence of MHC CNV and its complex basis in natural populations.
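
The contrast the abstract draws, between fixed low-depth thresholds and using the structure of the depth distribution itself, can be illustrated with a simple heuristic. Variant depths are sorted in descending order, and the list is cut at the largest relative drop between neighbours, provided that drop is steep enough. This Python sketch is hypothetical (the function name and the `max_ratio` threshold are assumptions) and is not the authors' CNV-model fitting:

```python
def accept_by_largest_drop(depths: list[int], max_ratio: float = 0.3) -> int:
    """Number of deepest variants to accept, cutting at the largest relative drop in depth.

    If no consecutive ratio falls below `max_ratio`, no clear artefact tail is
    present and all variants are accepted.
    """
    ordered = sorted((d for d in depths if d > 0), reverse=True)
    if len(ordered) < 2:
        return len(ordered)
    ratios = [ordered[i + 1] / ordered[i] for i in range(len(ordered) - 1)]
    cut = min(range(len(ratios)), key=ratios.__getitem__)
    return cut + 1 if ratios[cut] <= max_ratio else len(ordered)


# Hypothetical amplicon: three deep variants followed by a low-depth artefact tail.
print(accept_by_largest_drop([500, 480, 250, 20, 10, 5]))  # prints 3
```

In this toy example the third variant has roughly half the depth of the first two. Under equal amplification, that pattern could reflect allelic dosage (for example, one copy versus two copies) rather than an artefact. Modelling such dosage expectations is what distinguishes the CNV-based approach from a plain depth cut-off.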

  20. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background Next-generation sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement of the biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplification to analyze terrestrial arthropod communities. Results Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed the species composition of a terrestrial insect community to be estimated at high fidelity. With 15.5 Gbp of Illumina data, approximately 97% and 92% of the 37 input Operational Taxonomic Units (OTUs) were detected with and without the reference barcode library, respectively, while only 1 novel OTU was found without the library. Additionally, a relatively strong correlation between sequencing volume and total biomass was observed for species from the bulk sample, suggesting a potential way to reveal relative abundance. Conclusions The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens, and its tendency to avoid most, if not all, false positives, suggest its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement in mitochondrial enrichment is likely needed before the new pipeline can be applied to arthropod communities of higher diversity. PMID:23587339

  1. Identification of SSRs and differentially expressed genes in two cultivars of celery (Apium graveolens L.) by deep transcriptome sequencing.

    PubMed

    Li, Meng-Yao; Wang, Feng; Jiang, Qian; Ma, Jing; Xiong, Ai-Sheng

    2014-01-01

    Celery (Apium graveolens L.) is one of the most important and widely grown vegetables in the Apiaceae family. Due to the lack of comprehensive genomic resources, research on celery has mainly used physiological and biochemical approaches, rather than molecular biology, to study this crop. Transcriptome sequencing has become an efficient and economical technology for obtaining information on gene expression that can greatly facilitate molecular and genomic studies of species for which a sequenced genome is not available. In the present study, 15 893 516 and 19 818 161 high-quality sequences were obtained by RNA-seq from two celery varieties, 'Ventura' and 'Jinnan Shiqin', respectively. The reads were assembled into 39 584 and 41 740 unigenes with mean lengths of 683 bp and 690 bp, respectively. A total of 1939 simple sequence repeat (SSR) markers were identified in 'Ventura' and 2004 SSRs in 'Jinnan Shiqin'. Di-nucleotide repeats were the most common repeat motif, accounting for 55.49% and 54.84% of SSRs in 'Ventura' and 'Jinnan Shiqin', respectively. A comparison of expressed genes between the two libraries identified 338 differentially expressed genes (DEGs). Three hundred and three of the DEGs were annotated based on a sequence similarity search of eight public databases. Additionally, the expression profiles of eight annotated DEGs were characterized in response to abiotic stresses. The data generated in the present research represent a valuable resource for further genetic and molecular studies in celery.
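
SSR mining of the kind reported here (and for the mud crab transcriptome above) amounts to scanning assembled unigenes for perfect tandem repeats of 1-6 nt motifs, then tallying motif classes such as di-nucleotides. The Python sketch below does this with regular-expression backreferences. The minimum repeat counts are illustrative assumptions, not the thresholds used in the study:

```python
import re
from collections import Counter

# Illustrative minimum repeat counts per motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by start position."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in min_repeats.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile("([ACGT]{%d})\\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ATAT").
            if re.fullmatch(r"(.+?)\1+", motif):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


ssrs = find_ssrs("AG" * 7 + "C" + "CAT" * 5)
print(ssrs)                                      # [(0, 'AG', 7), (15, 'CAT', 5)]
print(Counter(len(motif) for _, motif, _ in ssrs))  # motif-length tally: {2: 1, 3: 1}
```

This sketch reports only perfect repeats and does not merge compound or interrupted SSRs. Dedicated SSR-mining tools typically handle those cases as well.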

  2. A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India.

    PubMed

    Rao, Pavitra N; Uplekar, Swapna; Kayal, Sriti; Mallick, Prashant K; Bandyopadhyay, Nabamita; Kale, Sonal; Singh, Om P; Mohanty, Akshaya; Mohanty, Sanjib; Wassmer, Samuel C; Carlton, Jane M

    2016-06-01

    A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy.
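
Blood-stage P. falciparum parasites are haploid, so a clonal infection should show a single allele at each SNP position. Positions where a second allele is supported by a substantial fraction of reads therefore suggest a multiclonal infection. The Python sketch below flags such samples. The depth and minor-allele-fraction thresholds and the data layout (position mapped to allele counts) are hypothetical choices, not the protocol's published criteria:

```python
def minor_allele_fraction(allele_counts: dict[str, int]) -> float:
    """Fraction of reads supporting the second most common allele at one position."""
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0
    ranked = sorted(allele_counts.values(), reverse=True)
    return ranked[1] / total if len(ranked) > 1 else 0.0


def looks_mixed(
    sample: dict[int, dict[str, int]],
    min_fraction: float = 0.1,
    min_depth: int = 50,
    min_sites: int = 1,
) -> bool:
    """True if at least `min_sites` well-covered positions carry a minor allele at >= `min_fraction`."""
    informative = 0
    for counts in sample.values():
        if sum(counts.values()) >= min_depth and minor_allele_fraction(counts) >= min_fraction:
            informative += 1
    return informative >= min_sites


clonal = {101: {"A": 200, "G": 2}, 202: {"C": 150}}  # hypothetical positions and counts
mixed = {101: {"A": 120, "G": 80}, 202: {"C": 150}}
print(looks_mixed(clonal), looks_mixed(mixed))  # prints "False True"
```

The minimum-depth filter prevents a handful of reads at a poorly covered position from being mistaken for a second clone. The minor-allele threshold must sit above the platform's background error rate.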

  3. A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India

    PubMed Central

    Rao, Pavitra N.; Uplekar, Swapna; Kayal, Sriti; Mallick, Prashant K.; Bandyopadhyay, Nabamita; Kale, Sonal; Singh, Om P.; Mohanty, Akshaya; Mohanty, Sanjib; Wassmer, Samuel C.

    2016-01-01

    A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy. PMID:27008882
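    The abstract above uses SNP positions that carry more than one allele to flag mixed-genotype (multiclonal) infections. Below is a minimal Python sketch of that idea. It assumes per-position allele read counts have already been extracted from the alignments. The positions, the 50-read depth floor and the 10% minor-allele threshold are hypothetical illustration values, not parameters taken from the paper.

```python
from typing import Dict, List

# Hypothetical input: SNP position -> {allele: read count} for one clinical isolate.
AlleleCounts = Dict[int, Dict[str, int]]


def mixed_positions(counts: AlleleCounts,
                    min_depth: int = 50,
                    min_minor_freq: float = 0.10) -> List[int]:
    """Return positions where the second most common allele is frequent enough
    to suggest more than one parasite genotype (thresholds are hypothetical)."""
    flagged = []
    for pos in sorted(counts):
        alleles = counts[pos]
        depth = sum(alleles.values())
        if depth < min_depth:
            continue  # too few reads to judge the allele mix reliably
        freqs = sorted((n / depth for n in alleles.values()), reverse=True)
        if len(freqs) > 1 and freqs[1] >= min_minor_freq:
            flagged.append(pos)
    return flagged


def looks_multiclonal(counts: AlleleCounts, **thresholds) -> bool:
    # Any multi-allelic position is taken as evidence of a mixed infection.
    return bool(mixed_positions(counts, **thresholds))


sample = {
    101: {"A": 150, "T": 50},  # 25% minor allele -> flagged
    202: {"G": 200},           # single allele
    303: {"C": 5, "T": 3},     # depth 8, below min_depth -> skipped
}
print(mixed_positions(sample))    # [101]
print(looks_multiclonal(sample))  # True
```

The depth floor stops poorly covered positions from being flagged on the evidence of a few reads. In practice the minor-allele threshold would need to sit above the platform's expected per-base error rate.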

  4. Pooled deep sequencing of Plasmodium falciparum isolates: an efficient and scalable tool to quantify prevailing malaria drug-resistance genotypes.

    PubMed

    Taylor, Steve M; Parobek, Christian M; Aragam, Nash; Ngasala, Billy E; Mårtensson, Andreas; Meshnick, Steven R; Juliano, Jonathan J

    2013-12-15

    Molecular surveillance for drug-resistant malaria parasites requires reliable, timely, and scalable methods. These data may be efficiently produced by genotyping parasite populations using second-generation sequencing (SGS). We designed and validated an SGS protocol to quantify mutant allele frequencies in the Plasmodium falciparum genes dhfr and dhps in mixed isolates. We applied this new protocol to field isolates from children and compared it to standard genotyping using Sanger sequencing. The SGS protocol accurately quantified dhfr and dhps allele frequencies in a mixture of parasite strains. Using SGS of DNA that was extracted and then pooled from individual isolates, we estimated mutant allele frequencies that were closely correlated with those estimated by Sanger sequencing (correlations, >0.98). The SGS protocol obviated most molecular steps in conventional methods and is cost-saving for parasite populations of more than 50. This SGS genotyping method efficiently and reproducibly estimates parasite allele frequencies within populations of P. falciparum for molecular epidemiologic studies.
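    In the pooled approach described above, the mutant allele frequency at each codon is essentially the share of reads that carry the mutant base. Agreement with the Sanger estimates is then summarized as a correlation. The sketch below follows that reading and uses a Pearson coefficient. The abstract does not say which correlation measure was used, and all counts and Sanger values here are hypothetical.

```python
import math
from typing import Sequence


def mutant_allele_frequency(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads at one codon that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("no reads cover this position")
    return mutant_reads / total_reads


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series of frequencies."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical pooled read counts (mutant, total) for three codons.
pooled = [(450, 600), (120, 800), (30, 1000)]
sgs = [mutant_allele_frequency(m, t) for m, t in pooled]  # [0.75, 0.15, 0.03]
sanger = [0.74, 0.16, 0.02]  # hypothetical Sanger-based estimates
print(sgs, round(pearson_r(sgs, sanger), 3))
```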

  5. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues.

    PubMed

    Krimmel, Jeffrey D; Schmitt, Michael W; Harrell, Maria I; Agnew, Kathy J; Kennedy, Scott R; Emond, Mary J; Loeb, Lawrence A; Swisher, Elizabeth M; Risques, Rosa Ana

    2016-05-24

    Current sequencing methods are error-prone, which precludes the identification of low frequency mutations for early cancer detection. Duplex sequencing is a sequencing technology that decreases errors by scoring mutations present only in both strands of DNA. Our aim was to determine whether duplex sequencing could detect extremely rare cancer cells present in peritoneal fluid from women with high-grade serous ovarian carcinomas (HGSOCs). These aggressive cancers are typically diagnosed at a late stage and are characterized by TP53 mutations and peritoneal dissemination. We used duplex sequencing to analyze TP53 mutations in 17 peritoneal fluid samples from women with HGSOC and 20 from women without cancer. The tumor TP53 mutation was detected in 94% (16/17) of peritoneal fluid samples from women with HGSOC (frequency as low as 1 mutant per 24,736 normal genomes). Additionally, we detected extremely low frequency TP53 mutations (median mutant fraction 1/13,139) in peritoneal fluid from nearly all patients with and without cancer (35/37). These mutations were mostly deleterious, clustered in hotspots, increased with age, and were more abundant in women with cancer than in controls. The total burden of TP53 mutations in peritoneal fluid distinguished cancers from controls with 82% sensitivity (14/17) and 90% specificity (18/20). Age-associated, low frequency TP53 mutations were also found in 100% of peripheral blood samples from 15 women with and without ovarian cancer (none with hematologic disorder). Our results demonstrate the ability of duplex sequencing to detect rare cancer cells and provide evidence of widespread, low frequency, age-associated somatic TP53 mutation in noncancerous tissue. PMID:27152024
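    Duplex sequencing, as summarized above, scores a mutation only when it appears on both strands of the original DNA molecule. The sketch below applies that rule at a single site. It assumes reads have already been aligned and grouped by molecular tag and strand. The tag names, the 70% within-strand agreement threshold and the helper names are hypothetical. A real pipeline would also have to handle tag errors, alignment artifacts and many sites.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: molecular tag -> (bases from top-strand reads,
#                                       bases from bottom-strand reads) at one site.
Family = Tuple[List[str], List[str]]


def strand_consensus(bases: List[str], min_agree: float = 0.7) -> Optional[str]:
    """Majority base among reads from one strand of one tagged molecule."""
    if not bases:
        return None
    base, n = Counter(bases).most_common(1)[0]
    return base if n / len(bases) >= min_agree else None


def duplex_call(family: Family) -> Optional[str]:
    """Keep a base only if both strand consensuses exist and agree."""
    top, bottom = (strand_consensus(reads) for reads in family)
    return top if top is not None and top == bottom else None


def mutant_fraction(families: Dict[str, Family], ref_base: str) -> float:
    """Share of duplex-confirmed molecules that differ from the reference base."""
    calls = [c for c in map(duplex_call, families.values()) if c is not None]
    if not calls:
        return 0.0
    return sum(c != ref_base for c in calls) / len(calls)


families = {
    "tagA": (["T", "T", "T"], ["T", "T"]),  # mutation on both strands -> kept
    "tagB": (["C", "C"], ["C", "C", "C"]),  # reference
    "tagC": (["T", "T"], ["C", "C"]),       # strands disagree -> discarded
    "tagD": (["C"] * 4, ["C"] * 3),         # reference
}
print(mutant_fraction(families, ref_base="C"))  # 1 mutant among 3 duplex calls
```

When the two strands disagree (tagC), the change is treated as a probable error that arose on only one strand and is dropped rather than counted. This is the mechanism by which a duplex consensus suppresses single-strand errors.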

  6. The Accuracy, Feasibility and Challenges of Sequencing Short Tandem Repeats Using Next-Generation Sequencing Platforms

    PubMed Central

    Zavodna, Monika; Bagshaw, Andrew; Brauning, Rudiger; Gemmell, Neil J.

    2014-01-01

    To date we have little knowledge of how accurate next-generation sequencing (NGS) technologies are in sequencing repetitive sequences beyond known limitations to accurately sequence homopolymers. Only a handful of previous reports have evaluated the potential of NGS for sequencing short tandem repeats (microsatellites) and no empirical study has compared and evaluated the performance of more than one NGS platform with the same dataset. Here we examined yeast microsatellite variants from both long-read (454-sequencing) and short-read (Illumina) NGS platforms and compared these to data derived through Sanger sequencing. In addition, we investigated any locus-specific biases and differences that might have resulted from variability in microsatellite repeat number, repeat motif or type of mutation. Out of 112 insertion/deletion variants identified among 45 microsatellite amplicons in our study, we found 87.5% agreement between the 454-platform and Sanger sequencing in frequency of variant detection after Benjamini-Hochberg correction for multiple tests. For a subset of 21 microsatellite amplicons derived from Illumina sequencing, the results of short-read platform were highly consistent with the other two platforms, with 100% agreement with 454-sequencing and 93.6% agreement with the Sanger method after Benjamini-Hochberg correction. We found that the microsatellite attributes copy number, repeat motif and type of mutation did not have a significant effect on differences seen between the sequencing platforms. We show that both long-read and short-read NGS platforms can be used to sequence short tandem repeats accurately, which makes it feasible to consider the use of these platforms in high-throughput genotyping. It appears the major requirement for achieving both high accuracy and rare variant detection in microsatellite genotyping is sufficient read depth coverage. 
This might be a challenge because each platform generates a consistent pattern of non-uniform sequence
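    The platform comparison above reports agreement after Benjamini-Hochberg correction for multiple tests. For reference, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. The p-values are made up, and the abstract does not describe the tests that produced the authors' p-values.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: True where the null is rejected
    while controlling the false discovery rate at `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject


print(benjamini_hochberg([0.01, 0.04, 0.041, 0.042]))  # [True, True, True, True]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))    # [True, False, False, False]
```

The procedure is step-up. In the first call, 0.04 misses its own threshold (0.025), yet it is still rejected because a larger-ranked p-value (0.042 against 0.05) passes.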

  7. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular

  8. Draft Genome Sequence of the Deep-Sea Basidiomycetous Yeast Cryptococcus sp. Strain Mo29 Reveals Its Biotechnological Potential

    PubMed Central

    Rédou, Vanessa; Kumar, Abhishek; Hainaut, Matthieu; Henrissat, Bernard; Record, Eric; Barbier, Georges

    2016-01-01

    Cryptococcus sp. strain Mo29 was isolated from the Rainbow hydrothermal site on the Mid-Atlantic Ridge. Here, we present the draft genome sequence of this basidiomycetous yeast strain, which has highlighted its biotechnological potential as revealed by the presence of genes involved in the synthesis of secondary metabolites and biotechnologically important enzymes. PMID:27389259

  9. Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

    PubMed Central

    Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R

    2009-01-01

    Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but have not been available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes (two libraries per species), and their average clone insert sizes range from 152 to 175 kb. We estimated that the genome coverage of each library ranged from 6× to 9×, with the two combined libraries of each species being equivalent to 13.0–16.3 haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6–8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements (the majority belonging to the category of low-complexity repeats), and are richer in retro-elements than in DNA transposons. Among the species, the H. erato genome is somewhat richer in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes varies widely: the genome of H. virescens is the most conserved, as a typical lepidopteran, whereas the genomes of both H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs
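    The coverage figures above follow from simple arithmetic on clone number, insert size and genome size. The sketch below makes several assumptions. The clone count and the 400 Mb genome size are hypothetical, because the abstract gives neither. The insert size is picked from the reported 152–175 kb range. The second function applies the Clarke-Carbon formula for the probability that a given locus is represented in the library.

```python
def genome_equivalents(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Library coverage in haploid genome equivalents: clones x insert / genome."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_present(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate that a given locus lies in at least one clone:
    P = 1 - (1 - insert/genome) ** n_clones."""
    f = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


# Hypothetical values: 18,432 clones, 160 kb mean insert, 400 Mb haploid genome.
cov = genome_equivalents(18_432, 160_000, 400_000_000)
p = prob_locus_present(18_432, 160_000, 400_000_000)
print(f"{cov:.2f}x coverage, P(locus present) = {p:.4f}")
```

Both estimates assume clones are sampled uniformly across the genome and ignore empty or non-recombinant clones, so they tend to overstate usable coverage.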

  10. Identification of MiRNA from Eggplant (Solanum melongena L.) by Small RNA Deep Sequencing and Their Response to Verticillium dahliae Infection

    PubMed Central

    Yang, Liu; Jue, Dengwei; Li, Wang; Zhang, Ruijie; Chen, Min; Yang, Qing

    2013-01-01

    MiRNAs are a class of non-coding small RNAs that play important roles in the regulation of gene expression. Although plant miRNAs have been extensively studied in model systems, less is known in other plants with limited genome sequence data, including eggplant (Solanum melongena L.). To identify miRNAs in eggplant and their response to Verticillium dahliae infection, a fungal pathogen for which clear understanding of infection mechanisms and effective cure methods are currently lacking, we deep-sequenced two small RNA (sRNA) libraries prepared from mock-infected and infected seedlings of eggplants. Specifically, 30,830,792 reads produced 7,716,328 unique miRNAs representing 99 known miRNA families that have been identified in other plant species. Two novel putative miRNAs were predicted from eggplant ESTs. The potential targets of the identified known and novel miRNAs were also predicted based on sequence homology search. The length distribution of the sRNAs obtained and the expression of 6 miRNA families differed markedly between the two libraries. These results provide a framework for further analysis of miRNAs and their role in regulating plant response to fungal infection and Verticillium wilt in particular. PMID:24015279

  11. Identification of MiRNA from eggplant (Solanum melongena L.) by small RNA deep sequencing and their response to Verticillium dahliae infection.

    PubMed

    Yang, Liu; Jue, Dengwei; Li, Wang; Zhang, Ruijie; Chen, Min; Yang, Qing

    2013-01-01

    MiRNAs are a class of non-coding small RNAs that play important roles in the regulation of gene expression. Although plant miRNAs have been extensively studied in model systems, less is known in other plants with limited genome sequence data, including eggplant (Solanum melongena L.). To identify miRNAs in eggplant and their response to Verticillium dahliae infection, a fungal pathogen for which clear understanding of infection mechanisms and effective cure methods are currently lacking, we deep-sequenced two small RNA (sRNA) libraries prepared from mock-infected and infected seedlings of eggplants. Specifically, 30,830,792 reads produced 7,716,328 unique miRNAs representing 99 known miRNA families that have been identified in other plant species. Two novel putative miRNAs were predicted from eggplant ESTs. The potential targets of the identified known and novel miRNAs were also predicted based on sequence homology search. The length distribution of the sRNAs obtained and the expression of 6 miRNA families differed markedly between the two libraries. These results provide a framework for further analysis of miRNAs and their role in regulating plant response to fungal infection and Verticillium wilt in particular.
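    In the two eggplant records above, counting "unique" sequences among small-RNA reads and comparing length distributions amounts to collapsing identical reads and tallying their lengths. Here is a minimal sketch, assuming adapters have already been trimmed. The toy reads are hypothetical. The function can count either distinct sequences or reads, because the two views of the same library can look quite different.

```python
from collections import Counter
from typing import Dict, Iterable


def collapse_reads(reads: Iterable[str]) -> Counter:
    """Map each distinct small-RNA sequence to the number of reads carrying it."""
    return Counter(reads)


def length_distribution(unique: Counter, weighted: bool = True) -> Dict[int, int]:
    """Length -> count; weighted=True counts reads, False counts distinct sequences."""
    dist: Dict[int, int] = {}
    for seq, n_reads in unique.items():
        dist[len(seq)] = dist.get(len(seq), 0) + (n_reads if weighted else 1)
    return dict(sorted(dist.items()))


# Hypothetical adapter-trimmed reads (real libraries hold tens of millions).
reads = ["A" * 21, "A" * 21, "C" * 24, "G" * 21]
unique = collapse_reads(reads)
print(len(reads), "reads ->", len(unique), "unique sequences")  # 4 reads -> 3 unique sequences
print(length_distribution(unique))                  # {21: 3, 24: 1}
print(length_distribution(unique, weighted=False))  # {21: 2, 24: 1}
```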

  12. The utility of diversity profiling using Illumina 18S rRNA gene amplicon deep sequencing to detect and discriminate Toxoplasma gondii among the cyst-forming coccidia.

    PubMed

    Cooper, Madalyn K; Phalen, David N; Donahoe, Shannon L; Rose, Karrie; Šlapeta, Jan

    2016-01-30

    Next-generation sequencing (NGS) has the capacity to screen a single DNA sample and detect pathogen DNA from thousands of host DNA sequence reads, making it a versatile and informative tool for investigation of pathogens in diseased animals. The technique is effective and labor saving in the initial identification of pathogens, and will complement conventional diagnostic tests to associate the candidate pathogen with a disease process. In this report, we investigated the utility of the diversity profiling NGS approach using Illumina small subunit ribosomal RNA (18S rRNA) gene amplicon deep sequencing to detect Toxoplasma gondii in previously confirmed cases of toxoplasmosis. We then tested the diagnostic approach with species-specific PCR genotyping, histopathology and immunohistochemistry of toxoplasmosis in a Risso's dolphin (Grampus griseus) to systematically characterise the disease and associate causality. We show that the Euk7A/Euk570R primer set targeting the V1-V3 hypervariable region of the 18S rRNA gene can be used as a species-specific assay for cyst-forming coccidia and discriminate T. gondii. Overall, the approach is cost-effective and improves diagnostic decision support by narrowing the differential diagnosis list with more certainty than was previously possible. Furthermore, it supplements the limitations of cryptic protozoan morphology and surpasses the need for species-specific PCR primer combinations.

  13. [Sequence of venous blood flow alterations in patients after recently endured acute thrombosis of lower-limb deep veins based on the findings of ultrasonographic duplex scanning].

    PubMed

    Tarkovskiĭ, A A; Zudin, A M; Aleksandrova, E S

    2009-01-01

    This study investigated the sequence of venous blood flow alterations occurring within one year after acute thrombosis of the lower-limb deep veins, using the standard technique of ultrasonographic duplex scanning. A total of thirty-two patients aged 24 to 62 years presenting with new-onset acute phlebothrombosis were followed up. All patients were examined sequentially at 2 days, 3 weeks, 3 months, 6 months and 12 months after the first clinical signs of the disease appeared. The parameters assessed included the patency of the deep veins and the condition of the valvular apparatus of the deep, superficial and communicant veins. The findings showed that as early as the first stage of phlebohaemodynamic alterations after thrombosis, i.e., during the acute period of the disease, seven (21.9%) patients had developed valvular insufficiency of the communicant veins of the crus, manifesting as a horizontal veno-venous reflux; 6 months later, this was observed in all patients examined (100%). The second stage of phlebohaemodynamic alterations, which occurred simultaneously with recanalization of the thrombotic masses in the deep veins, was characterized by the development of valvular insufficiency of the deep veins themselves, manifesting as a deep vertical veno-venous reflux; this was detected at month six after disease onset in 56.3% of subjects and after 12 months in 93.8% of patients.
Recanalization of the thrombotic masses began 3 months after the onset of thrombosis in twelve (37.5%) patients and had begun in all patients (100%) by 12 months, eventually resulting in complete restoration of the patency of the affected

  14. Exploration for deep gas in the Devonian Chaco Basin of Southern Bolivia: Sequence stratigraphy, predictions, and well results

    SciTech Connect

    Williams, K.E.; Radovich, B.J.; Brett, J.W.

    1995-12-31

    In mid-1991, a team was assembled in Texaco's Frontier Exploration Department (FED) to define the hydrocarbon potential of the Chaco Basin of Southern Bolivia. The Miraflores No. 1 well was drilled in the fall of 1992 for stratigraphic objectives. The well confirmed the predicted stratigraphic trap in the Mid-Devonian, with gas discovered in two highstand and transgressive sands. These are low-contrast, low-resistivity sands found in a deep-basin 'tight gas' setting. Testing of the gas sands was complicated by drilling-fluid interactions at the wellbore. Subsequent analysis indicated that the existing porosity and permeability had been reduced, which prevented a realistic test of reservoir capabilities.

  15. The Venom Gland Transcriptome of Latrodectus tredecimguttatus Revealed by Deep Sequencing and cDNA Library Analysis

    PubMed Central

    He, Quanze; Duan, Zhigui; Yu, Ying; Liu, Zhen; Liu, Zhonghua; Liang, Songping

    2013-01-01

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus. Through assembly, translation, filtering, quantification and annotation, this identified 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and the spliceosome, and in protein translation, processing and transport, were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 families newly identified in this spider; α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that function in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection against attack by the spider's own toxins. The present study provides a venom gland transcriptome profile and extends our understanding of the spider toxinome and of the mechanism that coordinates toxin production in terms of protein expression quantity. PMID:24312294

  16. The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis.

    PubMed

    He, Quanze; Duan, Zhigui; Yu, Ying; Liu, Zhen; Liu, Zhonghua; Liang, Songping

    2013-01-01

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus. Through assembly, translation, filtering, quantification and annotation, this identified 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and the spliceosome, and in protein translation, processing and transport, were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 families newly identified in this spider; α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that function in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection against attack by the spider's own toxins. The present study provides a venom gland transcriptome profile and extends our understanding of the spider toxinome and of the mechanism that coordinates toxin production in terms of protein expression quantity.

  17. Contrasted seismogenic and rheological behaviours from shallow and deep earthquake sequences in the North Tanzanian Divergence, East Africa

    NASA Astrophysics Data System (ADS)

    Albaric, J.; Perrot, J.; Déverchère, J.; Deschamps, A.; Le Gall, B.; Ferdinand, R. W.; Petit, C.; Tiberi, C.; Sue, C.; Songo, M.

    2010-12-01

    We report preliminary results of a seismological experiment, SEISMO-TANZ'07, which consisted of the deployment of a local network (35 stations) in the East African Rift System (EARS), North Tanzania, for 6 months in 2007. We compare two earthquake sequences (Gelai and Manyara) occurring, respectively, at the southern end of the Kenya rift and in the North Tanzanian Divergence (NTD). Although only ~150 km apart, the two sequences have different triggering mechanisms. Neither sequence shows a typical swarm or mainshock-aftershock pattern. They highlight the change in the magmatic/tectonic nature of the rift where the eastern branch of the EARS enters the Tanzanian craton. The similar shapes and long axes of the elongate sequences emphasize the preferred locus of active strain release along NE-SW discontinuities, which probably root at depth into steep Proterozoic shear zones. At Gelai, deformation is dominated by aseismic processes involving slow slip on a normal fault and dyke intrusion within the upper crust (Calais et al., 2008). The spatial and temporal distribution of earthquakes indicates a possible correlation between the Gelai crisis and the eruption of the nearby Oldoinyo Lengai volcano. At Manyara, the sequence is more unusual, revealing long-lasting, deeply rooted seismic activity (~20–35 km depth), possibly related to laterally transmitted stress loading. The yield strength envelope modelled from the depth-frequency distribution of earthquakes in the NTD is consistent with the presence of a mafic lower crust and further supports the strength increase of the rifted crust from south Kenya to the NTD.

  18. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    PubMed

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes.

  19. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA.

    PubMed

    Liu, Baohong; Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue; Ding, Guohui

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, owing to its inherently high accuracy and low risk. Typically, MPS data are parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods, namely the standard Z-score (STDZ), GC correction Z-score (GCCZ) and internal reference Z-score (IRZ) methods, together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. PMID:27437397
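    The standard Z-score (STDZ) idea referred to above compares the fraction of reads that map to a chromosome in a test sample with the same fraction in euploid reference samples. The Python sketch below is not DASAF's R interface. It is a minimal illustration that assumes uniquely mapped read counts are already available. The chromosome-21 fractions are hypothetical, and the z ≥ 3 cut-off is an assumed, commonly used threshold.

```python
import statistics
from typing import Sequence


def chromosome_fraction(chr_reads: int, total_reads: int) -> float:
    """Share of uniquely mapped reads that fall on the chromosome of interest."""
    return chr_reads / total_reads


def standard_z(sample_fraction: float, reference_fractions: Sequence[float]) -> float:
    """Z-score of a test sample against euploid reference samples."""
    mu = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)  # sample standard deviation
    return (sample_fraction - mu) / sd


# Hypothetical chromosome-21 read fractions from five euploid reference pregnancies.
reference = [0.0130, 0.0132, 0.0131, 0.0129, 0.0133]
test = chromosome_fraction(139_000, 10_000_000)
z = standard_z(test, reference)
print(f"z = {z:.2f}", "-> high risk" if z >= 3 else "-> low risk")  # assumed z >= 3 cut-off
```

The GC-correction and internal-reference methods named in the abstract are variations on this comparison that adjust for GC bias or use an internal reference. They are not reproduced here.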

  20. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures (ARMS)) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm - 500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales. PMID:27584940

  1. Shallow and deep earthquake sequences captured in the North Tanzanian Divergence, East Africa: Inferences on seismogenic processes and rheology

    NASA Astrophysics Data System (ADS)

    Albaric, J.; Perrot, J.; Déverchère, J.; Deschamps, A.; Ferdinand, R. W.; Le Gall, B.

    2009-12-01

    Using a temporary local seismic network of 35 stations deployed in North Tanzania for 6 months in 2007 (SEISMOTANZ'07 experiment), we captured two earthquake sequences (Gelai and Manyara) occurring, respectively, at the southern end of the Kenya rift and in the North Tanzanian Divergence (NTD). Neither sequence shows a typical swarm or mainshock-aftershock pattern. Although the two sequences are only ~150 km apart, their triggering mechanisms appear to be different. They highlight a major change in the magmatic/tectonic nature of the rift where the eastern branch of the East African Rift enters the Tanzanian craton. Both sequences have a similar shape and long-axis orientation, emphasizing the preferred locus of active strain release along NE-SW discontinuities, which probably root at depth into steep Proterozoic shear zones. At Gelai, the deformation is dominated by aseismic processes involving slow slip on a normal fault and dyke intrusion within the upper crust, and by an interaction with the eruption of the nearby Oldoinyo Lengai volcano. At Manyara, the sequence reveals long-lasting seismic activity rooted deep in the crust (~20-35 km depth), possibly indicative of stress loading transmitted laterally. Focal solutions demonstrate a mixture of normal and strike-slip faulting on sub-vertical inherited structures striking N60°E. The yield stress envelope modelled from the depth-frequency distribution of earthquakes at Manyara is consistent with the presence of a mafic lower crust and further supports the strength increase of the rifted crust from south Kenya to the NTD.

  2. Deep sequencing of mixed total DNA without barcodes allows efficient assembly of highly plastic ascidian mitochondrial genomes.

    PubMed

    Rubinstein, Nimrod D; Feldstein, Tamar; Shenkar, Noa; Botero-Castro, Fidel; Griggio, Francesca; Mastrototaro, Francesco; Delsuc, Frédéric; Douzery, Emmanuel J P; Gissi, Carmela; Huchon, Dorothée

    2013-01-01

    Ascidians, or sea squirts, form a diverse group within the chordates that includes a few thousand species of sessile, filter-feeding marine animals. Their mitochondrial genomes are characterized by particularly high evolutionary rates and rampant gene rearrangements. This extreme variability complicates standard polymerase chain reaction (PCR)-based techniques for molecular characterization, and consequently only a few complete ascidian mitochondrial genome sequences are available. Using the standard PCR and Sanger sequencing approach, we obtained the mitochondrial genome of Ascidiella aspersa only with great effort. In contrast, we obtained five additional mitogenomes (Botrylloides aff. leachii, Halocynthia spinosa, Polycarpa mytiligera, Pyura gangelion, and Rhodosoma turcicum) with a novel strategy consisting of sequencing the pooled total DNA samples of these five species on one Illumina HiSeq 2000 flow cell lane. Each mitogenome was efficiently assembled into a single contig using de novo transcriptome assembly, as de novo genome assembly generally performed poorly for this task. Each of the six new mitogenomes presents a different and novel gene order, showing that no syntenic block has been conserved at the ordinal level (in Stolidobranchia and in Phlebobranchia). Phylogenetic analyses support the paraphyly of both Ascidiacea and Phlebobranchia, with Thaliacea nested inside Phlebobranchia, although the deepest nodes of the Phlebobranchia-Thaliacea clade are not well resolved. The strategy described here thus provides a cost-effective approach to obtaining complete mitogenomes characterized by a highly plastic gene order and a fast nucleotide/amino acid substitution rate.

  3. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA

    PubMed Central

    Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods—the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method—together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. PMID:27437397
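    To illustrate the idea behind a standard Z-score test such as the STDZ method named above, the sketch below compares a test sample's read fraction for a chromosome of interest (chromosome 21, assumed here) with the fractions seen in euploid reference samples. It is a minimal Python sketch under stated assumptions, not DASAF's R interface: the function names, reference fractions, read counts, and the |z| > 3 cutoff are all hypothetical, and the GC-corrected and internal-reference variants (GCCZ, IRZ) are not shown.

```python
from statistics import mean, stdev


def chromosome_fraction(chrom_reads: int, total_reads: int) -> float:
    """Fraction of mapped reads that fall on the chromosome of interest."""
    return chrom_reads / total_reads


def standard_z_score(sample_fraction: float, reference_fractions: list[float]) -> float:
    """Z-score of a sample's chromosome fraction relative to euploid reference samples."""
    mu = mean(reference_fractions)
    sd = stdev(reference_fractions)  # sample standard deviation; needs >= 2 references
    return (sample_fraction - mu) / sd


# Hypothetical chr21 read fractions from euploid reference pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

# Hypothetical test sample: 142,000 of 10 million mapped reads on chr21.
sample_fraction = chromosome_fraction(chrom_reads=142_000, total_reads=10_000_000)
z = standard_z_score(sample_fraction, reference)

# |z| > 3 is a widely used convention, assumed here rather than taken from DASAF.
print(f"z = {z:.2f}; flagged as possible trisomy: {abs(z) > 3}")
```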

  4. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    PubMed

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, providing an important intermediate type for understanding the evolution of the subgenus Soja population in the genus Glycine. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze region were deeply sequenced, while nine other semi-wild lines were sequenced to 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines in one subpopulation rather than forming an independent subpopulation or an intermediate transition type of soybean domestication; (3) high heterozygosity rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including regions related to seed size development. Our results suggest a hybridization origin for semi-wild soybean, which has produced a complex Soja population structure. PMID:25265539
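    The abstract reports heterozygosity rates of 0.19-0.49 without defining them; one common definition is the fraction of called genotypes that are heterozygous. A minimal Python sketch, assuming VCF-style genotype strings and hypothetical example calls (not data from this study):

```python
def heterozygosity_rate(genotypes: list[str]) -> float:
    """Fraction of called genotypes that are heterozygous.

    Genotypes are VCF-style strings ("0/0", "0/1", "1|0", ...); calls containing
    "." (missing) are excluded from the denominator.
    """
    called = [g for g in genotypes if "." not in g]
    if not called:
        raise ValueError("no called genotypes")
    # A call is heterozygous when its two alleles differ, regardless of phasing.
    het = sum(1 for g in called if len(set(g.replace("|", "/").split("/"))) > 1)
    return het / len(called)


# Hypothetical calls for one line at six sites.
calls = ["0/0", "0/1", "1/1", "0/1", "./.", "1|0"]
print(heterozygosity_rate(calls))  # 3 heterozygous of 5 called sites
```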

  5. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    PubMed

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, providing an important intermediate type for understanding the evolution of the subgenus Soja population in the genus Glycine. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze region were deeply sequenced, while nine other semi-wild lines were sequenced to 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines in one subpopulation rather than forming an independent subpopulation or an intermediate transition type of soybean domestication; (3) high heterozygosity rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including regions related to seed size development. Our results suggest a hybridization origin for semi-wild soybean, which has produced a complex Soja population structure.

  6. A deep sequencing analysis of transcriptomes and the development of EST-SSR markers in mungbean (Vigna radiata).

    PubMed

    Liu, Changyou; Fan, Baojie; Cao, Zhimin; Su, Qiuzhu; Wang, Yan; Zhang, Zhixiao; Wu, Jing; Tian, Jing

    2016-09-01

    Mungbean (Vigna radiata L. Wilczek) is one of the most important leguminous food crops in Asia. We employed Illumina paired-end sequencing to analyse the transcriptomes of three different mungbean genotypes. A total of 38.3-39.8 million paired-end reads of 73 bp were generated. The pooled reads from the three libraries were assembled into 56,471 transcripts. Following cluster analysis, 43,293 unigenes were obtained, with an average length of 739 bp and an N50 length of 1176 bp. Of the unigenes, 34,903 (80.6%) had significant similarity to known proteins in the NCBI nonredundant protein database (Nr), while 21,450 (58.4%) had BLAST hits in the Swiss-Prot database (E-value < 10⁻⁵). Further, 1245 differentially expressed genes were detected among the three mungbean genotypes. In addition, we identified 3788 expressed sequence tag-simple sequence repeat (EST-SSR) motifs that could be used as potential molecular markers. Among 320 tested loci, 310 (96.5%) yielded amplification products, and 151 (47.0%) exhibited polymorphisms among six mungbean accessions. These transcriptome data and mungbean EST-SSRs could serve as a valuable resource for novel gene discovery and marker-assisted selective breeding of this species. PMID:27659323
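    N50 summarizes assembly contiguity: it is the length L such that unigenes of length L or longer contain at least half of all assembled bases. A minimal Python sketch with hypothetical lengths (not the mungbean data):

```python
def n50(lengths: list[int]) -> int:
    """Largest length L such that sequences of length >= L contain at least
    half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the loop always reaches half the total")


# Hypothetical unigene lengths in bp.
unigenes = [100, 200, 300, 400, 500]
print(n50(unigenes))  # 500 + 400 = 900 >= 1500 / 2, so N50 = 400
```

    Unlike the mean length, N50 is insensitive to a long tail of very short sequences, which is why assembly reports usually quote both.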

  7. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures, ARMS) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm–500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales.

  8. Deep Sequencing of the Trypanosoma cruzi GP63 Surface Proteases Reveals Diversity and Diversifying Selection among Chronic and Congenital Chagas Disease Patients

    PubMed Central

    Llewellyn, Martin S.; Messenger, Louisa A.; Luquetti, Alejandro O.; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B. N.; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A.

    2015-01-01

    Background Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil, using amplicon deep sequencing technology. Methodology/Principal Findings A 450 bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target (ND5) was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43; Goias congenital, n = 2; Bolivia chronic, n = 27; Bolivia congenital, n = 20). Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering of patients by symptoms. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dN/dS) nucleotide substitution ratios among sequences of the TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Conclusions/Significance Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I
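    The abstract states that read depths were normalized within patient cohorts but does not say how. Random subsampling without replacement to the smallest depth in the cohort (rarefaction) is one common approach and is assumed in this Python sketch; the patient IDs, haplotype labels, and function names are hypothetical.

```python
import random


def rarefy(reads_by_patient: dict[str, list[str]], seed: int = 0) -> dict[str, list[str]]:
    """Subsample every patient's reads, without replacement, to the smallest
    read count in the cohort so that diversity is compared at equal depth."""
    depth = min(len(reads) for reads in reads_by_patient.values())
    rng = random.Random(seed)  # seeded for reproducibility
    return {pid: rng.sample(reads, depth) for pid, reads in reads_by_patient.items()}


def sequence_types(reads: list[str]) -> int:
    """Number of distinct sequence types (here, haplotype labels) observed."""
    return len(set(reads))


# Hypothetical cohort: each read is reduced to the haplotype it matches.
cohort = {
    "P1": ["hapA"] * 50 + ["hapB"] * 50,
    "P2": ["hapA"] * 10 + ["hapC"] * 20,
}
normalized = rarefy(cohort, seed=1)
for pid, reads in normalized.items():
    print(pid, len(reads), sequence_types(reads))
```

    Subsampling discards data, but it prevents deeper-sequenced patients from appearing more diverse simply because more rare variants were sampled.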

  9. Genome sequence of the deep-sea gamma-proteobacterium Idiomarina loihiensis reveals amino acid fermentation as a source of carbon and energy.

    PubMed

    Hou, Shaobin; Saw, Jimmy H; Lee, Kit Shan; Freitas, Tracey A; Belisle, Claude; Kawarabayasi, Yutaka; Donachie, Stuart P; Pikina, Alla; Galperin, Michael Y; Koonin, Eugene V; Makarova, Kira S; Omelchenko, Marina V; Sorokin, Alexander; Wolf, Yuri I; Li, Qing X; Keum, Young Soo; Campbell, Sonia; Denery, Judith; Aizawa, Shin-Ichi; Shibata, Satoshi; Malahoff, Alexander; Alam, Maqsudul

    2004-12-28

    We report the complete genome sequence of the deep-sea gamma-proteobacterium Idiomarina loihiensis, recently isolated from a hydrothermal vent at 1,300 m depth on the Loihi submarine volcano, Hawaii. The I. loihiensis genome comprises a single chromosome of 2,839,318 base pairs, encoding 2,640 proteins, four rRNA operons, and 56 tRNA genes. A comparison of I. loihiensis with the genomes of other gamma-proteobacteria reveals an abundance of amino acid transport and degradation enzymes, but a loss of sugar transport systems and certain enzymes of sugar metabolism. This finding suggests that I. loihiensis relies primarily on amino acid catabolism, rather than on sugar fermentation, for carbon and energy. Enzymes for the biosynthesis of purines, pyrimidines, the majority of amino acids, and coenzymes are encoded in the genome, but the biosynthetic pathways for Leu, Ile, Val, Thr, and Met are incomplete. Auxotrophy for Val and Thr was confirmed by in vivo experiments. The I. loihiensis genome contains a cluster of 32 genes encoding enzymes for exopolysaccharide and capsular polysaccharide synthesis. It also encodes diverse peptidases, a variety of peptide and amino acid uptake systems, and versatile signal transduction machinery. We propose that I. loihiensis obtains the amino acids for its growth from proteinaceous particles present in deep-sea hydrothermal vent waters. I. loihiensis would colonize these particles using its secreted exopolysaccharide, digest their proteins, and metabolize the resulting peptides and amino acids. In summary, the I. loihiensis genome reveals an integrated mechanism of metabolic adaptation to the constantly changing deep-sea hydrothermal ecosystem. PMID:15596722

  10. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability

    PubMed Central

    2013-01-01

    Background Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. Results To complement existing EST resources, we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins, compared with their mesophilic orthologs, supported higher CvP-bias as a measure of thermal stability. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. Conclusions Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation. Reviewers This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole. PMID:23324115
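    CvP-bias, as described above, is the difference between the frequencies of charged and polar residues. The Python sketch below assumes the residue sets commonly used for this measure in the prokaryotic thermophily literature (charged: D, E, K, R; polar: N, Q, S, T) and expresses both as percentages of sequence length; the toy sequences are hypothetical, not A. pompejana proteins.

```python
CHARGED = set("DEKR")  # Asp, Glu, Lys, Arg
POLAR = set("NQST")    # Asn, Gln, Ser, Thr


def cvp_bias(protein: str) -> float:
    """Percentage of charged residues minus percentage of polar residues."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    charged = sum(aa in CHARGED for aa in seq)
    polar = sum(aa in POLAR for aa in seq)
    return 100.0 * (charged - polar) / len(seq)


def proteome_mean_cvp(proteins: list[str]) -> float:
    """Unweighted mean CvP-bias over a set of proteins."""
    return sum(cvp_bias(p) for p in proteins) / len(proteins)


# Hypothetical toy sequences.
print(cvp_bias("DEKRNQAAAA"))  # 4 charged, 2 polar, 10 residues -> 20.0
print(proteome_mean_cvp(["DEKRNQAAAA", "NQSTAAAAAA"]))
```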

  11. Microdiversity of deep-sea Bacillales isolated from Tyrrhenian sea sediments as revealed by ARISA, 16S rRNA gene sequencing and BOX-PCR fingerprinting.

    PubMed

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    Compared with their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea was investigated using PCR fingerprinting and 16S rRNA gene sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis showed that the isolates belong to six genera of low-G+C Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus, represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, which are typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes, with haplotype H8 the most dominant (63 isolates). BOX-PCR fingerprinting of the B. licheniformis sub-collection separated the isolates into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also showed a distinct strain distribution between seamount and non-seamount stations and was highly prevalent at non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments.
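    The de-replication step described above (120 isolates collapsed into 37 ARISA haplotypes, with one representative of each then sequenced) amounts to grouping isolates with identical fragment-size profiles. This Python sketch is a simplification under stated assumptions: the isolate names and fragment sizes are hypothetical, profiles must match exactly, and real ARISA analyses typically bin peak sizes within a tolerance.

```python
from collections import defaultdict


def dereplicate(profiles: dict[str, tuple[int, ...]]) -> dict[tuple[int, ...], list[str]]:
    """Group isolates whose ARISA profiles (sets of fragment sizes, in bp) are identical."""
    haplotypes: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for isolate, sizes in profiles.items():
        # Sorting makes the profile independent of the order in which peaks were listed.
        haplotypes[tuple(sorted(sizes))].append(isolate)
    return dict(haplotypes)


def representatives(haplotypes: dict[tuple[int, ...], list[str]]) -> list[str]:
    """Pick one isolate per haplotype (the alphabetically first) for 16S sequencing."""
    return sorted(min(members) for members in haplotypes.values())


# Hypothetical isolates and fragment sizes.
profiles = {
    "iso1": (450, 620),
    "iso2": (620, 450),
    "iso3": (480,),
    "iso4": (450, 620, 700),
}
haps = dereplicate(profiles)
print(len(haps), representatives(haps))
```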

  12. Complete genome sequence of 'Halanaeroarchaeum sulfurireducens' M27-SA2, a sulfur-reducing and acetate-oxidizing haloarchaeon from the deep-sea hypersaline anoxic lake Medee.

    PubMed

    Messina, Enzo; Sorokin, Dimitry Y; Kublanov, Ilya V; Toshchakov, Stepan; Lopatina, Anna; Arcadi, Erika; Smedile, Francesco; La Spada, Gina; La Cono, Violetta; Yakimov, Michail M

    2016-01-01

    Strain M27-SA2 was isolated from the deep-sea salt-saturated anoxic lake Medee, one of the most hostile extreme environments on our planet. On the basis of physiological studies and phylogenetic positioning, this extremely halophilic euryarchaeon belongs to a novel genus, 'Halanaeroarchaeum', within the family Halobacteriaceae. All members of this genus cultivated so far are strict anaerobes using acetate as the sole carbon and energy source and elemental sulfur as the electron acceptor. Here we report the complete genome sequence of strain M27-SA2, which is composed of a 2,129,244-bp chromosome and a 124,256-bp plasmid. This is the second complete genome sequence within the genus Halanaeroarchaeum. We demonstrate that the genome of 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors complete metabolic pathways for acetate and sulfur catabolism and for de novo biosynthesis of 19 amino acids. The genomic analysis also reveals that 'Halanaeroarchaeum sulfurireducens' M27-SA2 harbors two prophage loci and one CRISPR locus, highly similar to those of the Kulunda Steppe (Altai, Russia) isolate 'H. sulfurireducens' HSR2(T). The discovery of a sulfur-respiring, acetate-utilizing haloarchaeon in deep-sea hypersaline anoxic lakes is relevant to understanding the biogeochemical functioning of these harsh ecosystems, which are incompatible with life for common organisms. Moreover, the isolation of Halanaeroarchaeum members from geographically distant salt-saturated sites of different origin suggests a high degree of evolutionary success in their adaptation to this type of extreme biotope around the world. PMID:27182430

  13. Deep Sequencing of Distinct Preparations of the Live Attenuated Varicella-Zoster Virus Vaccine Reveals a Conserved Core of Attenuating Single-Nucleotide Polymorphisms

    PubMed Central

    Yamanishi, Koichi; Gomi, Yasuyuki; Gershon, Anne A.; Breuer, Judith

    2016-01-01

    ABSTRACT The continued success of the live attenuated varicella-zoster virus vaccine in preventing varicella and herpes zoster is well documented, as are many of the mutations that contribute to the attenuation of the vOka virus for replication in skin. At least three different preparations of vOka are marketed. Here, using deep sequencing of seven batches of vOka vaccine (including Zostavax, Varivax, Varilrix, and the Oka/Biken working seed) from three different manufacturers (Merck, GSK, and Biken), we show that 137 single-nucleotide polymorphism (SNP) mutations are present in all vaccine batches. These include six sites at which the vaccine allele is fixed or near fixation, which we speculate are likely to be important for attenuation. We also show that despite differences in the vaccine populations between preparations, batch-to-batch variation is minimal, as is the number and frequency of mutations unique to individual batches. This suggests that the vaccine manufacturing processes are not introducing new mutations and that, notwithstanding the mixture of variants present, VZV live vaccines are extremely stable. IMPORTANCE The continued success of vaccination to prevent chickenpox and shingles, combined with the extremely low incidence of adverse reactions, indicates the quality of these vaccines. The vaccine itself is composed of a heterogeneous live attenuated virus population and thus requires deep-sequencing technologies to explore the differences and similarities in the virus populations between different preparations and batches of the vaccines. Our data demonstrate minimal variation between batches, an important safety feature, and provide new insights into the extent of the mutations present in this attenuated virus. PMID:27440875
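    The two analyses named above, finding SNPs shared by all batches and flagging sites where the vaccine allele is fixed or near fixation, can be sketched as a set intersection followed by a frequency filter. In this hypothetical Python sketch, the batch names, positions, and frequencies are invented, and the 0.95 near-fixation cutoff is an assumption rather than the paper's criterion.

```python
def allele_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that carry the vaccine (alternate) allele."""
    return alt_reads / depth


def core_snps(batches: dict[str, dict[int, float]]) -> set[int]:
    """Genome positions at which a SNP was called in every batch."""
    position_sets = [set(freqs) for freqs in batches.values()]
    return set.intersection(*position_sets) if position_sets else set()


def near_fixed(batches: dict[str, dict[int, float]], positions: set[int],
               threshold: float = 0.95) -> set[int]:
    """Positions whose vaccine-allele frequency is >= threshold in every batch."""
    return {pos for pos in positions
            if all(freqs[pos] >= threshold for freqs in batches.values())}


# Hypothetical per-batch SNP calls: position -> vaccine-allele frequency.
batches = {
    "batch1": {100: allele_frequency(990, 1000), 200: 0.40, 300: 0.97},
    "batch2": {100: 0.98, 200: 0.55, 400: 0.10},
    "batch3": {100: 1.00, 200: 0.35, 300: 0.96},
}
core = core_snps(batches)
print(sorted(core), sorted(near_fixed(batches, core)))
```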

  14. Microdiversity of deep-sea Bacillales isolated from Tyrrhenian sea sediments as revealed by ARISA, 16S rRNA gene sequencing and BOX-PCR fingerprinting.

    PubMed

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    Compared with their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea was investigated using PCR fingerprinting and 16S rRNA gene sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis showed that the isolates belong to six genera of low-G+C Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus, represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, which are typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes, with haplotype H8 the most dominant (63 isolates). BOX-PCR fingerprinting of the B. licheniformis sub-collection separated the isolates into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also showed a distinct strain distribution between seamount and non-seamount stations and was highly prevalent at non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments. PMID:24005887

  15. Role of IL-17 Pathways in Immune Privilege: A RNA Deep Sequencing Analysis of the Mice Testis Exposure to Fluoride

    PubMed Central

    Huo, Meijun; Han, Haijun; Sun, Zilong; Lu, Zhaojing; Yao, Xinglei; Wang, Shaolin; Wang, Jundong

    2016-01-01

    We sequenced RNA transcripts from the testes of healthy male mice divided into a control group given distilled water and two experimental groups given 50 and 100 mg/L NaF in drinking water for 56 days. Bowtie/TopHat were used to align the 50-bp paired-end reads, Cufflinks to assemble transcripts and measure the relative abundance of each, and IPA to analyze the RNA-sequencing data. In the 100 mg/L NaF-treated group, four pathways related to IL-17, TGF-β and other cellular growth factors were overexpressed. The mRNA expression of IL-17RA, IL-17RC, MAP2K1, MAP2K2, MAP2K3 and MAPKAPK2, monitored by qRT-PCR, increased markedly in the 100 mg/L NaF group, consistent with the RNA-sequencing results. Fluoride exposure may disrupt spermatogenesis and testicular function in male mice by influencing many signaling pathways and genes involved in immune signal transduction and cellular metabolism. The high expression of the IL-17 signaling pathway was a response to the invasion of the testicular immune system by extracellular fluoride. The PI3-kinase/AKT and MAPK pathways and cytokines of the TGF-β family contributed to controlling IL-17 pathway activation and maintaining immune privilege and spermatogenesis. Together, these findings provide new directions for further molecular research on the effects of fluorosis on reproduction and the immune response. PMID:27572304
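    Cufflinks reports relative transcript abundance as FPKM (fragments per kilobase of transcript per million mapped fragments). The Python sketch below shows only the basic normalization arithmetic with hypothetical counts; Cufflinks itself additionally uses effective transcript lengths and a likelihood model to assign fragments that map to multiple isoforms, which this sketch omits.

```python
def fpkm(counts: dict[str, int], lengths_bp: dict[str, int], total_mapped: int) -> dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = count * 10^9 / (transcript length in bp * total mapped fragments).
    """
    return {tx: counts[tx] * 1e9 / (lengths_bp[tx] * total_mapped) for tx in counts}


# Hypothetical transcripts: txB has 3x the fragments of txA but is also 3x longer.
counts = {"txA": 500, "txB": 1500}
lengths = {"txA": 1_000, "txB": 3_000}
print(fpkm(counts, lengths, total_mapped=2_000_000))
```

    Because FPKM divides by transcript length, the two transcripts here receive the same value even though txB has three times as many fragments.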

  16. Appearances Can Be Deceptive: Revealing a Hidden Viral Infection with Deep Sequencing in a Plant Quarantine Context

    PubMed Central

    Candresse, Thierry; Filloux, Denis; Muhire, Brejnev; Julian, Charlotte; Galzi, Serge; Fort, Guillaume; Bernardo, Pauline; Daugrois, Jean-Heindrich; Fernandez, Emmanuel; Martin, Darren P.; Varsani, Arvind; Roumagnac, Philippe

    2014-01-01

    Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. The safety of regulated plant material exchanges presently relies heavily on techniques such as PCR or nucleic acid hybridisation, which are suited only to the detection and characterisation of specific, well-characterised pathogens. Here, we demonstrate the utility of sequence-independent next generation sequencing (NGS) of both virus-derived small interfering RNAs (siRNAs) and virion-associated nucleic acids (VANA) for the detailed identification and characterisation of viruses infecting two quarantined sugarcane plants. Both plants originated from Egypt and were known to be infected with Sugarcane streak Egypt Virus (SSEV; Genus Mastrevirus, Family Geminiviridae), but the NGS approaches revealed that they were also infected by a second, highly divergent mastrevirus, here named Sugarcane white streak Virus (SWSV). This novel virus had escaped detection by all routine quarantine detection assays and was also found in sugarcane plants originating from Sudan. Complete SWSV genomes were cloned and sequenced from six plants, and all shared >91% genome-wide identity. With the exception of two SWSV variants, which potentially express unusually large RepA proteins, the SWSV isolates display genome characteristics very typical of all other previously described mastreviruses. An analysis of virus-derived siRNAs for SWSV and SSEV showed them to be strongly influenced by secondary structures within both genomic single-stranded DNA and mRNA transcripts. In addition, the distribution of siRNA size frequencies indicates that these mastreviruses are likely subject to both transcriptional and post-transcriptional gene silencing. Our study stresses the potential advantages of NGS-based virus metagenomic screening in a plant quarantine setting and indicates that such techniques could dramatically reduce the numbers of non
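    A statement such as "all shared >91% genome-wide identity" is a claim about the minimum over all pairs of genomes. The Python sketch below checks that minimum on an existing alignment; the gap-exclusion convention, the function names, and the toy 10-column alignment are assumptions for illustration, since the abstract does not state how identity was computed.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def min_pairwise_identity(alignment: dict[str, str]) -> float:
    """Lowest identity over all pairs of sequences in the alignment."""
    return min(percent_identity(alignment[p], alignment[q])
               for p, q in combinations(alignment, 2))


# Hypothetical 10-column alignment standing in for full genomes.
alignment = {
    "isolate1": "ACGTACGTAC",
    "isolate2": "ACGTACGTAA",
    "isolate3": "ACGTAC-TAC",
}
lowest = min_pairwise_identity(alignment)
print(f"{lowest:.1f}% (all pairs > 91%: {lowest > 91})")
```

    Different tools treat gaps differently (ignoring them, as here, or counting them as mismatches), so identity values are only comparable when computed the same way.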

  17. Role of IL-17 Pathways in Immune Privilege: A RNA Deep Sequencing Analysis of the Mice Testis Exposure to Fluoride.

    PubMed

    Huo, Meijun; Han, Haijun; Sun, Zilong; Lu, Zhaojing; Yao, Xinglei; Wang, Shaolin; Wang, Jundong

    2016-01-01

    We sequenced RNA transcripts from the testes of healthy male mice divided into a control group given distilled water and two experimental groups given 50 and 100 mg/L NaF in drinking water for 56 days. Bowtie/TopHat were used to align the 50-bp paired-end reads, Cufflinks to assemble transcripts and measure the relative abundance of each, and IPA to analyze the RNA-sequencing data. In the 100 mg/L NaF-treated group, four pathways related to IL-17, TGF-β and other cellular growth factors were overexpressed. The mRNA expression of IL-17RA, IL-17RC, MAP2K1, MAP2K2, MAP2K3 and MAPKAPK2, monitored by qRT-PCR, increased markedly in the 100 mg/L NaF group, consistent with the RNA-sequencing results. Fluoride exposure may disrupt spermatogenesis and testicular function in male mice by influencing many signaling pathways and genes involved in immune signal transduction and cellular metabolism. The high expression of the IL-17 signaling pathway was a response to the invasion of the testicular immune system by extracellular fluoride. The PI3-kinase/AKT and MAPK pathways and cytokines of the TGF-β family contributed to controlling IL-17 pathway activation and maintaining immune privilege and spermatogenesis. Together, these findings provide new directions for further molecular research on the effects of fluorosis on reproduction and the immune response. PMID:27572304

  19. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism

    PubMed Central

    Valencia, C. Alexander; Wang, Xinjian; Wang, Jin; Peters, Anna; Simmons, Julia R.; Moran, Molly C.; Mathur, Abhinav; Husami, Ammar; Qian, Yaping; Sheridan, Rachel; Bove, Kevin E.; Witte, David; Huang, Taosheng; Miethke, Alexander G.

    2016-01-01

    Background & Aims The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Methods Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. Results A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Conclusion Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF. PMID:27483465

  20. Mining tissue-specific contigs from peanut (Arachis hypogaea L.) for promoter cloning by deep transcriptome sequencing.

    PubMed

    Geng, Lili; Duan, Xiaohong; Liang, Chun; Shu, Changlong; Song, Fuping; Zhang, Jie

    2014-10-01

    Peanut (Arachis hypogaea L.), one of the most important oil legumes in the world, is heavily damaged by white grubs. Tissue-specific promoters are needed to incorporate insect resistance genes into peanut by genetic transformation to control these subterranean pests. Transcriptome sequencing is the most effective way to analyze differential gene expression in this non-model species and to support promoter cloning. The transcriptomes of the roots, seeds and leaves of peanut were sequenced using Illumina technology. A simple digital expression profile was established based on the number of transcripts per million clean tags (TPM) in each tissue. Subsequently, 584 root-specific and 316 seed-specific candidate transcript assembly contigs (TACs) were identified. Semi-quantitative RT-PCR analysis confirmed 55.3% of the root-specific candidates and 64.6% of the seed-specific candidates. Moreover, the agreement between semi-quantitative RT-PCR and the digital expression profile was correlated with the length and TPM value of the TACs. Gene ontology analysis showed that some root-specific TACs are involved in stress resistance and respond to auxin stimulus, whereas seed-specific candidate TACs are involved in embryo development, lipid storage and long-chain fatty acid biosynthesis. One root-specific promoter was cloned and characterized. We developed a high-yield screening system in peanut by establishing a simple digital expression profile based on Illumina sequencing. This rapid, feasible method can be applied to other non-model crops to explore tissue-specific or spatially specific promoters. PMID:25231965
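
    The digital expression profile rests on per-library normalisation: each contig's tag count is scaled to transcripts per million clean tags (TPM), and contigs far more abundant in one tissue than in the others become tissue-specific candidates. The sketch below assumes a fold-change cut-off (`min_fold`) and pseudocount of our own choosing; the study's actual selection criteria are not given in this abstract, and the counts are hypothetical.

```python
def tpm(tag_counts):
    """Scale raw tag counts for one library to transcripts per million tags."""
    total = sum(tag_counts.values())
    return {contig: n * 1e6 / total for contig, n in tag_counts.items()}


def tissue_specific(libraries, target, min_fold=5.0, pseudo=1.0):
    """Contigs whose TPM in `target` is >= min_fold times that in every other tissue."""
    profiles = {tissue: tpm(counts) for tissue, counts in libraries.items()}
    hits = []
    for contig, value in profiles[target].items():
        others = [p.get(contig, 0.0) for t, p in profiles.items() if t != target]
        if all((value + pseudo) / (o + pseudo) >= min_fold for o in others):
            hits.append(contig)
    return sorted(hits)


libraries = {  # hypothetical clean-tag counts per library
    "root": {"TAC1": 900, "TAC2": 100},
    "seed": {"TAC1": 10, "TAC2": 990},
    "leaf": {"TAC1": 5, "TAC2": 995},
}
print(tissue_specific(libraries, "root"))  # ['TAC1']
print(tissue_specific(libraries, "seed"))  # []
```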

  1. Deep sequencing of dsRNAs recovered from mosaic-diseased pigeonpea reveals the presence of a novel emaravirus: pigeonpea sterility mosaic virus 2.

    PubMed

    Elbeaino, Toufic; Digiaro, Michele; Uppala, Mangala; Sudini, Harikishan

    2015-08-01

    Deep-sequencing analysis of double-stranded RNA extracted from a mosaic-diseased pigeonpea plant (Cajanus cajan L., family Fabaceae) revealed the complete sequence of six emaravirus-like negative-sense RNA segments of 7009, 2229, 1335, 1491, 1833 and 1194 nucleotides in size. In the order from RNA1 to RNA6, these genomic RNAs contained ORFs coding for the RNA-dependent RNA polymerase (RdRp, p1 of 266 kDa), the glycoprotein precursor (GP, p2 of 74.5 kDa), the nucleocapsid (NC, p3 of 34.9 kDa), and the putative movement protein (MP, p4 of 40.7 kDa), while p5 (55 kDa) and p6 (27 kDa) had unknown functions. All RNA segments showed distant relationships to viruses of the genus Emaravirus, and in particular to pigeonpea sterility mosaic virus (PPSMV), with which they shared nucleotide sequence identity ranging from 48.5 % (RNA3) to 62.5 % (RNA1). In phylogenetic trees constructed from the sequences of the proteins encoded by RNA1, RNA2 and RNA3 (p1, p2 and p3), this new viral entity showed a consistent grouping with fig mosaic virus (FMV) and rose rosette virus (RRV), which formed a cluster of their own, clearly distinct from PPSMV-1. In experimental greenhouse trials, this novel virus was successfully transmitted to pigeonpea and French bean seedlings by the eriophyid mite Aceria cajani. Preliminary surveys conducted in the Hyderabad region (India) showed that the virus in question is widespread in pigeonpea plants affected by sterility mosaic disease (86.4 %) but is absent in symptomless plants. Based on molecular, biological and epidemiological features, this novel virus is the second emaravirus infecting pigeonpea, for which the provisional name pigeonpea sterility mosaic virus 2 (PPSMV-2) is proposed. PMID:26060057
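
    The pairwise identities quoted above (48.5% to 62.5%) are percentages of identical positions in an alignment. A minimal sketch, assuming the two sequences are already aligned and counting a gap opposite a base as a mismatch (identity conventions differ between tools); the alignment is a toy example. It also sums the six segment lengths given in the abstract.

```python
SEGMENT_LENGTHS = {"RNA1": 7009, "RNA2": 2229, "RNA3": 1335,
                   "RNA4": 1491, "RNA5": 1833, "RNA6": 1194}


def percent_identity(aln_a, aln_b):
    """Percent identical columns in a pairwise alignment; gap-only columns skipped."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # column with no residue in either sequence
        compared += 1
        matches += a == b  # a base opposite a gap never matches
    return 100.0 * matches / compared


print(sum(SEGMENT_LENGTHS.values()))                 # 15091
print(percent_identity("ACGT-ACGTA", "ACGTTACG-A"))  # 80.0
```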

  2. Deep sequencing of the ancestral tobacco species Nicotiana tomentosiformis reveals multiple T-DNA inserts and a complex evolutionary history of natural transformation in the genus Nicotiana.

    PubMed

    Chen, Ke; Dorlhac de Borne, François; Szegedi, Ernö; Otten, Léon

    2014-11-01

    Nicotiana species carry cellular T-DNA sequences (cT-DNAs), acquired by Agrobacterium-mediated transformation. We characterized the cT-DNA sequences of Nicotiana tomentosiformis, an ancestral species of Nicotiana tabacum, by deep sequencing. N. tomentosiformis contains four cT-DNA inserts derived from different Agrobacterium strains. Each has an incomplete inverted-repeat structure. TA is similar to part of the Agrobacterium rhizogenes 1724 mikimopine-type T-DNA, but has unusual orf14 and mis genes. TB carries a 1724 mikimopine-type orf14-mis fragment and a mannopine-agropine synthesis region (mas2-mas1-ags). The mas2' gene codes for an active enzyme. TC is similar to the left part of the A. rhizogenes A4 T-DNA, but also carries octopine synthase-like (ocl) and c-like genes normally found in A. tumefaciens. TD shows a complex rearrangement of T-DNA fragments similar to the right end of the A4 TL-DNA and includes an orf14-like gene and a gene of unknown function, orf511. The TA, TB, TC and TD insertion sites were identified by alignment with N. tabacum and Nicotiana sylvestris sequences. The divergence values for the TA, TB, TC and TD repeats provide an estimate of their relative introduction times. A large deletion has occurred in the central part of the N. tabacum cv. Basma/Xanthi TA region, and another deletion removed the complete TC region in N. tabacum. Nicotiana otophora lacks TA, TB and TD, but contains TC and another cT-DNA, TE. This analysis, together with that of Nicotiana glauca and other Nicotiana species, indicates multiple sequential insertions of cT-DNAs during the evolution of the genus Nicotiana.
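
    The dating logic assumes that the two arms of an inverted repeat were identical when the cT-DNA was inserted and have since diverged independently, so a larger divergence suggests an older insertion. A minimal sketch, assuming gap-free aligned arms and a Jukes-Cantor correction (the abstract does not say which substitution model was used); the arm sequences are toy examples.

```python
import math

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# In an inverted repeat the right arm is read on the opposite strand, so it is
# reverse-complemented before being compared with the left arm.
left_arm = "ACGTACGTAC"
right_arm = "GAACGTACGT"
p = p_distance(left_arm, reverse_complement(right_arm))
print(p, round(jukes_cantor(p), 4))  # 0.1 0.1073
```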

  3. Identification and comparative analysis of the Pseudosciaena crocea microRNA transcriptome response to poly(I:C) infection using a deep sequencing approach.

    PubMed

    Qi, Pengzhi; Guo, Baoying; Zhu, Aiyi; Wu, Changwen; Liu, Changlin

    2014-08-01

    Two sRNA libraries from large yellow croaker (Pseudosciaena crocea), with or without poly(I:C) infection, were constructed and sequenced using high-throughput Illumina/Solexa deep sequencing technology. The sequencing pipeline yielded 16,379,272 and 21,707,070 raw reads, corresponding to 13,227,594 and 20,686,409 clean reads for the normal and infected libraries, respectively. Bioinformatic analysis identified 534 miRNAs, of which 158 were known in miRBase 20.0 and the remaining 376 showed no homology to any known metazoan miRNA, suggesting possible species specificity. We analyzed differential miRNA expression between the two libraries using pairwise comparison: 112 miRNAs were significantly differentially expressed (p < 0.001), and a number of the known miRNAs were identified as immune-related. Real-time quantitative PCR (RT-qPCR) was performed for 6 miRNAs in the two samples, and the sequencing and RT-qPCR data agreed. To our knowledge, this is the first comprehensive study of miRNAs in P. crocea and of P. crocea miRNA expression in response to poly(I:C) infection; many miRNAs were differentially regulated between normal and infected conditions. These findings deepen our understanding of the role of miRNAs in the intricate host immune system and should be useful for developing new strategies to strengthen host immune defense against various infections in P. crocea. PMID:24945573
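
    A quick check on the read-filtering step is the share of raw reads kept as clean reads in each library. The source record writes the counts with Indian-style digit grouping (163,79,272 is 16,379,272), so the sketch strips separators before converting; the function names are hypothetical.

```python
def parse_count(text):
    """Convert a comma-grouped count such as '163,79,272' to an int."""
    return int(text.replace(",", ""))


def retention(raw, clean):
    """Fraction of raw reads that survive filtering as clean reads."""
    return parse_count(clean) / parse_count(raw)


LIBRARIES = {
    "normal": ("163,79,272", "132,27,594"),
    "infected": ("217,07,070", "206,86,409"),
}

for name, (raw, clean) in LIBRARIES.items():
    print(f"{name}: {parse_count(raw):,} raw, {parse_count(clean):,} clean, "
          f"{retention(raw, clean):.1%} retained")
# normal: 16,379,272 raw, 13,227,594 clean, 80.8% retained
# infected: 21,707,070 raw, 20,686,409 clean, 95.3% retained
```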

  4. NGS-QC Generator: A Quality Control System for ChIP-Seq and Related Deep Sequencing-Generated Datasets.

    PubMed

    Mendoza-Parra, Marco Antonio; Saleem, Mohamed-Ashick M; Blum, Matthias; Cholley, Pierre-Etienne; Gronemeyer, Hinrich

    2016-01-01

    The combination of massively parallel sequencing with a variety of modern DNA/RNA enrichment technologies provides the means to interrogate functional protein-genome interactions (ChIP-seq), genome-wide transcriptional activity (RNA-seq; GRO-seq), chromatin accessibility (DNase-seq, FAIRE-seq, MNase-seq) and, more recently, the three-dimensional organization of chromatin (Hi-C, ChIA-PET). Systems biology approaches generally combine several of these readouts with the aim of describing living systems by reconstituting their genome-regulatory functions. However, an often underestimated issue is that conclusions drawn from such multidimensional analyses of NGS-derived datasets critically depend on the quality of the compared datasets. To address this problem, we have developed the NGS-QC Generator, a quality control system that infers quality descriptors for any kind of ChIP-sequencing and related datasets. In this chapter we provide a detailed protocol for (1) assessing quality descriptors with the NGS-QC Generator; (2) interpreting the generated reports; and (3) exploring the database of QC indicators (www.ngs-qc.org) for >21,000 publicly available datasets. PMID:27008019

  6. Deep sequencing shows multiple oligouridylations are required for 3' to 5' degradation of histone mRNAs on polyribosomes.

    PubMed

    Slevin, Michael K; Meaux, Stacie; Welch, Joshua D; Bigler, Rebecca; Miliani de Marval, Paula L; Su, Wei; Rhoads, Robert E; Prins, Jan F; Marzluff, William F

    2014-03-20

    Histone mRNAs are rapidly degraded when DNA replication is inhibited during S phase, with degradation initiating with oligouridylation of the stem loop at the 3' end. We developed a customized RNA sequencing strategy to identify the 3' termini of histone mRNA degradation intermediates. Using this strategy, we identified two types of oligouridylated degradation intermediates: RNAs ending at different sites on the 3' side of the stem loop, which resulted from initial degradation by 3'hExo, and intermediates near the stop codon and within the coding region. Sequencing of polyribosomal histone mRNAs revealed that degradation initiates and proceeds 3' to 5' on translating mRNAs and that many intermediates are capped. Knockdown of the exosome-associated exonuclease PM/Scl-100, but not the Dis3L2 exonuclease, slows histone mRNA degradation, consistent with 3' to 5' degradation by the exosome containing PM/Scl-100. Knockdown of No-go decay factors also slowed histone mRNA degradation, suggesting a role for them in removing ribosomes from partially degraded mRNAs.
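
    Identifying an oligouridylated 3' terminus amounts to finding where a read stops matching the mRNA and checking that the leftover 3' bases are all U (read as T in cDNA). The sketch below is a simplified, hypothetical version of that step, assuming the read's start position on the reference is already known; it is not the authors' pipeline, and the sequences are invented. Greedy matching counts any U that the reference itself encodes at that position as templated.

```python
def split_untemplated_u(read, reference, start):
    """Return (templated_end, tail_length) for a read aligned at `start`.

    The read is matched base by base against the reference; whatever is left
    at the 3' end counts as an untemplated oligo(U) tail only if it consists
    solely of U/T. Returns None when the leftover bases are anything else.
    `templated_end` is the reference coordinate just past the last matched base.
    """
    i = 0
    while (i < len(read) and start + i < len(reference)
           and read[i] == reference[start + i]):
        i += 1
    tail = read[i:]
    if all(base in "TU" for base in tail):
        return start + i, len(tail)
    return None


reference = "GGCCCTTTTCAGGGCCACCCA"  # hypothetical 3'-end sequence
print(split_untemplated_u("GGCCCTTTTCAGGTTT", reference, 0))  # (13, 3)
print(split_untemplated_u("CCCTTTTCAG", reference, 2))        # (12, 0)
print(split_untemplated_u("GGCCCA", reference, 0))            # None
```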

  7. Ultra-deep profiling of alternatively spliced Drosophila Dscam isoforms by circularization-assisted multi-segment sequencing

    PubMed Central

    Sun, Wei; You, Xintian; Gogol-Döring, Andreas; He, Haihuai; Kise, Yoshiaki; Sohn, Madlen; Chen, Tao; Klebes, Ansgar; Schmucker, Dietmar; Chen, Wei

    2013-01-01

    The Drosophila melanogaster gene Dscam (Down syndrome cell adhesion molecule) can generate thousands of different ectodomains via mutually exclusive splicing of three large exon clusters. This isoform diversity plays a profound role in both neuronal wiring and pathogen recognition. However, the isoform expression pattern at the global level has remained unexplored. Here, we developed a novel method that allows direct quantification of the alternatively spliced exon combinations from hundreds of millions of Dscam transcripts in one sequencing run. With unprecedented sequencing depth, we detected a total of 18,496 of the 19,008 theoretically possible isoforms. Importantly, we demonstrated that alternative splicing in the different clusters is independent. Moreover, the isoforms were expressed across a broad dynamic range, with significant cell/tissue-specific and developmental stage-specific bias. Such bias, hitherto underappreciated, can dramatically reduce the ability of neurons to display unique surface receptor codes. Therefore, the seemingly excessive diversity encoded in the Dscam locus might nevertheless be essential for robust self and non-self discrimination in neurons. PMID:23792425
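
    The 19,008 figure is the product of the numbers of alternative exons in the three clusters; the widely cited cluster sizes of 12 (exon 4), 48 (exon 6) and 33 (exon 9) are assumed here, as the abstract does not list them. Independence between clusters can be checked by comparing observed exon-pair counts with the counts expected from each cluster's marginal frequencies. The Pearson chi-square statistic below is a standard stand-in for that comparison, not necessarily the test the authors used, and the counts are toy values.

```python
from collections import Counter
from itertools import product

EXON4, EXON6, EXON9 = 12, 48, 33  # assumed alternative exons per cluster
possible = EXON4 * EXON6 * EXON9
print(possible, f"{18496 / possible:.1%}")  # 19008 97.3%


def chi_square_independence(pair_counts):
    """Pearson chi-square statistic for independence of two exon choices."""
    total = sum(pair_counts.values())
    rows, cols = Counter(), Counter()
    for (exon_a, exon_b), n in pair_counts.items():
        rows[exon_a] += n
        cols[exon_b] += n
    stat = 0.0
    for exon_a, exon_b in product(rows, cols):
        expected = rows[exon_a] * cols[exon_b] / total
        observed = pair_counts.get((exon_a, exon_b), 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Toy counts: the first table is exactly what independence predicts.
independent = {("4.1", "6.1"): 20, ("4.1", "6.2"): 30,
               ("4.2", "6.1"): 40, ("4.2", "6.2"): 60}
linked = {("4.1", "6.1"): 50, ("4.2", "6.2"): 50}
print(chi_square_independence(independent), chi_square_independence(linked))  # 0.0 100.0
```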

  9. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    PubMed

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. PMID:27084946
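
    DanQ's first stage scans one-hot-encoded DNA with convolution filters that act as motif detectors, followed by a rectifier and max pooling, before the bidirectional LSTM models dependencies between motifs. The pure-Python sketch below illustrates only that convolutional front end, with a single hand-made filter for the hypothetical motif TATA; in DanQ the filters are learned and numerous, and the LSTM and output layers are omitted here. It is not code from the DanQ repository.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element indicator vectors in A, C, G, T order; N -> zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1-D convolution of one-hot input with a (width x 4) filter, then ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        score = bias + sum(kernel[j][k] * x[i + j][k]
                           for j in range(width) for k in range(4))
        out.append(max(0.0, score))
    return out


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Filter weights: +1 for the motif base, -1 otherwise; bias -3 lets only an
# exact 4-base match produce a positive activation.
motif_filter = [[1.0 if b == m else -1.0 for b in BASES] for m in "TATA"]
activations = conv1d_relu(one_hot("GCTATAAGG"), motif_filter, bias=-3.0)
pooled = max_pool(activations, 3)
print(activations)  # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(pooled)       # [1.0, 0.0]
```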

  10. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

    PubMed Central

    Quang, Daniel; Xie, Xiaohui

    2016-01-01

    Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ. PMID:27084946

  11. Identification and expression profiling of Vigna mungo microRNAs from leaf small RNA transcriptome by deep sequencing.

    PubMed

    Paul, Sujay; Kundu, Anirban; Pal, Amita

    2014-01-01

    MicroRNAs (miRNAs) are a class of small non-coding RNA molecules that play a crucial role in post-transcriptional gene regulation. Several conserved and species-specific miRNAs have been characterized to date, predominantly from plant species whose genomes are well characterized. However, information on the variability of these regulatory RNAs in economically important but genetically less characterized crop species is limited. Vigna mungo is an important grain legume, grown primarily for its protein-rich edible seeds. miRNAs from this species had not been identified to date owing to the lack of genome sequence information. To identify miRNAs from V. mungo, a small RNA library was constructed from young leaves. High-throughput Illumina sequencing and bioinformatic analysis of the small RNA reads led to the identification of 66 miRNA loci represented by 45 conserved miRNAs belonging to 19 families and eight non-conserved miRNAs belonging to seven families. In addition, 13 novel miRNA candidates were identified in V. mungo. Expression patterns of selected conserved, non-conserved and novel miRNA candidates were determined in leaf, stem and root tissues by quantitative polymerase chain reaction, and potential target genes were predicted for most of the conserved miRNAs. This information provides genomic resources for a better understanding of miRNA-mediated post-transcriptional gene regulation.
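
    Counting miRNAs by family, as in "45 conserved miRNAs belonging to 19 families", usually relies on the naming convention in which paralogous members share a number and differ by a letter suffix (miR156a, miR156b). A simplified sketch of that grouping, assuming names follow this convention; real family assignment is sequence-based, and the names and the "xyz" species prefix below are illustrative rather than the study's list.

```python
import re
from collections import defaultdict

_FAMILY = re.compile(r"miR(\d+)", re.IGNORECASE)


def mirna_family(name):
    """Map e.g. 'xyz-miR156a' or 'MIR166b-3p' to its numeric family ('miR156')."""
    match = _FAMILY.search(name)
    if match is None:
        raise ValueError(f"not a miR-style name: {name!r}")
    return f"miR{match.group(1)}"


def group_by_family(names):
    """Group miRNA names into families, preserving first-seen order."""
    families = defaultdict(list)
    for name in names:
        families[mirna_family(name)].append(name)
    return dict(families)


names = ["miR156a", "miR156b", "miR166a", "miR166b-3p", "miR319"]
families = group_by_family(names)
print(len(names), "miRNAs in", len(families), "families")  # 5 miRNAs in 3 families
```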

  12. Identification and Analysis of the Porcine MicroRNA in Porcine Cytomegalovirus-Infected Macrophages Using Deep Sequencing

    PubMed Central

    Liu, Xiao; Liao, Shan; Xu, Zhiwen; Zhu, Ling; Yang, Fan; Guo, Wanzhu

    2016-01-01

    Porcine cytomegalovirus (PCMV; genus Cytomegalovirus, subfamily Betaherpesvirinae, family Herpesviridae) is an immunosuppressive virus that mainly inhibits the immune function of T lymphocytes and macrophages, which has caused substantial damage in the farming industry. In this study, we obtained the miRNA expression profiles of PCMV-infected porcine macrophages via high-throughput sequencing. The comprehensive analysis of miRNA profiles showed that 239 miRNA database-annotated and 355 novel pig-encoded miRNAs were detected. Of these, 130 miRNAs showed significant differential expression between the PCMV-infected and uninfected porcine macrophages. The 10 differentially expressed pig-encoded miRNAs were further determined by stem-loop reverse-transcription polymerase chain reaction, and the results were consistent with the high-throughput sequencing. Gene Ontology analysis of the target genes of miRNAs in PCMV-infected porcine macrophages showed that the differentially expressed miRNAs are mainly involved in immune and metabolic processes. This is the first report of the miRNA transcriptome in porcine macrophages and an analysis of the miRNA regulatory mechanisms during PCMV infection. Further research into the regulatory mechanisms of miRNAs during immunosuppressive viral infections should contribute to the treatment and prevention of immunosuppressive viruses. PMID:26943793
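
    Differential expression between infected and uninfected libraries starts from counts normalised for sequencing depth. A minimal sketch, assuming reads-per-million normalisation and a pseudocount of 1; the abstract does not state the normalisation or the significance test used, and the counts and library sizes are hypothetical.

```python
import math


def rpm(count, library_size):
    """Reads per million mapped reads."""
    return count * 1e6 / library_size


def log2_fold_change(infected, infected_size, control, control_size, pseudo=1.0):
    """log2 ratio of depth-normalised expression, infected over control."""
    return math.log2((rpm(infected, infected_size) + pseudo)
                     / (rpm(control, control_size) + pseudo))


# Hypothetical miRNA: 150 reads in a 10M-read infected library, 30 in a 10M control.
print(log2_fold_change(150, 10_000_000, 30, 10_000_000))  # 2.0
```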

  13. Bacterial communities associated with host-adapted populations of pea aphids revealed by deep sequencing of 16S ribosomal DNA.

    PubMed

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous, and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways, such as protection against natural enemies, heat tolerance, color change and altered reproduction. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences identified 21 bacterial OTUs (operational taxonomic units). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum, while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes, but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept for a few generations in controlled conditions, it may be that bacterial diversity was
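
    OTUs are typically formed by clustering reads at a fixed similarity threshold, commonly 97% for 16S rRNA genes. The sketch below uses simple greedy centroid clustering and assumes reads already aligned to equal length (and, ideally, sorted by abundance); the abstract does not specify the clustering method or threshold, and the sequences are toy examples.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Add each read to the first centroid within `threshold`, else start a new OTU."""
    centroids, otus = [], []
    for read in reads:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus[idx].append(read)
                break
        else:  # no centroid close enough: this read seeds a new OTU
            centroids.append(read)
            otus.append([read])
    return otus


reads = ["A" * 100, "CC" + "A" * 98, "GGGGG" + "A" * 95]  # 100%, 98%, 95% vs first
otus = greedy_otus(reads)
print([len(otu) for otu in otus])  # [2, 1]
```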

  14. Revisiting bovine pyometra--new insights into the disease using a culture-independent deep sequencing approach.

    PubMed

    Knudsen, Lif Rødtness Vesterby; Karstrup, Cecilia Christensen; Pedersen, Hanne Gervi; Agerholm, Jørgen Steen; Jensen, Tim Kåre; Klitgaard, Kirstine

    2015-02-25

    The bacteria present in the uterus during pyometra have previously been studied using bacteriological culturing. These studies identified Fusobacterium necrophorum and Trueperella pyogenes as the major contributors to the pathogenesis of pyometra. However, an increasing number of culture-independent studies have demonstrated that culture-based studies underestimate the bacterial diversity in most environments. Consequently, fastidious pyometra-associated pathogens may have been overlooked. The primary purpose of this study was therefore to investigate the diversity of bacteria in the uterus of cows with pyometra by using culture-independent 16S rRNA PCR combined with next-generation sequencing. We investigated the microbial composition in the uterus of 21 cows with pyometra obtained from a Danish slaughterhouse. Consistent with the culture studies, Fusobacteriaceae, the family to which F. necrophorum belongs, was the operational taxonomic unit (OTU) observed in the largest quantities. By contrast, the Actinomycetaceae family, which includes T. pyogenes, constituted only 1% of the total number of reads. Thus, we cannot confirm the previously reported role of species from this family in the pathogenesis of pyometra. Finally, we identified a large number of sequences representing three families of Gram-negative bacteria in the pyometra samples: Porphyromonadaceae, Mycoplasmataceae and Pasteurellaceae. These families likely include potentially pathogenic species of a fastidious nature that have been overlooked in previous studies. Our results increase knowledge of the complexity of the pyometra microbiota and suggest that pathogens in addition to F. necrophorum may be involved in the pathogenesis of pyometra. PMID:25550285
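
    Statements such as "constituted only 1% of the total number of reads" are relative abundances: a taxon's read count divided by the total. A minimal sketch with hypothetical counts, chosen only so that Actinomycetaceae comes out at 1%; they are not the study's data.

```python
def relative_abundance(read_counts):
    """Share of total reads per taxon, ordered from most to least abundant."""
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    return {taxon: n / total for taxon, n in ranked}


counts = {  # hypothetical read counts per family
    "Actinomycetaceae": 100,
    "Fusobacteriaceae": 5000,
    "Porphyromonadaceae": 2000,
    "Mycoplasmataceae": 1500,
    "Pasteurellaceae": 1400,
}
for family, share in relative_abundance(counts).items():
    print(f"{family}: {share:.1%}")
```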

  15. Small RNA deep sequencing identifies viral microRNAs during malignant catarrhal fever induced by alcelaphine herpesvirus 1.

    PubMed

    Sorel, Océane; Tuddenham, Lee; Myster, Françoise; Palmeira, Leonor; Kerkhofs, Pierre; Pfeffer, Sébastien; Vanderplasschen, Alain; Dewals, Benjamin G

    2015-11-01

    Alcelaphine herpesvirus 1 (AlHV-1) is a γ-herpesvirus (γ-HV) carried asymptomatically by wildebeest. Upon cross-species transmission, AlHV-1 induces a fatal lymphoproliferative disease named malignant catarrhal fever (MCF) in many ruminants, including cattle, and in the rabbit model. Latency has been shown to be essential for MCF induction. However, the mechanisms causing the activation and proliferation of infected CD8+ T cells are unknown. Many γ-HVs express microRNAs (miRNAs). These small non-coding RNAs can regulate the expression of host or viral target genes involved in various pathways and are thought to facilitate viral infection and/or mediate the activation and proliferation of infected lymphocytes. The AlHV-1 genome has been predicted to encode a large number of miRNAs. However, their precise contribution to viral infection and pathogenesis in vivo remains unknown. Here, by cloning and sequencing small RNAs, we identified 36 potential miRNAs expressed in a lymphoblastoid cell line propagated from a calf infected with AlHV-1 and developing MCF. Of the sequenced candidate miRNAs, 32 were expressed from the reverse strand of the genome in two main clusters. The expression of these 32 viral miRNAs was further validated by Northern blot and quantitative reverse transcription PCR in lymphoid organs of calves or rabbits developing MCF. To determine the combined contribution to MCF of the 28 viral miRNAs clustered in the non-protein-coding region of the AlHV-1 genome, a recombinant virus lacking them was produced. The absence of these 28 miRNAs did not affect viral growth in vitro or MCF induction in rabbits, indicating that the AlHV-1 miRNAs clustered in this non-protein-coding genomic region are dispensable for MCF induction. PMID:26329753
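
    Saying that miRNAs lie "on the reverse strand of the genome in two main clusters" implies grouping loci by strand and proximity. A minimal sketch, assuming (start, end, strand) coordinates and a hypothetical maximum gap of 5 kb between neighbours in a cluster; the coordinates are invented and do not reproduce the AlHV-1 annotation.

```python
def cluster_loci(loci, max_gap=5000):
    """Merge same-strand loci separated by at most `max_gap` bp into clusters."""
    clusters = []
    for start, end, strand in sorted(loci, key=lambda locus: (locus[2], locus[0])):
        if (clusters and clusters[-1]["strand"] == strand
                and start - clusters[-1]["end"] <= max_gap):
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
            clusters[-1]["count"] += 1
        else:
            clusters.append({"strand": strand, "start": start,
                             "end": end, "count": 1})
    return clusters


loci = [(1000, 1022, "-"), (1500, 1521, "-"), (2100, 2122, "-"),
        (40000, 40022, "-"), (41000, 41021, "-"), (60000, 60022, "+")]
for c in cluster_loci(loci):
    print(c["strand"], c["start"], c["end"], c["count"])
```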

  16. Bacterial communities associated with host-adapted populations of pea aphids revealed by deep sequencing of 16S ribosomal DNA.

    PubMed

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous, and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways, such as protection against natural enemies, heat tolerance, colour change and alteration of reproduction. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences identified 21 bacterial operational taxonomic units (OTUs). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum, while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes, but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept for a few generations under controlled conditions, it may be that bacterial diversity was
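    The abstract does not say how the 16S reads were clustered into OTUs, so the following is only an illustrative sketch of one common approach, greedy centroid clustering, not the authors' pipeline. It assumes reads are already aligned to equal length and uses a hypothetical 97% identity threshold; the function names `identity` and `greedy_otus` and the toy sequences are invented for illustration. The number of resulting clusters is the per-sample OTU richness of the kind reported above (3 to 11 OTUs).

    ```python
    from collections import Counter

    def identity(a, b):
        """Fraction of matching positions between two pre-aligned, equal-length sequences."""
        if len(a) != len(b):
            raise ValueError("sequences must be pre-aligned to equal length")
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def greedy_otus(reads, threshold=0.97):
        """Hypothetical greedy clustering: returns {centroid sequence: read count}.

        Unique sequences are visited from most to least abundant; each joins the
        first existing centroid it matches at >= threshold, otherwise it founds
        a new OTU. The 0.97 default is an assumed, not source-given, cutoff.
        """
        counts = Counter(reads)
        centroids = []
        members = {}
        for seq, n in counts.most_common():
            for c in centroids:
                if identity(seq, c) >= threshold:
                    members[c] += n
                    break
            else:
                centroids.append(seq)
                members[seq] = n
        return members

    # Toy data (hypothetical): two close variants and one unrelated sequence
    base = "ACGT" * 25                       # 100-nt toy fragment
    variant = base[:10] + "A" + base[11:]    # one substitution: 99% identity to base
    other = "TGCA" * 25                      # differs from base at every position
    reads = [base] * 5 + [variant] * 2 + [other] * 3
    otus = greedy_otus(reads)
    print(f"{len(otus)} OTUs:", sorted(otus.values(), reverse=True))
    ```

    Processing abundant sequences first keeps rare, possibly error-containing variants from becoming centroids; real pipelines typically add chimera and quality filtering that this sketch omits.
    
    
    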

  17. Implemented Lomb-Scargle periodogram: a valuable tool for improving cyclostratigraphic research on unevenly sampled deep-sea stratigraphic sequences

    NASA Astrophysics Data System (ADS)

    Pardo-Iguzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2011-12-01

    One important handicap when working with stratigraphic sequences is the discontinuous character of the sedimentary record, which is especially relevant in cyclostratigraphic analysis. Unevenly sampled palaeoclimatic/palaeoceanographic time series are common, and their cyclostratigraphic analysis is comparatively difficult because most spectral methods are appropriate only for evenly sampled data. To solve this problem, a program has been developed that calculates the smoothed Lomb-Scargle periodogram and cross-periodogram and additionally evaluates the statistical confidence of the estimated power spectrum through a Monte Carlo procedure (the permutation test). The spectral analysis of a short, uneven time series calls for an assessment of the statistical significance of the spectral peaks, since a periodogram can always be calculated; the main challenge lies in identifying true spectral features. To demonstrate the effectiveness of this program, two case studies are presented: one deals with synthetic data and the other with palaeoceanographic/palaeoclimatic proxies. From a simulated time series of 500 data points, two uneven time series (with 100 and 25 data points) were generated by selecting data at random. Comparative analysis between the power spectra from the simulated series and from the two uneven time series demonstrates the usefulness of the smoothed Lomb-Scargle periodogram for
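    The idea described above, a Lomb-Scargle periodogram on unevenly sampled data with a permutation test for peak significance, can be sketched as follows. This is an illustrative sketch, not the authors' program: it computes the classic normalized (unsmoothed) Lomb-Scargle periodogram with NumPy, omits smoothing and the cross-periodogram, and assumes the test statistic is the highest peak across all frequencies, recomputed after shuffling the values over the fixed sample times. The function names `lomb_scargle` and `permutation_pvalue`, and the simulated signal (period of 20 steps, noise SD 0.5), are hypothetical; only the 500-point series subsampled at random to 100 points mirrors the abstract.

    ```python
    import numpy as np

    def lomb_scargle(t, y, freqs):
        """Normalized Lomb-Scargle power of y(t) at each frequency (cycles per unit of t)."""
        t = np.asarray(t, dtype=float)
        y = np.asarray(y, dtype=float)
        y = y - y.mean()
        w = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]   # shape (F, 1)
        # Offset tau makes the sine and cosine terms orthogonal at each frequency
        tau = np.arctan2(np.sin(2 * w * t).sum(axis=1),
                         np.cos(2 * w * t).sum(axis=1)) / (2 * w[:, 0])
        arg = w * (t[None, :] - tau[:, None])                        # shape (F, N)
        c, s = np.cos(arg), np.sin(arg)
        return ((c @ y) ** 2 / (c * c).sum(axis=1)
                + (s @ y) ** 2 / (s * s).sum(axis=1)) / (2.0 * y.var(ddof=1))

    def permutation_pvalue(t, y, freqs, n_perm=199, seed=0):
        """Monte Carlo p-value of the highest peak (assumed statistic).

        Shuffling y over the fixed times t destroys any periodicity while
        keeping the sampling pattern and the distribution of values.
        """
        rng = np.random.default_rng(seed)
        observed = lomb_scargle(t, y, freqs).max()
        exceed = sum(lomb_scargle(t, rng.permutation(y), freqs).max() >= observed
                     for _ in range(n_perm))
        return (exceed + 1) / (n_perm + 1)

    # Hypothetical synthetic case: 500 evenly spaced points, 100 kept at random
    rng = np.random.default_rng(42)
    t_full = np.arange(500)
    y_full = np.sin(2 * np.pi * 0.05 * t_full) + rng.normal(0.0, 0.5, t_full.size)
    idx = np.sort(rng.choice(t_full.size, size=100, replace=False))
    t, y = t_full[idx], y_full[idx]

    freqs = np.linspace(0.01, 0.45, 441)   # stays below 0.5 cycles/step
    power = lomb_scargle(t, y, freqs)
    peak = freqs[np.argmax(power)]
    p = permutation_pvalue(t, y, freqs)
    print(f"peak at {peak:.3f} cycles/step, permutation p = {p:.3f}")
    ```

    Because the sample times here are integers, frequencies f and 1 - f are indistinguishable, so the grid is kept below 0.5 cycles per step; with 199 permutations the smallest attainable p-value is 1/200.
    
    
    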