Sample records for generation deep sequencing

  1. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  2. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  3. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds. PMID:26585833

  5. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds.

  6. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  7. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  8. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  9. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity.

    PubMed

    Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio

    2017-07-15

    With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provide informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publically available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2016-02-01

    circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. R e la ti v e c e ll g ro w th ( % ) 0 20

  11. A deep learning method for lincRNA detection using auto-encoder algorithm.

    PubMed

    Yu, Ning; Yu, Zeng; Pan, Yi

    2017-12-06

    RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meantime, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conversed patterns/motifs tethered together by non-conserved regions. The two evidences give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than message RNAs are generated. That is, the transcribed RNAs participate the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that deep learning method has the extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. As our knowledge, this is the first application to adopt the deep learning techniques for identifying lincRNA transcription sequences.

  12. A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

    USDA-ARS?s Scientific Manuscript database

    Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...

  13. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.

  14. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  15. Optimization of conditions to sequence long cDNAs from viruses

    USDA-ARS?s Scientific Manuscript database

    Fourth generation sequencing with the Minion nanopore sequencer provides opportunity to obtain deep coverage and long read for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...

  16. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS?s Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  17. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.

  18. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    PubMed

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggest that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/9, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins; including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.

  20. Dendrites, deep learning, and sequences in the hippocampus.

    PubMed

    Bhalla, Upinder S

    2017-10-12

    The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.

  1. Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    PubMed Central

    Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512

  2. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  3. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...

  4. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  5. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East India Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in the GenBank. Moreover, 44.4% fungal OTUs and 30% culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East India Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal community could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  6. Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

    USDA-ARS?s Scientific Manuscript database

    Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lower the cost necessary for large genome sequencing projects. The one of the advantage of NGS platforms is the possibility to sequence the samples with...

  7. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.

  9. Targeted parallel sequencing of the Musa species: searching for an alternative model system for polyploidy studies

    USDA-ARS?s Scientific Manuscript database

    Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...

  10. RNA-Seq analysis to capture the transcriptome landscape of a single cell

    PubMed Central

    Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim

    2013-01-01

    We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668

  11. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  12. Enhanced sensitivity for detection of low-level germline mosaic RB1 mutations in sporadic retinoblastoma cases using deep semiconductor sequencing.

    PubMed

    Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa

    2014-03-01

    Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.

  13. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  14. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  15. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.

  16. Single-virion sequencing of lamivudine-treated HBV populations reveal population evolution dynamics and demographic history.

    PubMed

    Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin

    2017-10-27

    Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.

  17. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.

  18. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    PubMed

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .

  19. A Comprehensive Phylogenetic Analysis of the Scleractinia (Cnidaria, Anthozoa) Based on Mitochondrial CO1 Sequence Data

    PubMed Central

    Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.

    2010-01-01

    Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613

  20. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants

    USDA-ARS?s Scientific Manuscript database

    Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...

  2. Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

    PubMed

    Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn

    2015-01-01

    Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.

  3. Genome-wide analyses of long noncoding RNA expression profiles correlated with radioresistance in nasopharyngeal carcinoma via next-generation deep sequencing.

    PubMed

    Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng

    2016-09-06

    Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.

  4. Dissecting enzyme function with microfluidic-based deep mutational scanning.

    PubMed

    Romero, Philip A; Tran, Tuan M; Abate, Adam R

    2015-06-09

    Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

  5. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  6. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  7. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  8. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followedmore » by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.« less

  9. Optimization of whole-transcriptome amplification from low cell density deep-sea microbial samples for metatranscriptomic analysis.

    PubMed

    Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R

    2011-01-01

    Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.

  10. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    PubMed

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. yasu@bio.keio.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  11. Reporting Differences Between Spacecraft Sequence Files

    NASA Technical Reports Server (NTRS)

    Khanampompan, Teerapat; Gladden, Roy E.; Fisher, Forest W.

    2010-01-01

    A suite of computer programs, called seq diff suite, reports differences between the products of other computer programs involved in the generation of sequences of commands for spacecraft. These products consist of files of several types: replacement sequence of events (RSOE), DSN keyword file [DKF (wherein DSN signifies Deep Space Network)], spacecraft activities sequence file (SASF), spacecraft sequence file (SSF), and station allocation file (SAF). These products can include line numbers, request identifications, and other pieces of information that are not relevant when generating command sequence products, though these fields can result in the appearance of many changes to the files, particularly when using the UNIX diff command to inspect file differences. The outputs of prior software tools for reporting differences between such products include differences in these non-relevant pieces of information. In contrast, seq diff suite removes the fields containing the irrelevant pieces of information before processing to extract differences, so that only relevant differences are reported. Thus, seq diff suite is especially useful for reporting changes between successive versions of the various products and in particular flagging difference in fields relevant to the sequence command generation and review process.

  12. GenomeGems: evaluation of genetic variability from deep sequencing data

    PubMed Central

    2012-01-01

    Background Detection of disease-causing mutations using Deep Sequencing technologies possesses great challenges. In particular, organizing the great amount of sequences generated so that mutations, which might possibly be biologically relevant, are easily identified is a difficult task. Yet, for this assignment only limited automatic accessible tools exist. Findings We developed GenomeGems to gap this need by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151

  13. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification.

    PubMed

    Yildirim, Özal

    2018-05-01

    Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  14. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  15. Rapid gene identification in sugar beet using deep sequencing of DNA from phenotypic pools selected from breeding panels.

    PubMed

    Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd

    2016-03-15

    The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ for the monogenic trait to address, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.

  16. High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations.

    PubMed

    Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute

    2017-09-01

    Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  17. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  18. DNA Cryptography and Deep Learning using Genetic Algorithm with NW algorithm for Key Generation.

    PubMed

    Kalsi, Shruti; Kaur, Harleen; Chang, Victor

    2017-12-05

    Cryptography is not only a science of applying complex mathematics and logic to design strong methods to hide data called as encryption, but also to retrieve the original data back, called decryption. The purpose of cryptography is to transmit a message between a sender and receiver such that an eavesdropper is unable to comprehend it. To accomplish this, not only we need a strong algorithm, but a strong key and a strong concept for encryption and decryption process. We have introduced a concept of DNA Deep Learning Cryptography which is defined as a technique of concealing data in terms of DNA sequence and deep learning. In the cryptographic technique, each alphabet of a letter is converted into a different combination of the four bases, namely; Adenine (A), Cytosine (C), Guanine (G) and Thymine (T), which make up the human deoxyribonucleic acid (DNA). Actual implementations with the DNA don't exceed laboratory level and are expensive. To bring DNA computing on a digital level, easy and effective algorithms are proposed in this paper. In proposed work we have introduced firstly, a method and its implementation for key generation based on the theory of natural selection using Genetic Algorithm with Needleman-Wunsch (NW) algorithm and Secondly, a method for implementation of encryption and decryption based on DNA computing using biological operations Transcription, Translation, DNA Sequencing and Deep Learning.

  19. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net

  20. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    PubMed

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    Study on transcriptome, the entire pool of transcripts in an organism or single cells at certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties to compare the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves steps, from the plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous-yet fragmentary-data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task, which can cause confusion when choosing the suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.

  1. The Pediatric Cancer Genome Project

    PubMed Central

    Downing, James R; Wilson, Richard K; Zhang, Jinghui; Mardis, Elaine R; Pui, Ching-Hon; Ding, Li; Ley, Timothy J; Evans, William E

    2013-01-01

    The St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (PCGP) is participating in the international effort to identify somatic mutations that drive cancer. These cancer genome sequencing efforts will not only yield an unparalleled view of the altered signaling pathways in cancer but should also identify new targets against which novel therapeutics can be developed. Although these projects are still deep in the phase of generating primary DNA sequence data, important results are emerging and valuable community resources are being generated that should catalyze future cancer research. We describe here the rationale for conducting the PCGP, present some of the early results of this project and discuss the major lessons learned and how these will affect the application of genomic sequencing in the clinic. PMID:22641210

  2. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  3. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics1[W][OA

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  4. A Neocortical Delta Rhythm Facilitates Reciprocal Interlaminar Interactions via Nested Theta Rhythms

    PubMed Central

    Carracedo, Lucy M.; Kjeldsen, Henrik; Cunnington, Leonie; Jenkins, Alastair; Schofield, Ian; Cunningham, Mark O.; Davies, Ceri H.; Traub, Roger D.

    2013-01-01

    Delta oscillations (1–4 Hz) associate with deep sleep and are implicated in memory consolidation and replay of cortical responses elicited during wake states. A potent local generator has been characterized in thalamus, and local generators in neocortex have been suggested. Here we demonstrate that isolated rat neocortex generates delta rhythms in conditions mimicking the neuromodulatory state during deep sleep (low cholinergic and dopaminergic tone). The rhythm originated in an NMDA receptor-driven network of intrinsic bursting (IB) neurons in layer 5, activating a source of GABAB receptor-mediated inhibition. In contrast, regular spiking (RS) neurons in layer 5 generated theta-frequency outputs. In layer 2/3 principal cells, outputs from IB cells associated with IPSPs, whereas those from layer 5 RS neurons related to nested bursts of theta-frequency EPSPs. Both interlaminar spike and field correlations revealed a sequence of events whereby sparse spiking in layer 2/3 was partially reflected back from layer 5 on each delta period. We suggest that these reciprocal, interlaminar interactions may represent a “Helmholtz machine”-like process to control synaptic rescaling during deep sleep. PMID:23804097

  5. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.

  6. The dynamics of genome replication using deep sequencing

    PubMed Central

    Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.

    2014-01-01

    Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142

  7. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case

    PubMed Central

    Setoh, Yin Xiang; Prow, Natalie A.; Peng, Nias; Hugo, Leon E.; Devine, Gregor; Hazlewood, Jessamine E.

    2017-01-01

    ABSTRACT Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus’s ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses. PMID:28529976

  8. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case.

    PubMed

    Setoh, Yin Xiang; Prow, Natalie A; Peng, Nias; Hugo, Leon E; Devine, Gregor; Hazlewood, Jessamine E; Suhrbier, Andreas; Khromykh, Alexander A

    2017-01-01

    Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero . Herein, we describe the de novo generation of a new ZIKV isolate, ZIKV Natal , using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKV Natal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKV Natal could be used to establish a fetal brain infection model in IFNAR -/- mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKV Natal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKV Natal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus's ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses.

  9. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  10. Deep-sequencing to resolve complex diversity of apicomplexan parasites in platypuses and echidnas: Proof of principle for wildlife disease investigation.

    PubMed

    Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard

    2017-11-01

    The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.

    PubMed

    Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges

    2015-11-01

    Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. Knowledge of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea diversity has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitve ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  12. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE

    PubMed Central

    2011-01-01

    Background The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. Results We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress. Profiting from recent pioneer works on massive cDNA sequencing in chickpea, more than 9,400 UniTags were able to be linked to UniProt entries. Additionally, gene ontology (GO) categories over-representation analysis enabled to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. Conclusions This report demonstrates, that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcripts isoforms. As an example for the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules already 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts are proposed to be potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE. PMID:21320317

  13. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE.

    PubMed

    Molina, Carlos; Zaman-Allah, Mainassara; Khan, Faheema; Fatnassi, Nadia; Horres, Ralf; Rotter, Björn; Steinhauer, Diana; Amenc, Laurie; Drevon, Jean-Jacques; Winter, Peter; Kahl, Günter

    2011-02-14

    The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress.Profiting from recent pioneer works on massive cDNA sequencing in chickpea, more than 9,400 UniTags were able to be linked to UniProt entries. Additionally, gene ontology (GO) categories over-representation analysis enabled to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. This report demonstrates, that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcripts isoforms. As an example for the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules already 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts are proposed to be potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.

  14. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.

  15. Differential expression analysis of Paralichthys olivaceus microRNAs in adult ovary and testis by deep sequencing.

    PubMed

    Gu, Yifeng; Zhang, Lei; Chen, Xiaowu

    2014-08-01

    MicroRNAs (miRNAs) play an important role in gonadal development and differentiation in fish. However, understanding of the mechanism of this process is hindered by our poor knowledge of miRNA expression patterns in fish gonads. In this study, miRNA libraries derived from adult gonads of Paralichthys olivaceus were generated by using next-generation sequencing (NGS) technology. Bioinformatics analysis was performed to distinguish mature miRNA sequences from two classes of small RNAs represented in the sequencing data. A total of 141 mature miRNAs were identified, in which 21 miRNAs were found in P. olivaceus for the first time. Variance and preference of miRNAs expression were concluded from the deep sequencing reads. Some miRNAs, such as pol-miR-143, pol-miR-26a and pol-let-7a were found with quite high expression levels in both gonads, while some exhibited a clear sex-biased expression in different gonad. Approximate 20.0% and 13.1% of the isolated miRNAs were preferentially expressed in the testis (FC<0.5) or ovary (FC>2), respectively. The identification and the preliminary analysis of the sex-biased expression of miRNAs in P. olivaceus gonads in our work by using NGS will provide us a basic catalog of miRNAs to facilitate future improvement and exploitation of sexual regulatory mechanisms in P. olivaceus. Copyright © 2014. Published by Elsevier Inc.

  16. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 is experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illunima HiSeq 2000 flowcell (8 lanes, 608 M reads) to masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.

  17. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.

  18. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  19. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    PubMed Central

    2010-01-01

    Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131

  20. De novo peptide sequencing by deep learning

    PubMed Central

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-01-01

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701

  1. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.

  2. A novel process of viral vector barcoding and library preparation enables high-diversity library generation and recombination-free paired-end sequencing

    PubMed Central

    Davidsson, Marcus; Diaz-Fernandez, Paula; Schwich, Oliver D.; Torroba, Marcos; Wang, Gang; Björklund, Tomas

    2016-01-01

    Detailed characterization and mapping of oligonucleotide function in vivo is generally a very time consuming effort that only allows for hypothesis driven subsampling of the full sequence to be analysed. Recent advances in deep sequencing together with highly efficient parallel oligonucleotide synthesis and cloning techniques have, however, opened up for entirely new ways to map genetic function in vivo. Here we present a novel, optimized protocol for the generation of universally applicable, barcode labelled, plasmid libraries. The libraries are designed to enable the production of viral vector preparations assessing coding or non-coding RNA function in vivo. When generating high diversity libraries, it is a challenge to achieve efficient cloning, unambiguous barcoding and detailed characterization using low-cost sequencing technologies. With the presented protocol, diversity of above 3 million uniquely barcoded adeno-associated viral (AAV) plasmids can be achieved in a single reaction through a process achievable in any molecular biology laboratory. This approach opens up for a multitude of in vivo assessments from the evaluation of enhancer and promoter regions to the optimization of genome editing. The generated plasmid libraries are also useful for validation of sequencing clustering algorithms and we here validate the newly presented message passing clustering process named Starcode. PMID:27874090

  3. AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.

    PubMed

    Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek

    2016-03-01

    Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.

  4. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3(') end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ antonio.marco@manchester.ac.uk Supplementary data are available at Bioinformatics online.

  5. Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing*

    PubMed Central

    Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.

    2015-01-01

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the method effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891

  6. Deep intronic GPR143 mutation in a Japanese family with ocular albinism

    PubMed Central

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-01-01

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease. PMID:26061757

  7. Deep intronic GPR143 mutation in a Japanese family with ocular albinism.

    PubMed

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-06-10

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease.

  8. Novel microbial diversity retrieved by autonomous robotic exploration of the world's deepest vertical phreatic sinkhole.

    PubMed

    Sahl, Jason W; Fairfield, Nathaniel; Harris, J Kirk; Wettergreen, David; Stone, William C; Spear, John R

    2010-03-01

    The deep phreatic thermal explorer (DEPTHX) is an autonomous underwater vehicle designed to navigate an unexplored environment, generate high-resolution three-dimensional (3-D) maps, collect biological samples based on an autonomous sampling decision, and return to its origin. In the spring of 2007, DEPTHX was deployed in Zacatón, a deep (approximately 318 m), limestone, phreatic sinkhole (cenote) in northeastern Mexico. As DEPTHX descended, it generated a 3-D map based on the processing of range data from 54 onboard sonars. The vehicle collected water column samples and wall biomat samples throughout the depth profile of the cenote. Post-expedition sample analysis via comparative analysis of 16S rRNA gene sequences revealed a wealth of microbial diversity. Traditional Sanger gene sequencing combined with a barcoded-amplicon pyrosequencing approach revealed novel, phylum-level lineages from the domains Bacteria and Archaea; in addition, several novel subphylum lineages were also identified. Overall, DEPTHX successfully navigated and mapped Zacatón, and collected biological samples based on an autonomous decision, which revealed novel microbial diversity in a previously unexplored environment.

  9. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  10. Command system output bit verification

    NASA Technical Reports Server (NTRS)

    Odd, C. W.; Abbate, S. F.

    1981-01-01

    An automatic test was developed to test the ability of the deep space station (DSS) command subsystem and exciter to generate and radiate, from the exciter, the correct idle bit sequence for a given flight project or to store and radiate received command data elements and files without alteration. This test, called the command system output bit verification test, is an extension of the command system performance test (SPT) and can be selected as an SPT option. The test compares the bit stream radiated from the DSS exciter with reference sequences generated by the SPT software program. The command subsystem and exciter are verified when the bit stream and reference sequences are identical. It is a key element of the acceptance testing conducted on the command processor assembly (CPA) operational program (DMC-0584-OP-G) prior to its transfer from development to operations.

  11. Multiplicity and molecular epidemiology of Plasmodium vivax and Plasmodium falciparum infections in East Africa.

    PubMed

    Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun

    2018-05-02

    Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections were similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.

  12. A deep transcriptomic analysis of pod development in the vanilla orchid (Vanilla planifolia).

    PubMed

    Rao, Xiaolan; Krom, Nick; Tang, Yuhong; Widiez, Thomas; Havkin-Frenkel, Daphna; Belanger, Faith C; Dixon, Richard A; Chen, Fang

    2014-11-07

    Pods of the vanilla orchid (Vanilla planifolia) accumulate large amounts of the flavor compound vanillin (3-methoxy, 4-hydroxy-benzaldehyde) as a glucoside during the later stages of their development. At earlier stages, the developing seeds within the pod synthesize a novel lignin polymer, catechyl (C) lignin, in their coats. Genomic resources for determining the biosynthetic routes to these compounds and other flavor components in V. planifolia are currently limited. Using next-generation sequencing technologies, we have generated very large gene sequence datasets from vanilla pods at different times of development, and representing different tissue types, including the seeds, hairs, placental and mesocarp tissues. This developmental series was chosen as being the most informative for interrogation of pathways of vanillin and C-lignin biosynthesis in the pod and seed, respectively. The combined 454/Illumina RNA-seq platforms provide both deep sequence coverage and high quality de novo transcriptome assembly for this non-model crop species. The annotated sequence data provide a foundation for understanding multiple aspects of the biochemistry and development of the vanilla bean, as exemplified by the identification of candidate genes involved in lignin biosynthesis. Our transcriptome data indicate that C-lignin formation in the seed coat involves coordinate expression of monolignol biosynthetic genes with the exception of those encoding the caffeoyl coenzyme A 3-O-methyltransferase for conversion of caffeoyl to feruloyl moieties. This database provides a general resource for further studies on this important flavor species.

  13. Plastid Phylogenomics Resolve Deep Relationships among Eupolypod II Ferns with Rapid Radiation and Rate Heterogeneity

    PubMed Central

    Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun

    2017-01-01

    Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625

  14. HSA: a heuristic splice alignment tool.

    PubMed

    Bu, Jingde; Chi, Xuebin; Jin, Zhong

    2013-01-01

    RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.

  15. Kinetics of forming aldehydes in frying oils and their distribution in French fries revealed by LC-MS-based chemometrics

    USDA-ARS?s Scientific Manuscript database

    Aldehydes are major secondary lipid oxidation products (LOPs) from heating vegetable oils and deep frying. The routes and reactions that generate aldehydes have been extensively investigated, but the sequences and kinetics of their formation in oils are poorly defined. In this study, a platform comb...

  16. Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids

    NASA Astrophysics Data System (ADS)

    Jungbluth, Sean P.; Amend, Jan P.; Rappé, Michael S.

    2017-03-01

    The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation.

  17. Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids.

    PubMed

    Jungbluth, Sean P; Amend, Jan P; Rappé, Michael S

    2017-03-28

    The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation.

  18. Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids

    PubMed Central

    Jungbluth, Sean P.; Amend, Jan P.; Rappé, Michael S.

    2017-01-01

    The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation. PMID:28350381

  19. HomSI: a homozygous stretch identifier from next-generation sequencing data.

    PubMed

    Görmez, Zeliha; Bakir-Gungor, Burcu; Sagiroglu, Mahmut Samil

    2014-02-01

    In consanguineous families, as a result of inheriting the same genomic segments through both parents, the individuals have stretches of their genomes that are homozygous. This situation leads to the prevalence of recessive diseases among the members of these families. Homozygosity mapping is based on this observation, and in consanguineous families, several recessive disease genes have been discovered with the help of this technique. The researchers typically use single nucleotide polymorphism arrays to determine the homozygous regions and then search for the disease gene by sequencing the genes within this candidate disease loci. Recently, the advent of next-generation sequencing enables the concurrent identification of homozygous regions and the detection of mutations relevant for diagnosis, using data from a single sequencing experiment. In this respect, we have developed a novel tool that identifies homozygous regions using deep sequence data. Using *.vcf (variant call format) files as an input file, our program identifies the majority of homozygous regions found by microarray single nucleotide polymorphism genotype data. HomSI software is freely available at www.igbam.bilgem.tubitak.gov.tr/softwares/HomSI, with an online manual.

  20. A multiple-alignment based primer design algorithm for genetically highly variable DNA targets

    PubMed Central

    2013-01-01

    Background Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160

  1. Single-Molecule Sequencing Reveals Complex Genome Variation of Hepatitis B Virus during 15 Years of Chronic Infection following Liver Transplantation

    PubMed Central

    Betz-Stablein, B. D.; Töpfer, A.; Littlejohn, M.; Yuen, L.; Colledge, D.; Sozzi, V.; Angus, P.; Thompson, A.; Revill, P.; Beerenwinkel, N.; Warner, N.

    2016-01-01

    ABSTRACT Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error prone, leading to the rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome generated via splicing (spHBV variants). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. spHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from samples, including the liver explant, longitudinally collected from a subject with CHB over a 15-year period after liver transplantation. By employing novel bioinformatics methods, this analysis showed that the dynamics of the viral population across a period of changing treatment regimens was complex. The spHBV variants detected in the liver explant remained present posttransplantation, and a highly diverse novel spHBV population as well as variants with multiple deletions in the pre-S genes emerged. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occurred with known drug resistance-associated mutations highlights the relevance of using full-genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full length of the HBV genome. IMPORTANCE Single-molecule sequencing allowed the characterization, in unprecedented detail, of the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimens. This analysis also showed the rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open-source bioinformatics tools with the capacity to easily identify potential spliced variants from deep sequencing data are freely available. PMID:27252524

  2. Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches.

    PubMed

    Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren

    2011-05-01

    The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. sequence-based approach and structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression is employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison on the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give the QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, though the QSSR modeling using sequence-based approach is not needed for the preparation of the minimization structures of peptides before the modeling, it would be considerably efficient as compared to that using structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.

  3. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

    PubMed

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango.

  4. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety ‘Amrapali’ (Mangifera indica L.)

    PubMed Central

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called “king of fruits” due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties ‘Neelam’, ‘Dashehari’ and their hybrid ‘Amrapali’ using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango. PMID:27736892

  5. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Abstract Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  6. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||ni||, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (S a). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of S a and use them to define the sequencing operator (S e q). Sequencing without any bias and errors is S e q = S a IN, where IN is a N × N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (C E N), which describes elimination or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071

  7. A Retrospective Examination of Feline Leukemia Subgroup Characterization: Viral Interference Assays to Deep Sequencing.

    PubMed

    Chiu, Elliott S; Hoover, Edward A; VandeWoude, Sue

    2018-01-10

    Feline leukemia virus (FeLV) was the first feline retrovirus discovered, and is associated with multiple fatal disease syndromes in cats, including lymphoma. The original research conducted on FeLV employed classical virological techniques. As methods have evolved to allow FeLV genetic characterization, investigators have continued to unravel the molecular pathology associated with this fascinating agent. In this review, we discuss how FeLV classification, transmission, and disease-inducing potential have been defined sequentially by viral interference assays, Sanger sequencing, PCR, and next-generation sequencing. In particular, we highlight the influences of endogenous FeLV and host genetics that represent FeLV research opportunities on the near horizon.

  8. Advances in Discrete-Event Simulation for MSL Command Validation

    NASA Technical Reports Server (NTRS)

    Patrikalakis, Alexander; O'Reilly, Taifun

    2013-01-01

    In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate change in more than fifty thousand flight software parameters and conditional command sequences to predict the result of executing a conditional branch in a command sequence, and enable the ability to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.

  9. Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

    PubMed Central

    Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

    2014-01-01

    Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included an ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

  10. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

    PubMed Central

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L.; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C.; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier

    2015-01-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  11. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) take over 5∼10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and high-costly. Instead, computational prediction of the RBP binding sites using pattern learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to our best knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method iDeepE to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences into the same length. For the local CNN, we split a RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates a better performance over state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN run 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.

  12. Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana)

    PubMed Central

    Gonzalez-Ibeas, Daniel; Martinez-Garcia, Pedro J.; Famula, Randi A.; Delfino-Mix, Annette; Stevens, Kristian A.; Loopstra, Carol A.; Langley, Charles H.; Neale, David B.; Wegrzyn, Jill L.

    2016-01-01

    Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. PMID:27799338

  13. Clinical genomics information management software linking cancer genome sequence and clinical decisions.

    PubMed

    Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

    2013-09-01

    Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. SCRaMbLE generates designed combinatorial stochastic diversity in synthetic chromosomes

    PubMed Central

    Shen, Yue; Stracquadanio, Giovanni; Wang, Yun; Yang, Kun; Mitchell, Leslie A.; Xue, Yaxin; Cai, Yizhi; Chen, Tai; Dymond, Jessica S.; Kang, Kang; Gong, Jianhui; Zeng, Xiaofan; Zhang, Yongfen; Li, Yingrui; Feng, Qiang; Xu, Xun; Wang, Jun; Wang, Jian; Yang, Huanming; Boeke, Jef D.; Bader, Joel S.

    2016-01-01

    Synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE) generates combinatorial genomic diversity through rearrangements at designed recombinase sites. We applied SCRaMbLE to yeast synthetic chromosome arm synIXR (43 recombinase sites) and then used a computational pipeline to infer or unscramble the sequence of recombinations that created the observed genomes. Deep sequencing of 64 synIXR SCRaMbLE strains revealed 156 deletions, 89 inversions, 94 duplications, and 55 additional complex rearrangements; several duplications are consistent with a double rolling circle mechanism. Every SCRaMbLE strain was unique, validating the capability of SCRaMbLE to explore a diverse space of genomes. Rearrangements occurred exclusively at designed loxPsym sites, with no significant evidence for ectopic rearrangements or mutations involving synthetic regions, the 99% nonsynthetic nuclear genome, or the mitochondrial genome. Deletion frequencies identified genes required for viability or fast growth. Replacement of 3′ UTR by non-UTR sequence had surprisingly little effect on fitness. SCRaMbLE generates genome diversity in designated regions, reveals fitness constraints, and should scale to simultaneous evolution of multiple synthetic chromosomes. PMID:26566658

  15. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    PubMed Central

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-01-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055

  16. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

    PubMed

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-08-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

  17. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Integrated sequence stratigraphy of the postimpact sediments from the Eyreville core holes, Chesapeake Bay impact structure inner basin

    USGS Publications Warehouse

    Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.

    2009-01-01

    The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ??0.5 Ma for early Miocene sequences and approximately ??1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene- Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fi ne-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fi ne-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the fi lling of a deep impact-generated basin where the timing of sequence boundaries is heavily infl uenced by eustasy. ?? 2009 The Geological Society of America.

  19. Combining Next Generation Sequencing with Bulked Segregant Analysis to Fine Map a Stem Moisture Locus in Sorghum (Sorghum bicolor L. Moench).

    PubMed

    Han, Yucui; Lv, Peng; Hou, Shenglin; Li, Suying; Ji, Guisu; Ma, Xue; Du, Ruiheng; Liu, Guoqing

    2015-01-01

    Sorghum is one of the most promising bioenergy crops. Stem juice yield, together with stem sugar concentration, determines sugar yield in sweet sorghum. Bulked segregant analysis (BSA) is a gene mapping technique for identifying genomic regions containing genetic loci affecting a trait of interest that when combined with deep sequencing could effectively accelerate the gene mapping process. In this study, a dry stem sorghum landrace was characterized and the stem water controlling locus, qSW6, was fine mapped using QTL analysis and the combined BSA and deep sequencing technologies. Results showed that: (i) In sorghum variety Jiliang 2, stem water content was around 80% before flowering stage. It dropped to 75% during grain filling with little difference between different internodes. In landrace G21, stem water content keeps dropping after the flag leaf stage. The drop from 71% at flowering time progressed to 60% at grain filling time. Large differences exist between different internodes with the lowest (51%) at the 7th and 8th internodes at dough stage. (ii) A quantitative trait locus (QTL) controlling stem water content mapped on chromosome 6 between SSR markers Ch6-2 and gpsb069 explained about 34.7-56.9% of the phenotypic variation for the 5th to 10th internodes, respectively. (iii) BSA and deep sequencing analysis narrowed the associated region to 339 kb containing 38 putative genes. The results could help reveal molecular mechanisms underlying juice yield of sorghum and thus to improve total sugar yield.

  20. Sequence-of-events-driven automation of the deep space network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  1. Sequence-of-Events-Driven Automation of the Deep Space Network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  2. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    PubMed

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discovering novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase-a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage more wide usage, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.

  3. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

    PubMed Central

    Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend

    2007-01-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995

  4. Next-generation sequencing identification of pathogenic bacterial genes and their relationship with fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal.

    PubMed

    Ghaju Shrestha, Rajani; Tanaka, Yasuhiro; Malla, Bikash; Bhandari, Dinesh; Tandukar, Sarmila; Inoue, Daisuke; Sei, Kazunari; Sherchand, Jeevan B; Haramoto, Eiji

    2017-12-01

    Bacteriological analysis of drinking water leads to detection of only conventional fecal indicator bacteria. This study aimed to explore and characterize bacterial diversity, to understand the extent of pathogenic bacterial contamination, and to examine the relationship between pathogenic bacteria and fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal. Sixteen water samples were collected from shallow dug wells (n=12), a deep tube well (n=1), a spring (n=1), and rivers (n=2) in September 2014 for 16S rRNA gene next-generation sequencing. A total of 525 genera were identified, of which 81 genera were classified as possible pathogenic bacteria. Acinetobacter, Arcobacter, and Clostridium were detected with a relatively higher abundance (>0.1% of total bacterial genes) in 16, 13, and 5 of the 16 samples, respectively, and the highest abundance ratio of Acinetobacter (85.14%) was obtained in the deep tube well sample. Furthermore, the bla OXA23-like genes of Acinetobacter were detected using SYBR Green-based quantitative PCR in 13 (35%) of 37 water samples, including the 16 samples that were analyzed for next-generation sequencing, with concentrations ranging 5.3-7.5logcopies/100mL. There was no sufficient correlation found between fecal indicator bacteria, such as Escherichia coli and total coliforms, and potential pathogenic bacteria, as well as the bla OXA23-like gene of Acinetobacter. These results suggest the limitation of using conventional fecal indicator bacteria in evaluating the pathogenic bacteria contamination of different water sources in the Kathmandu Valley. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS.

    PubMed

    Ravasio, Viola; Ritelli, Marco; Legati, Andrea; Giacopuzzi, Edoardo

    2018-04-14

    Exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathological variants and study genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this pose serious challenges for variants interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which rely on deep learning models to dissect false and true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performances for both SNP and INDEL variants (AUC 0.71 - 0.98) and outperformed established hard filters. The method is robust also at low coverage down to 30X and can be applied on data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes standard VCF file and produces a regular VCF output. Thus, it can be easily integrated in existing analysis pipeline, allowing application of different thresholds based on desired level of sensitivity and specificity. GARFIELD-NGS available at https://github.com/gedoardo83/GARFIELD-NGS. edoardo.giacopuzzi@unibs.it. Supplementary data are available at Bioinformatics online.

  6. Deep Sequencing Analysis of RNAs from Citrus Plants Grown in a Citrus Sudden Death-Affected Area Reveals Diverse Known and Putative Novel Viruses.

    PubMed

    Matsumura, Emilyn E; Coletta-Filho, Helvecio D; Nouri, Shahideh; Falk, Bryce W; Nerva, Luca; Oliveira, Tiago S; Dorta, Silvia O; Machado, Marcos A

    2017-04-24

    Citrus sudden death (CSD) has caused the death of approximately four million orange trees in a very important citrus region in Brazil. Although its etiology is still not completely clear, symptoms and distribution of affected plants indicate a viral disease. In a search for viruses associated with CSD, we have performed a comparative high-throughput sequencing analysis of the transcriptome and small RNAs from CSD-symptomatic and -asymptomatic plants using the Illumina platform. The data revealed mixed infections that included Citrus tristeza virus (CTV) as the most predominant virus, followed by the Citrus sudden death-associated virus (CSDaV), Citrus endogenous pararetrovirus (CitPRV) and two putative novel viruses tentatively named Citrus jingmen-like virus (CJLV), and Citrus virga-like virus (CVLV). The deep sequencing analyses were sensitive enough to differentiate two genotypes of both viruses previously associated with CSD-affected plants: CTV and CSDaV. Our data also showed a putative association of the CSD-symptomatic plants with a specific CSDaV genotype and a likely association with CitPRV as well, whereas the two putative novel viruses showed to be more associated with CSD-asymptomatic plants. This is the first high-throughput sequencing-based study of the viral sequences present in CSD-affected citrus plants, and generated valuable information for further CSD studies.

  7. Describing the diversity of Ag specific receptors in vertebrates: Contribution of repertoire deep sequencing.

    PubMed

    Castro, Rosario; Navelsaker, Sofie; Krasnov, Aleksei; Du Pasquier, Louis; Boudinot, Pierre

    2017-10-01

    During the last decades, gene and cDNA cloning identified TCR and Ig genes across vertebrates; genome sequencing of TCR and Ig loci in many species revealed the different organizations selected during evolution under the pressure of generating diverse repertoires of Ag receptors. By detecting clonotypes over a wide range of frequency, deep sequencing of Ig and TCR transcripts provides a new way to compare the structure of expressed repertoires in species of various sizes, at different stages of development, with different physiologies, and displaying multiple adaptations to the environment. In this review, we provide a short overview of the technologies currently used to produce global description of immune repertoires, describe how they have already been used in comparative immunology, and we discuss the future potential of such approaches. The development of these methodologies in new species holds promise for new discoveries concerning particular adaptations. As an example, understanding the development of adaptive immunity across metamorphosis in frogs has been made possible by such approaches. Repertoire sequencing is now widely used, not only in basic research but also in the context of immunotherapy and vaccination. Analysis of fish responses to pathogens and vaccines has already benefited from these methods. Finally, we also discuss potential advances based on repertoire sequencing of multigene families of immune sensors and effectors in invertebrates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593

  9. Investigation of a Canine Parvovirus Outbreak using Next Generation Sequencing.

    PubMed

    Parker, Jayme; Murphy, Molly; Hueffer, Karsten; Chen, Jack

    2017-08-29

    Canine parvovirus (CPV) outbreaks can have a devastating effect in communities with dense dog populations. The interior region of Alaska experienced a CPV outbreak in the winter of 2016 leading to the further investigation of the virus due to reports of increased morbidity and mortality occurring at dog mushing kennels in the area. Twelve rectal-swab specimens from dogs displaying clinical signs consistent with parvoviral-associated disease were processed using next-generation sequencing (NGS) methodologies by targeting RNA transcripts, and therefore detecting only replicating virus. All twelve specimens demonstrated the presence of the CPV transcriptome, with read depths ranging from 2.2X - 12,381X, genome coverage ranging from 44.8-96.5%, and representation of CPV sequencing reads to those of the metagenome background ranging from 0.0015-6.7%. Using the data generated by NGS, the presence of newly evolved, yet known, strains of both CPV-2a and CPV-2b were identified and grouped geographically. Deep-sequencing data provided additional diagnostic information in terms of investigating novel CPV in this outbreak. NGS data in addition to limited serological data provided strong diagnostic evidence that this outbreak most likely arose from unvaccinated or under-vaccinated canines, not from a novel CPV strain incapable of being neutralized by current vaccination efforts.

  10. Integration of deep transcriptome and proteome analyses reveals the components of alkaloid metabolism in opium poppy cell cultures

    PubMed Central

    2010-01-01

    Background Papaver somniferum (opium poppy) is the source for several pharmaceutical benzylisoquinoline alkaloids including morphine, the codeine and sanguinarine. In response to treatment with a fungal elicitor, the biosynthesis and accumulation of sanguinarine is induced along with other plant defense responses in opium poppy cell cultures. The transcriptional induction of alkaloid metabolism in cultured cells provides an opportunity to identify components of this process via the integration of deep transcriptome and proteome databases generated using next-generation technologies. Results A cDNA library was prepared for opium poppy cell cultures treated with a fungal elicitor for 10 h. Using 454 GS-FLX Titanium pyrosequencing, 427,369 expressed sequence tags (ESTs) with an average length of 462 bp were generated. Assembly of these sequences yielded 93,723 unigenes, of which 23,753 were assigned Gene Ontology annotations. Transcripts encoding all known sanguinarine biosynthetic enzymes were identified in the EST database, 5 of which were represented among the 50 most abundant transcripts. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) of total protein extracts from cell cultures treated with a fungal elicitor for 50 h facilitated the identification of 1,004 proteins. Proteins were fractionated by one-dimensional SDS-PAGE and digested with trypsin prior to LC-MS/MS analysis. Query of an opium poppy-specific EST database substantially enhanced peptide identification. Eight out of 10 known sanguinarine biosynthetic enzymes and many relevant primary metabolic enzymes were represented in the peptide database. Conclusions The integration of deep transcriptome and proteome analyses provides an effective platform to catalogue the components of secondary metabolism, and to identify genes encoding uncharacterized enzymes. The establishment of corresponding transcript and protein databases generated by next-generation technologies in a system with a well-defined metabolite profile facilitates an improved linkage between genes, enzymes, and pathway components. The proteome database represents the most relevant alkaloid-producing enzymes, compared with the much deeper and more complete transcriptome library. The transcript database contained full-length mRNAs encoding most alkaloid biosynthetic enzymes, which is a key requirement for the functional characterization of novel gene candidates. PMID:21083930

  11. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

    PubMed

    Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

    2017-01-03

    Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.

  12. Fingerprints of Modified RNA Bases from Deep Sequencing Profiles.

    PubMed

    Kietrys, Anna M; Velema, Willem A; Kool, Eric T

    2017-11-29

    Posttranscriptional modifications of RNA bases are not only found in many noncoding RNAs but have also recently been identified in coding (messenger) RNAs as well. They require complex and laborious methods to locate, and many still lack methods for localized detection. Here we test the ability of next-generation sequencing (NGS) to detect and distinguish between ten modified bases in synthetic RNAs. We compare ultradeep sequencing patterns of modified bases, including miscoding, insertions and deletions (indels), and truncations, to unmodified bases in the same contexts. The data show widely varied responses to modification, ranging from no response, to high levels of mutations, insertions, deletions, and truncations. The patterns are distinct for several of the modifications, and suggest the future use of ultradeep sequencing as a fingerprinting strategy for locating and identifying modifications in cellular RNAs.

  13. Stratal stacking patterns and tectono-sedimentary evolution of hyperextended magma-poor rifted margins

    NASA Astrophysics Data System (ADS)

    Ribes, C.; Gillard, M.; Epin, M. E.; Ghienne, J. F.; Manatschal, G.; Karner, G. D.; Johnson, C. A.

    2016-12-01

    Research on the formation and evolution of deep-water rifted margins has undergone a major paradigm shift in recent years. An increasing number of studies of present-day and fossil rifted margins allow us to identify and characterize the structural architecture of the most distal parts of rifted margins, the so-called hyperextended, magma-poor rifted margins. However, at present, little is known about the depositional environments, sedimentary facies, stacking patterns, subsidence and thermal history within these domains. In this context, characterizing the stratal stacking patterns and understanding their spatial and temporal evolution is a new challenge. The major difficulty comes from the fact that the observed stratigraphic geometries and facies relationships are a result of the complex interplay between sediment supply and available accommodation, which is controlled by not only the regional generation of accommodation, but also by local tectono-magmatic processes. These parameters are poorly constrained or even sufficiently known in these tectonic settings. Indeed, the complex structural evolution of hyperextended magma-poor rifted margins, including the development of poly-phase in-sequence and out of sequence extensional detachment faults and associated mantle exhumation and magmatic activity, can generate complex accommodation patterns over a highly structured top basement. The presentation summarizes early results concerning the controlling parameters on ultra-deep water stratigraphic stacking patterns and to provide a conceptual framework. This observation-driven approach combines fieldwork from fossil Alpine Tethys margins exposed in the Alps and the analysis of seismic reflection data from present-day deep water rifted margins such as the Australian-Antarctic, East India and Iberia-Newfoundland margins.

  14. Sedimentary modeling and analysis of petroleum system of the upper Tertiary sequences in southern Ulleung sedimentary Basin, East Sea (Sea of Japan)

    NASA Astrophysics Data System (ADS)

    Cheong, D.; Kim, D.; Kim, Y.

    2010-12-01

    The block 6-1 located in the southwestern margin of the Ulleung basin, East Sea (Sea of Japan) is an area where recently produces commercial natural gas and condensate. A total of 17 exploratory wells have been drilled, and also many seismic explorations have been carried out since early 1970s. Among the wells and seismic sections, the Gorae 1 well and a seismic section through the Gorae 1-2 well were chosen for this simulation work. Then, a 2-D graphic simulation using SEDPAK elucidates the evolution, burial history and diagenesis of the sedimentary sequence. The study area is a suitable place for modeling a petroleum system and evaluating hydrocarbon potential of reservoir. Shale as a source rock is about 3500m deep from sea floor, and sandstones interbedded with thin mud layers are distributed as potential reservoir rocks from 3,500m to 2,000m deep. On top of that, shales cover as seal rocks and overburden rocks upto 900m deep. Input data(sea level, sediment supply, subsidence rate, etc) for the simulation was taken from several previous published papers including the well and seismic data, and the thermal maturity of the sediment was calculated from known thermal gradient data. In this study area, gas and condensate have been found and commercially produced, and the result of the simulation also shows that there is a gas window between 4000m and 6000m deep, so that three possible interpretations can be inferred from the simulation result. First, oil has already moved and gone to the southeastern area along uplifting zones. Or second, oil has never been generated because organic matter is kerogen type 3, and or finally, generated oil has been converted into gas by thermally overcooking. SEDPAK has an advantage that it provides the timing and depth information of generated oil and gas with TTI values even though it has a limit which itself can not perform geochemical modeling to analyze thermal maturity level of source rocks. Based on the result of our simulation, added exploratory wells are required to discover deeper gas located in the study area.

  15. MicroRNAs play critical roles during plant development and in response to abiotic stresses.

    PubMed

    de Lima, Júlio César; Loss-Morais, Guilherme; Margis, Rogerio

    2012-12-01

    MicroRNAs (miRNAs) have been identified as key molecules in regulatory networks. The fine-tuning role of miRNAs in addition to the regulatory role of transcription factors has shown that molecular events during development are tightly regulated. In addition, several miRNAs play crucial roles in the response to abiotic stress induced by drought, salinity, low temperatures, and metals such as aluminium. Interestingly, several miRNAs have overlapping roles with regard to development, stress responses, and nutrient homeostasis. Moreover, in response to the same abiotic stresses, different expression patterns for some conserved miRNA families among different plant species revealed different metabolic adjustments. The use of deep sequencing technologies for the characterisation of miRNA frequency and the identification of new miRNAs adds complexity to regulatory networks in plants. In this review, we consider the regulatory role of miRNAs in plant development and abiotic stresses, as well as the impact of deep sequencing technologies on the generation of miRNA data.

  16. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory.

    PubMed

    Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N

    2014-01-01

    A wide repertoire of bioinformatics applications exist for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and Javascript. The server-side processing (VB.NET) relied on interaction with a customized SQL server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  17. Acute multi-sgRNA knockdown of KEOPS complex genes reproduces the microcephaly phenotype of the stable knockout zebrafish model.

    PubMed

    Jobst-Schwan, Tilman; Schmidt, Johanna Magdalena; Schneider, Ronen; Hoogstraten, Charlotte A; Ullmann, Jeremy F P; Schapiro, David; Majmundar, Amar J; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A; Poduri, Annapurna; Hildebrandt, Friedhelm

    2018-01-01

    Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest.

  18. Deciphering KRAS and NRAS mutated clone dynamics in MLL-AF4 paediatric leukaemia by ultra deep sequencing analysis.

    PubMed

    Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy Te; Basso, Giuseppe

    2016-10-04

    To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next generation amplicon deep sequencing for in depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations including those in small sub-clones were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse were almost all originated from clones that were already detectable at diagnosis and survived to the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS related signature at both transcriptional and protein levels and that the targeting of the RAS pathway could be of beneficial for treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations.

  19. Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway

    NASA Astrophysics Data System (ADS)

    Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.

    2018-02-01

    Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.

  20. SCRaMbLE generates designed combinatorial stochastic diversity in synthetic chromosomes.

    PubMed

    Shen, Yue; Stracquadanio, Giovanni; Wang, Yun; Yang, Kun; Mitchell, Leslie A; Xue, Yaxin; Cai, Yizhi; Chen, Tai; Dymond, Jessica S; Kang, Kang; Gong, Jianhui; Zeng, Xiaofan; Zhang, Yongfen; Li, Yingrui; Feng, Qiang; Xu, Xun; Wang, Jun; Wang, Jian; Yang, Huanming; Boeke, Jef D; Bader, Joel S

    2016-01-01

    Synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE) generates combinatorial genomic diversity through rearrangements at designed recombinase sites. We applied SCRaMbLE to yeast synthetic chromosome arm synIXR (43 recombinase sites) and then used a computational pipeline to infer or unscramble the sequence of recombinations that created the observed genomes. Deep sequencing of 64 synIXR SCRaMbLE strains revealed 156 deletions, 89 inversions, 94 duplications, and 55 additional complex rearrangements; several duplications are consistent with a double rolling circle mechanism. Every SCRaMbLE strain was unique, validating the capability of SCRaMbLE to explore a diverse space of genomes. Rearrangements occurred exclusively at designed loxPsym sites, with no significant evidence for ectopic rearrangements or mutations involving synthetic regions, the 99% nonsynthetic nuclear genome, or the mitochondrial genome. Deletion frequencies identified genes required for viability or fast growth. Replacement of 3' UTR by non-UTR sequence had surprisingly little effect on fitness. SCRaMbLE generates genome diversity in designated regions, reveals fitness constraints, and should scale to simultaneous evolution of multiple synthetic chromosomes. © 2016 Shen et al.; Published by Cold Spring Harbor Laboratory Press.

  1. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  2. Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

    PubMed

    Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

    2014-07-01

    Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions-the population frequency of individual clones, their genetic composition, and their evolutionary relationships-which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

  3. Designing Anticancer Peptides by Constructive Machine Learning.

    PubMed

    Grisoni, Francesca; Neuhaus, Claudia S; Gabernet, Gisela; Müller, Alex T; Hiss, Jan A; Schneider, Gisbert

    2018-04-21

    Constructive (generative) machine learning enables the automated generation of novel chemical structures without the need for explicit molecular design rules. This study presents the experimental application of such a deep machine learning model to design membranolytic anticancer peptides (ACPs) de novo. A recurrent neural network with long short-term memory cells was trained on α-helical cationic amphipathic peptide sequences and then fine-tuned with 26 known ACPs by transfer learning. This optimized model was used to generate unique and novel amino acid sequences. Twelve of the peptides were synthesized and tested for their activity on MCF7 human breast adenocarcinoma cells and selectivity against human erythrocytes. Ten of these peptides were active against cancer cells. Six of the active peptides killed MCF7 cancer cells without affecting human erythrocytes with at least threefold selectivity. These results advocate constructive machine learning for the automated design of peptides with desired biological activities. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns

    PubMed Central

    Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when plant reference databases become better established. PMID:22511890

  5. Rapid molecular diagnostics of severe primary immunodeficiency determined by using targeted next-generation sequencing.

    PubMed

    Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun

    2016-10-01

    Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (<20× read depth) or high sequence homology (pseudogenes) are complemented by amplicon-based sequencing with specific primers to ensure 100% coverage of all targeted regions. Analysis of 20 patient samples with low T-cell receptor excision circle numbers on newborn screening or a positive family history or clinical suspicion of SCID or other severe PIDD identified deleterious mutations in 14 of them. Identified pathogenic variants included both single nucleotide variants and exonic copy number variants, such as hemizygous nonsense, frameshift, and missense changes in IL2RG; compound heterozygous changes in ATM, RAG1, and CIITA; homozygous changes in DCLRE1C and IL7R; and a heterozygous nonsense mutation in CHD7. High-throughput deep sequencing analysis with complete clinical validation greatly increases the diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  6. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    PubMed

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  7. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

  8. [Personalized urooncology based on molecular uropathology: what is the future?].

    PubMed

    Dahl, E; Haller, F

    2013-07-01

    Targeted therapies and biomarker validation are key drivers in the advancement of personalized oncology which is a growing topic in all clinical areas. Compared with other professions, such as pulmonology and gynecology, development in urology has so far been retarded but has recently gained increasing momentum. A basis for this is the currently growing and in future accelerated application of new knowledge derived from molecular biology in the field of uropathology. The rapid gain of knowledge is driven by a whole new class of analytical methods, such as massively parallel sequencing (deep sequencing or next generation sequencing), which enables analysis of virtually a new universe of potential biomarkers. This article describes the emerging paradigm shift in molecular pathological diagnostics of urological tumors using the example of prostate cancer.

  9. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.

    PubMed

    Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo

    2009-07-06

    In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.

  10. Evolution of multi-drug resistant HCV clones from pre-existing resistant-associated variants during direct-acting antiviral therapy determined by third-generation sequencing

    NASA Astrophysics Data System (ADS)

    Takeda, Haruhiko; Ueda, Yoshihide; Inuzuka, Tadashi; Yamashita, Yukitaka; Osaki, Yukio; Nasu, Akihiro; Umeda, Makoto; Takemura, Ryo; Seno, Hiroshi; Sekine, Akihiro; Marusawa, Hiroyuki

    2017-03-01

    Resistance-associated variant (RAV) is one of the most significant clinical challenges in treating HCV-infected patients with direct-acting antivirals (DAAs). We investigated the viral dynamics in patients receiving DAAs using third-generation sequencing technology. Among 283 patients with genotype-1b HCV receiving daclatasvir + asunaprevir (DCV/ASV), 32 (11.3%) failed to achieve sustained virological response (SVR). Conventional ultra-deep sequencing of HCV genome was performed in 104 patients (32 non-SVR, 72 SVR), and detected representative RAVs in all non-SVR patients at baseline, including Y93H in 28 (87.5%). Long contiguous sequences spanning NS3 to NS5A regions of each viral clone in 12 sera from 6 representative non-SVR patients were determined by third-generation sequencing, and showed the concurrent presence of several synonymous mutations linked to resistance-associated substitutions in a subpopulation of pre-existing RAVs and dominant isolates at treatment failure. Phylogenetic analyses revealed close genetic distances between pre-existing RAVs and dominant RAVs at treatment failure. In addition, multiple drug-resistant mutations developed on pre-existing RAVs after DCV/ASV in all non-SVR cases. In conclusion, multi-drug resistant viral clones at treatment failure certainly originated from a subpopulation of pre-existing RAVs in HCV-infected patients. Those RAVs were selected for and became dominant with the acquisition of multiple resistance-associated substitutions under DAA treatment pressure.

  11. Noninvasive genome sampling in chimpanzees.

    PubMed

    Kohn, Michael H

    2010-12-01

    The inevitable has happened: genomic technologies have been added to our noninvasive genetic sampling repertoire. In this issue of Molecular Ecology, Perry et al. (2010) demonstrate how DNA extraction from chimpanzee faeces, followed by a series of steps to enrich for target loci, can be coupled with next-generation sequencing. These authors collected sequence and single-nucleotide polymorphism (SNP) data at more than 600 genomic loci (chromosome 21 and the X) and the complete mitochondrial DNA. By design, each locus was 'deep sequenced' to enable SNP identification. To demonstrate the reliability of their data, the work included samples from six captive chimps, which allowed for a comparison between presumably genuine SNPs obtained from blood and potentially flawed SNPs deduced from faeces. Thus, with this method, anyone with the resources, skills and ambition to do genome sequencing of wild, elusive, or protected mammals can enjoy all of the benefits of noninvasive sampling. © 2010 Blackwell Publishing Ltd.

  12. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions.

    PubMed

    Brannon, A Rose; Vakiani, Efsevia; Sylvester, Brooke E; Scott, Sasinya N; McDermott, Gregory; Shah, Ronak H; Kania, Krishan; Viale, Agnes; Oschwald, Dayna M; Vacic, Vladimir; Emde, Anne-Katrin; Cercek, Andrea; Yaeger, Rona; Kemeny, Nancy E; Saltz, Leonard B; Shia, Jinru; D'Angelica, Michael I; Weiser, Martin R; Solit, David B; Berger, Michael F

    2014-08-28

    Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely available for clinical use and additional molecularly targeted agents are considered as treatment options in colorectal cancer, it is important to characterize the extent of tumor heterogeneity between primary and metastatic tumors. We performed deep coverage, targeted next-generation sequencing of 230 key cancer-associated genes for 69 matched primary and metastatic tumors and normal tissue. Mutation profiles were 100% concordant for KRAS, NRAS, and BRAF, and were highly concordant for recurrent alterations in colorectal cancer. Additionally, whole genome sequencing of four patient trios did not reveal any additional site-specific targetable alterations. Colorectal cancer primary tumors and metastases exhibit high genomic concordance. As current clinical practices in colorectal cancer revolve around KRAS, NRAS, and BRAF mutation status, diagnostic sequencing of either primary or metastatic tissue as available is acceptable for most patients. Additionally, consistency between targeted sequencing and whole genome sequencing results suggests that targeted sequencing may be a suitable strategy for clinical diagnostic applications.

  13. A Universal Method for Species Identification of Mammals Utilizing Next Generation Sequencing for the Analysis of DNA Mixtures

    PubMed Central

    Tillmar, Andreas O.; Dell'Amico, Barbara; Welander, Jenny; Holmlund, Gunilla

    2013-01-01

    Species identification can be interesting in a wide range of areas, for example, in forensic applications, food monitoring and in archeology. The vast majority of existing DNA typing methods developed for species determination, mainly focuses on a single species source. There are, however, many instances where all species from mixed sources need to be determined, even when the species in minority constitutes less than 1 % of the sample. The introduction of next generation sequencing opens new possibilities for such challenging samples. In this study we present a universal deep sequencing method using 454 GS Junior sequencing of a target on the mitochondrial gene 16S rRNA. The method was designed through phylogenetic analyses of DNA reference sequences from more than 300 mammal species. Experiments were performed on artificial species-species mixture samples in order to verify the method’s robustness and its ability to detect all species within a mixture. The method was also tested on samples from authentic forensic casework. The results showed to be promising, discriminating over 99.9 % of mammal species and the ability to detect multiple donors within a mixture and also to detect minor components as low as 1 % of a mixed sample. PMID:24358309

  14. Phage display screening without repetitious selection rounds.

    PubMed

    't Hoen, Peter A C; Jirka, Silvana M G; Ten Broeke, Bradley R; Schultes, Erik A; Aguilera, Begoña; Pang, Kar Him; Heemskerk, Hans; Aartsma-Rus, Annemieke; van Ommen, Gertjan J; den Dunnen, Johan T

    2012-02-15

    Phage display screenings are frequently employed to identify high-affinity peptides or antibodies. Although successful, phage display is a laborious technology and is notorious for identification of false positive hits. To accelerate and improve the selection process, we have employed Illumina next generation sequencing to deeply characterize the Ph.D.-7 M13 peptide phage display library before and after several rounds of biopanning on KS483 osteoblast cells. Sequencing of the naive library after one round of amplification in bacteria identifies propagation advantage as an important source of false positive hits. Most important, our data show that deep sequencing of the phage pool after a first round of biopanning is already sufficient to identify positive phages. Whereas traditional sequencing of a limited number of clones after one or two rounds of selection is uninformative, the required additional rounds of biopanning are associated with the risk of losing promising clones propagating slower than nonbinding phages. Confocal and live cell imaging confirms that our screen successfully selected a peptide with very high binding and uptake in osteoblasts. We conclude that next generation sequencing can significantly empower phage display screenings by accelerating the finding of specific binders and restraining the number of false positive hits. Copyright © 2011 Elsevier Inc. All rights reserved.

  15. ampliMethProfiler: a pipeline for the analysis of CpG methylation profiles of targeted deep bisulfite sequenced amplicons.

    PubMed

    Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio

    2016-11-25

    CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single molecule level and that have suited statistical tools for their interpretation. Here we present ampliMethProfiler, a python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. ampliMethProfiler tool provides an easy and user friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in python language and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net .

  16. The mitochondrial genome of a sea anemone Bolocera sp. exhibits novel genetic structures potentially involved in adaptation to the deep-sea environment.

    PubMed

    Zhang, Bo; Zhang, Yan-Hong; Wang, Xin; Zhang, Hui-Xian; Lin, Qiang

    2017-07-01

    The deep sea is one of the most extensive ecosystems on earth. Organisms living there survive in an extremely harsh environment, and their mitochondrial energy metabolism might be a result of evolution. As one of the most important organelles, mitochondria generate energy through energy metabolism and play an important role in almost all biological activities. In this study, the mitogenome of a deep-sea sea anemone ( Bolocera sp.) was sequenced and characterized. Like other metazoans, it contained 13 energy pathway protein-coding genes and two ribosomal RNAs. However, it also exhibited some unique features: just two transfer RNA genes, two group I introns, two transposon-like noncanonical open reading frames (ORFs), and a control region-like (CR-like) element. All of the mitochondrial genes were coded by the same strand (the H-strand). The genetic order and orientation were identical to those of most sequenced actiniarians. Phylogenetic analyses showed that this species was closely related to Bolocera tuediae . Positive selection analysis showed that three residues (31 L and 42 N in ATP6 , 570 S in ND5 ) of Bolocera sp. were positively selected sites. By comparing these features with those of shallow sea anemone species, we deduced that these novel gene features may influence the activity of mitochondrial genes. This study may provide some clues regarding the adaptation of Bolocera sp. to the deep-sea environment.

  17. Comprehensive Annotation of the Parastagonospora nodorum Reference Genome Using Next-Generation Genomics, Transcriptomics and Proteogenomics

    PubMed Central

    Dodhia, Kejal; Stoll, Thomas; Hastie, Marcus; Furuki, Eiko; Ellwood, Simon R.; Williams, Angela H.; Tan, Yew-Foon; Testa, Alison C.; Gorman, Jeffrey J.; Oliver, Richard P.

    2016-01-01

    Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly with corrections of SNP and indel errors in the underlying genome assembly from deep resequencing data as well as extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation includes 8,366 genes with modified protein sequence and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models. PMID:26840125

  18. Efficient generation of mouse models of human diseases via ABE- and BE-mediated base editing.

    PubMed

    Liu, Zhen; Lu, Zongyang; Yang, Guang; Huang, Shisheng; Li, Guanglei; Feng, Songjie; Liu, Yajing; Li, Jianan; Yu, Wenxia; Zhang, Yu; Chen, Jia; Sun, Qiang; Huang, Xingxu

    2018-06-14

    A recently developed adenine base editor (ABE) efficiently converts A to G and is potentially useful for clinical applications. However, its precision and efficiency in vivo remains to be addressed. Here we achieve A-to-G conversion in vivo at frequencies up to 100% by microinjection of ABE mRNA together with sgRNAs. We then generate mouse models harboring clinically relevant mutations at Ar and Hoxd13, which recapitulates respective clinical defects. Furthermore, we achieve both C-to-T and A-to-G base editing by using a combination of ABE and SaBE3, thus creating mouse model harboring multiple mutations. We also demonstrate the specificity of ABE by deep sequencing and whole-genome sequencing (WGS). Taken together, ABE is highly efficient and precise in vivo, making it feasible to model and potentially cure relevant genetic diseases.

  19. Small RNA Analysis in Sindbis Virus Infected Human HEK293 Cells

    PubMed Central

    Dalmay, Tamas; Powell, Penny P.

    2013-01-01

    Introduction In contrast to the defence mechanism of RNA interference (RNAi) in plants and invertebrates, its role in the innate response to virus infection of mammals is a matter of debate. Since RNAi has a well-established role in controlling infection of the alphavirus Sindbis virus (SINV) in insects, we have used this virus to investigate the role of RNAi in SINV infection of human cells. Results SINV AR339 and TR339-GFP were adapted to grow in HEK293 cells. Deep sequencing of small RNAs (sRNAs) early in SINV infection (4 and 6 hpi) showed low abundance (0.8%) of viral sRNAs (vsRNAs), with no size, sequence or location specific patterns characteristic of Dicer products nor did they possess any discernible pattern to ascribe to a specific RNAi biogenesis pathway. This was supported by multiple variants for each sequence, and lack of hot spots along the viral genome sequence. The abundance of the best defined vsRNAs was below the limit of Northern blot detection. The adaptation of the virus to HEK293 cells showed little sequence changes compared to the reference; however, a SNP in E1 gene with a preference from G to C was found. Deep sequencing results showed little variation of expression of cellular microRNAs (miRNAs) at 4 and 6 hpi compared to uninfected cells. Twelve miRNAs exhibiting some minor differential expression by sequencing, showed no difference in expression by Northern blot analysis. Conclusions We show that, unlike SINV infection of invertebrates, generation of Dicer-dependent svRNAs and change in expression of cellular miRNAs were not detected as part of the Human response to SINV. PMID:24391886

  20. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS?s Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  1. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  2. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  3. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes.

    PubMed

    Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.

  4. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes

    PubMed Central

    Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372

  5. Accuracy for detection of simulated lesions: comparison of fluid-attenuated inversion-recovery, proton density--weighted, and T2-weighted synthetic brain MR imaging

    NASA Technical Reports Server (NTRS)

    Herskovits, E. H.; Itoh, R.; Melhem, E. R.

    2001-01-01

    OBJECTIVE: The objective of our study was to determine the effects of MR sequence (fluid-attenuated inversion-recovery [FLAIR], proton density--weighted, and T2-weighted) and of lesion location on sensitivity and specificity of lesion detection. MATERIALS AND METHODS: We generated FLAIR, proton density-weighted, and T2-weighted brain images with 3-mm lesions using published parameters for acute multiple sclerosis plaques. Each image contained from zero to five lesions that were distributed among cortical-subcortical, periventricular, and deep white matter regions; on either side; and anterior or posterior in position. We presented images of 540 lesions, distributed among 2592 image regions, to six neuroradiologists. We constructed a contingency table for image regions with lesions and another for image regions without lesions (normal). Each table included the following: the reviewer's number (1--6); the MR sequence; the side, position, and region of the lesion; and the reviewer's response (lesion present or absent [normal]). We performed chi-square and log-linear analyses. RESULTS: The FLAIR sequence yielded the highest true-positive rates (p < 0.001) and the highest true-negative rates (p < 0.001). Regions also differed in reviewers' true-positive rates (p < 0.001) and true-negative rates (p = 0.002). The true-positive rate model generated by log-linear analysis contained an additional sequence-location interaction. The true-negative rate model generated by log-linear analysis confirmed these associations, but no higher order interactions were added. CONCLUSION: We developed software with which we can generate brain images of a wide range of pulse sequences and that allows us to specify the location, size, shape, and intrinsic characteristics of simulated lesions. We found that the use of FLAIR sequences increases detection accuracy for cortical-subcortical and periventricular lesions over that associated with proton density- and T2-weighted sequences.

  6. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

    PubMed

    Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen

    2015-11-10

    Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80(th) percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.

  7. Exploring Genomic Diversity Using Metagenomics of Deep-Sea Subsurface Microbes from the Louisville Seamount and the South Pacific Gyre

    NASA Astrophysics Data System (ADS)

    Tully, B. J.; Sylvan, J. B.; Heidelberg, J. F.; Huber, J. A.

    2014-12-01

    There are many limitations involved with sampling microbial diversity from deep-sea subsurface environments, ranging from physical sample collection, low microbial biomass, culturing at in situ conditions, and inefficient nucleic acid extractions. As such, we are continually modifying our methods to obtain better results and expanding what we know about microbes in these environments. Here we present analysis of metagenomes sequences from samples collected from 120 m within the Louisville Seamount and from the top 5-10cm of the sediment in the center of the south Pacific gyre (SPG). Both systems are low biomass with ~102 and ~104 cells per cm3 for Louisville Seamount samples analyzed and the SPG sediment, respectively. The Louisville Seamount represents the first in situ subseafloor basalt and the SPG sediments represent the first in situ low biomass sediment microbial metagenomes. Both of these environments, subseafloor basalt and sediments underlying oligotrophic ocean gyres, represent large provinces of the seafloor environment that remain understudied. Despite the low biomass and DNA generated from these samples, we have generated 16 near complete genomes (5 from Louisville and 11 from the SPG) from the two metagenomic datasets. These genomes are estimated to be between 51-100% complete and span a range of phylogenetic groups, including the Proteobacteria, Actinobacteria, Firmicutes, Chloroflexi, and unclassified bacterial groups. With these genomes, we have assessed potential functional capabilities of these organisms and performed a comparative analysis between the environmental genomes and previously sequenced relatives to determine possible adaptations that may elucidate survival mechanisms for these low energy environments. These methods illustrate a baseline analysis that can be applied to future metagenomic deep-sea subsurface datasets and will help to further our understanding of microbiology within these environments.

  8. Current and future molecular diagnostics for ocular infectious diseases.

    PubMed

    Doan, Thuy; Pinsky, Benjamin A

    2016-11-01

    Confirmation of ocular infections can pose great challenges to the clinician. A fundamental limitation is the small amounts of specimen that can be obtained from the eye. Molecular diagnostics can circumvent this limitation and have been shown to be more sensitive than conventional culture. The purpose of this review is to describe new molecular methods and to discuss the applications of next-generation sequencing-based approaches in the diagnosis of ocular infections. Efforts have focused on improving the sensitivity of pathogen detection using molecular methods. This review describes a new molecular target for Toxoplasma gondii-directed polymerase chain reaction assays. Molecular diagnostics for Chlamydia trachomatis and Acanthamoeba species are also discussed. Finally, we describe a hypothesis-free approach, metagenomic deep sequencing, which can detect DNA and RNA pathogens from a single specimen in one test. In some cases, this method can provide the geographic location and timing of the infection. Pathogen-directed PCRs have been powerful tools in the diagnosis of ocular infections for over 20 years. The use of next-generation sequencing-based approaches, when available, will further improve sensitivity of detection with the potential to improve patient care.

  9. DNA Replication Profiling Using Deep Sequencing.

    PubMed

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  10. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

    PubMed Central

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-01-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064

  11. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    PubMed

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  12. MRO DKF Post-Processing Tool

    NASA Technical Reports Server (NTRS)

    Ayap, Shanti; Fisher, Forest; Gladden, Roy; Khanampompan, Teerapat

    2008-01-01

    This software tool saves time and reduces risk by automating two labor-intensive and error-prone post-processing steps required for every DKF [DSN (Deep Space Network) Keyword File] that MRO (Mars Reconnaissance Orbiter) produces, and is being extended to post-process the corresponding TSOE (Text Sequence Of Events) as well. The need for this post-processing step stems from limitations in the seq-gen modeling resulting in incorrect DKF generation that is then cleaned up in post-processing.

  13. High-Throughput Identification of Loss-of-Function Mutations for Anti-Interferon Activity in the Influenza A Virus NS Segment

    PubMed Central

    Wu, Nicholas C.; Young, Arthur P.; Al-Mawsawi, Laith Q.; Olson, C. Anders; Feng, Jun; Qi, Hangfei; Luan, Harding H.; Li, Xinmin; Wu, Ting-Ting

    2014-01-01

    ABSTRACT Viral proteins often display several functions which require multiple assays to dissect their genetic basis. Here, we describe a systematic approach to screen for loss-of-function mutations that confer a fitness disadvantage under a specified growth condition. Our methodology was achieved by genetically monitoring a mutant library under two growth conditions, with and without interferon, by deep sequencing. We employed a molecular tagging technique to distinguish true mutations from sequencing error. This approach enabled us to identify mutations that were negatively selected against, in addition to those that were positively selected for. Using this technique, we identified loss-of-function mutations in the influenza A virus NS segment that were sensitive to type I interferon in a high-throughput fashion. Mechanistic characterization further showed that a single substitution, D92Y, resulted in the inability of NS to inhibit RIG-I ubiquitination. The approach described in this study can be applied under any specified condition for any virus that can be genetically manipulated. IMPORTANCE Traditional genetics focuses on a single genotype-phenotype relationship, whereas high-throughput genetics permits phenotypic characterization of numerous mutants in parallel. High-throughput genetics often involves monitoring of a mutant library with deep sequencing. However, deep sequencing suffers from a high error rate (∼0.1 to 1%), which is usually higher than the occurrence frequency for individual point mutations within a mutant library. Therefore, only mutations that confer a fitness advantage can be identified with confidence due to an enrichment in the occurrence frequency. In contrast, it is impossible to identify deleterious mutations using most next-generation sequencing techniques. In this study, we have applied a molecular tagging technique to distinguish true mutations from sequencing errors. It enabled us to identify mutations that underwent negative selection, in addition to mutations that experienced positive selection. This study provides a proof of concept by screening for loss-of-function mutations on the influenza A virus NS segment that are involved in its anti-interferon activity. PMID:24965464

  14. Acute multi-sgRNA knockdown of KEOPS complex genes reproduces the microcephaly phenotype of the stable knockout zebrafish model

    PubMed Central

    Schneider, Ronen; Hoogstraten, Charlotte A.; Schapiro, David; Majmundar, Amar J.; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A.; Poduri, Annapurna

    2018-01-01

    Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest. PMID:29346415

  15. Identifying MicroRNAs and Transcript Targets in Jatropha Seeds

    PubMed Central

    Galli, Vanessa; Guzman, Frank; de Oliveira, Luiz F. V.; Loss-Morais, Guilherme; Körbes, Ana P.; Silva, Sérgio D. A.; Margis-Pinheiro, Márcia M. A. N.; Margis, Rogério

    2014-01-01

    MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play a key role in diverse plant biological processes. Jatropha curcas L. has received significant attention as a potential oilseed crop for the production of renewable oil. Here, a sRNA library of mature seeds and three mRNA libraries from three different seed development stages were generated by deep sequencing to identify and characterize the miRNAs and pre-miRNAs of J. curcas. Computational analysis was used for the identification of 180 conserved miRNAs and 41 precursors (pre-miRNAs) as well as 16 novel pre-miRNAs. The predicted miRNA target genes are involved in a broad range of physiological functions, including cellular structure, nuclear function, translation, transport, hormone synthesis, defense, and lipid metabolism. Some pre-miRNA and miRNA targets vary in abundance between the three stages of seed development. A search for sequences that produce siRNA was performed, and the results indicated that J. curcas siRNAs play a role in nuclear functions, transport, catalytic processes and disease resistance. This study presents the first large scale identification of J. curcas miRNAs and their targets in mature seeds based on deep sequencing, and it contributes to a functional understanding of these miRNAs. PMID:24551031

  16. Bayesian mixture analysis for metagenomic community profiling.

    PubMed

    Morfopoulou, Sofia; Plagnol, Vincent

    2015-09-15

    Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix sofia.morfopoulou.10@ucl.ac.uk Supplementary data are available at Bionformatics online. © The Author 2015. Published by Oxford University Press.

  17. Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

    PubMed

    Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

    2017-04-01

    Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value <0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10 -5 ; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10 -5 ; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on Alcoholism.

  18. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

  19. Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

    PubMed

    Webb, Kristen M; Rosenthal, Benjamin M

    2011-01-01

    The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.

  20. Digital Marine Bioprospecting: Mining New Neurotoxin Drug Candidates from the Transcriptomes of Cold-Water Sea Anemones

    PubMed Central

    Urbarova, Ilona; Karlsen, Bård Ove; Okkenhaug, Siri; Seternes, Ole Morten; Johansen, Steinar D.; Emblem, Åse

    2012-01-01

    Marine bioprospecting is the search for new marine bioactive compounds and large-scale screening in extracts represents the traditional approach. Here, we report an alternative complementary protocol, called digital marine bioprospecting, based on deep sequencing of transcriptomes. We sequenced the transcriptomes from the adult polyp stage of two cold-water sea anemones, Bolocera tuediae and Hormathia digitata. We generated approximately 1.1 million quality-filtered sequencing reads by 454 pyrosequencing, which were assembled into approximately 120,000 contigs and 220,000 single reads. Based on annotation and gene ontology analysis we profiled the expressed mRNA transcripts according to known biological processes. As a proof-of-concept we identified polypeptide toxins with a potential blocking activity on sodium and potassium voltage-gated channels from digital transcriptome libraries. PMID:23170083

  1. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    PubMed

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.

  2. Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text.

    PubMed

    Duarte, Francisco; Martins, Bruno; Pinto, Cátia Sousa; Silva, Mário J

    2018-04-01

    We address the assignment of ICD-10 codes for causes of death by analyzing free-text descriptions in death certificates, together with the associated autopsy reports and clinical bulletins, from the Portuguese Ministry of Health. We leverage a deep neural network that combines word embeddings, recurrent units, and neural attention, for the generation of intermediate representations of the textual contents. The neural network also explores the hierarchical nature of the input data, by building representations from the sequences of words within individual fields, which are then combined according to the sequences of fields that compose the inputs. Moreover, we explore innovative mechanisms for initializing the weights of the final nodes of the network, leveraging co-occurrences between classes together with the hierarchical structure of ICD-10. Experimental results attest to the contribution of the different neural network components. Our best model achieves accuracy scores over 89%, 81%, and 76%, respectively for ICD-10 chapters, blocks, and full-codes. Through examples, we also show that our method can produce interpretable results, useful for public health surveillance. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Bi-PROF

    PubMed Central

    Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

    2013-01-01

    The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology that allows to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 102-104 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues concerned by complex diseases like cancer. PMID:23803588

  4. Sensitivity of BRCA1/2 testing in high-risk breast/ovarian/male breast cancer families: little contribution of comprehensive RNA/NGS panel testing.

    PubMed

    Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth

    2016-11-01

    The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.

  5. Using next generation transcriptome sequencing to predict an ectomycorrhizal metablome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Sreedasyam, A.; Trivedi, G

    Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models usingmore » a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.« less

  6. Diversity of Bacteria at Healthy Human Conjunctiva

    PubMed Central

    Dong, Qunfeng; Brulc, Jennifer M.; Iovieno, Alfonso; Bates, Brandon; Garoutte, Aaron; Miller, Darlene; Revanna, Kashi V.; Gao, Xiang; Antonopoulos, Dionysios A.; Slepak, Vladlen Z.

    2011-01-01

    Purpose. Ocular surface (OS) microbiota contributes to infectious and autoimmune diseases of the eye. Comprehensive analysis of microbial diversity at the OS has been impossible because of the limitations of conventional cultivation techniques. This pilot study aimed to explore true diversity of human OS microbiota using DNA sequencing-based detection and identification of bacteria. Methods. Composition of the bacterial community was characterized using deep sequencing of the 16S rRNA gene amplicon libraries generated from total conjunctival swab DNA. The DNA sequences were classified and the diversity parameters measured using bioinformatics software ESPRIT and MOTHUR and tools available through the Ribosomal Database Project-II (RDP-II). Results. Deep sequencing of conjunctival rDNA from four subjects yielded a total of 115,003 quality DNA reads, corresponding to 221 species-level phylotypes per subject. The combined bacterial community classified into 5 phyla and 59 distinct genera. However, 31% of all DNA reads belonged to unclassified or novel bacteria. The intersubject variability of individual OS microbiomes was very significant. Regardless, 12 genera—Pseudomonas, Propionibacterium, Bradyrhizobium, Corynebacterium, Acinetobacter, Brevundimonas, Staphylococci, Aquabacterium, Sphingomonas, Streptococcus, Streptophyta, and Methylobacterium—were ubiquitous among the analyzed cohort and represented the putative “core” of conjunctival microbiota. The other 47 genera accounted for <4% of the classified portion of this microbiome. Unexpectedly, healthy conjunctiva contained many genera that are commonly identified as ocular surface pathogens. Conclusions. The first DNA sequencing-based survey of bacterial population at the conjunctiva have revealed an unexpectedly diverse microbial community. All analyzed samples contained ubiquitous (core) genera that included commensal, environmental, and opportunistic pathogenic bacteria. PMID:21571682

  7. Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing.

    PubMed

    Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen

    2014-02-01

    Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.

  8. Pre-drill predictions versus post-drill results: use of sequence stratigraphic methods in reduction of exploration risk, Sarawak Deep-water Blocks, Malaysia

    NASA Astrophysics Data System (ADS)

    Mansor, Md Yazid; Snedden, J. W.; Sarg, J. F.; Smith, B. S.; Kolich, T.; Carter, M.

    1999-04-01

    Limited well control, great distances from age-equivalent producing fields, and a largely unknown stratigraphy necessitated use of sequence stratigraphic methods to assess exploration risk associated with reservoir, source and seal distribution in the Mobil-operated Deep-water Blocks of Sarawak, Malaysia. These methods allowed predictions to be made and reservoir risks to be halved in each of the locations drilled in 1995. Predictions regarding reservoir and stratigraphy proved correct, as the Mulu-1 and Bako-1 wells penetrated numerous high-quality, thick sandstone reservoirs in the Middle to Lower Miocene section. Shallow marine sandstones dominate the vertical succession in both wells, with characteristic aggradational, upward-coarsening log motifs. Cores display classic wave-generated stratification and hummocky cross-bedding. Evidence, such as marginal-marine to neritic microfauna in cuttings of both wells, supports these interpretations. Lack of hydrocarbon charge in the two wells may be due to their position relative to coaly hydrocarbon source beds. These prospects have high trap and seal integrity, being well defined on seismics as high relief horst blocks covered by a very thick shale-prone section. The Mulu-1 well, for example, is located at least 20-30 km down stratigraphic dip from mapped coeval lower coastal-plain deposits. Amplitude anomalies on the flank of the Mulu horst are probably derived from transported organics buried in deep Plio-Pleistocene kitchens in the northwest portion of the Mobil blocks. Remaining potential of mapped prospects is high and efforts continue at characterizing the petroleum system of the Deep-water Blocks. Seismic attribute and interval velocity analyses provide new clues to the location of probable coaly source rocks, especially when viewed in their regional and sequence stratigraphic context. Future work is planned and will serve to reduce risk to acceptable levels and support further drilling in this prospective hydrocarbon province.

  9. Sedimentology and Sedimentary Dynamics of the Desmoinesian Cherokee Group, Deep Anadarko Basin, Texas Panhandle

    NASA Astrophysics Data System (ADS)

    Hu, N.; Loucks, R.; Frebourg, G.

    2015-12-01

    Understanding the spatial variability of deep-water facies is critical to deep-water research because of its revealing information about the relationship between desity flow processes and their resultant sedimentary sequences. The Cherokee Group in the Anadarko Basin, northeastern Texas Panhandle, provides an opportunity to investigate an icehouse-greenhouse Pennsylvanian hybrid system that well demonstrates the intricacies of vertical and lateral facies relationships in an unconfined fan-delta fed deep-water slope to basinal setting. The stratigraphic section ranges in thickness from 150 to 460 m. The cyclic sedimentation and foreland basin tectonics resulted in a complex stratal architecture that was sourced by multiple areas of sediment input. This investigation consists of wireline-log and core data. Five-thousand wireline logs were correlated in an area of over 9500 sq km to map out six depositional sequences that are separated by major flooding events. These events are correlative over the whole area of study. Six cores, that sample nearly the complete section, were described for lithofacies. Lithofacies are recognized based on depositional features and mineralogy:(1) Subarkose, (2) Lithicarkoses, (3) Sandy siliciclastic conglomerate, (4) Muddy calcareous conglomerate, (5) Crinoidal packstone, (6) Oodic grainstone, (7)Pelodic grainstone, (8) Ripple laminated mudrock, (9) faint laminated mudrock. The integration of isopachs of depositional sequences with the lithofacies has allowed the delineation of the spatial and temporal evolution of the slope to basin-floor system. Thin-to-thick bedded turbidites, hyperconcentrated density flow deposits (slurry beds), and debris and mud flow deposits were observed and can be used to better predicte lithofacies distributions in areas that have less data control. These mixed siliciclastic and carbonate deposits can be carrier beds for the hydrocarbons generated from the enclosing organic-rich (TOC ranges from 0.55 to 6.77wt%), dysareobic to anaerobic mudstones.

  10. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed Central

    Barrett, Nolan H.; McCarthy, Peter J.

    2017-01-01

    ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886

  11. Unbiased whole-genome deep sequencing of human and porcine stool samples reveals circulation of multiple groups of rotaviruses and a putative zoonotic infection

    PubMed Central

    Phan, My V. T.; Anh, Pham Hong; Cuong, Nguyen Van; Munnink, Bas B. Oude; van der Hoek, Lia; My, Phuc Tran; Tri, Tue Ngo; Bryant, Juliet E.; Baker, Stephen; Thwaites, Guy; Woolhouse, Mark; Kellam, Paul; Rabaa, Maia A.

    2016-01-01

    Abstract Coordinated and synchronous surveillance for zoonotic viruses in both human clinical cases and animal reservoirs provides an opportunity to identify interspecies virus movement. Rotavirus (RV) is an important cause of viral gastroenteritis in humans and animals. In this study, we document the RV diversity within co-located humans and animals sampled from the Mekong delta region of Vietnam using a primer-independent, agnostic, deep sequencing approach. A total of 296 stool samples (146 from diarrhoeal human patients and 150 from pigs living in the same geographical region) were directly sequenced, generating the genomic sequences of sixty human rotaviruses (all group A) and thirty-one porcine rotaviruses (thirteen group A, seven group B, six group C, and five group H). Phylogenetic analyses showed the co-circulation of multiple distinct RV group A (RVA) genotypes/strains, many of which were divergent from the strain components of licensed RVA vaccines, as well as considerable virus diversity in pigs including full genomes of rotaviruses in groups B, C, and H, none of which have been previously reported in Vietnam. Furthermore, the detection of an atypical RVA genotype constellation (G4-P[6]-I1-R1-C1-M1-A8-N1-T7-E1-H1) in a human patient and a pig from the same region provides some evidence for a zoonotic event. PMID:28748110

  12. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing.

    PubMed

    Senkevich, Tatiana G; Bruno, Daniel; Martens, Craig; Porcella, Stephen F; Wolf, Yuri I; Moss, Bernard

    2015-09-01

    Poxviruses reproduce in the host cytoplasm and encode most or all of the enzymes and factors needed for expression and synthesis of their double-stranded DNA genomes. Nevertheless, the mode of poxvirus DNA replication and the nature and location of the replication origins remain unknown. A current but unsubstantiated model posits only leading strand synthesis starting at a nick near one covalently closed end of the genome and continuing around the other end to generate a concatemer that is subsequently resolved into unit genomes. The existence of specific origins has been questioned because any plasmid can replicate in cells infected by vaccinia virus (VACV), the prototype poxvirus. We applied directional deep sequencing of short single-stranded DNA fragments enriched for RNA-primed nascent strands isolated from the cytoplasm of VACV-infected cells to pinpoint replication origins. The origins were identified as the switching points of the fragment directions, which correspond to the transition from continuous to discontinuous DNA synthesis. Origins containing a prominent initiation point mapped to a sequence within the hairpin loop at one end of the VACV genome and to the same sequence within the concatemeric junction of replication intermediates. These findings support a model for poxvirus genome replication that involves leading and lagging strand synthesis and is consistent with the requirements for primase and ligase activities as well as earlier electron microscopic and biochemical studies implicating a replication origin at the end of the VACV genome.

  13. Using pyrosequencing to shed light on deep mine microbial ecology

    PubMed Central

    Edwards, Robert A; Rodriguez-Brito, Beltran; Wegley, Linda; Haynes, Matthew; Breitbart, Mya; Peterson, Dean M; Saar, Martin O; Alexander, Scott; Alexander, E Calvin; Rohwer, Forest

    2006-01-01

    Background Contrasting biological, chemical and hydrogeological analyses highlights the fundamental processes that shape different environments. Generating and interpreting the biological sequence data was a costly and time-consuming process in defining an environment. Here we have used pyrosequencing, a rapid and relatively inexpensive sequencing technology, to generate environmental genome sequences from two sites in the Soudan Mine, Minnesota, USA. These sites were adjacent to each other, but differed significantly in chemistry and hydrogeology. Results Comparisons of the microbes and the subsystems identified in the two samples highlighted important differences in metabolic potential in each environment. The microbes were performing distinct biochemistry on the available substrates, and subsystems such as carbon utilization, iron acquisition mechanisms, nitrogen assimilation, and respiratory pathways separated the two communities. Although the correlation between much of the microbial metabolism occurring and the geochemical conditions from which the samples were isolated could be explained, the reason for the presence of many pathways in these environments remains to be determined. Despite being physically close, these two communities were markedly different from each other. In addition, the communities were also completely different from other microbial communities sequenced to date. Conclusion We anticipate that pyrosequencing will be widely used to sequence environmental samples because of the speed, cost, and technical advantages. Furthermore, subsystem comparisons rapidly identify the important metabolisms employed by the microbes in different environments. PMID:16549033

  14. Partial androgen insensitivity syndrome caused by a deep intronic mutation creating an alternative splice acceptor site of the AR gene.

    PubMed

    Ono, Hiroyuki; Saitsu, Hirotomo; Horikawa, Reiko; Nakashima, Shinichi; Ohkubo, Yumiko; Yanagi, Kumiko; Nakabayashi, Kazuhiko; Fukami, Maki; Fujisawa, Yasuko; Ogata, Tsutomu

    2018-02-02

    Although partial androgen insensitivity syndrome (PAIS) is caused by attenuated responsiveness to androgens, androgen receptor gene (AR) mutations on the coding regions and their splice sites have been identified only in <25% of patients with a diagnosis of PAIS. We performed extensive molecular studies including whole exome sequencing in a Japanese family with PAIS, identifying a deep intronic variant beyond the branch site at intron 6 of AR (NM_000044.4:c.2450-42 G > A). This variant created the splice acceptor motif that was accompanied by pyrimidine-rich sequence and two candidate branch sites. Consistent with this, reverse transcriptase (RT)-PCR experiments for cycloheximide-treated lymphoblastoid cell lines revealed a relatively large amount of aberrant mRNA produced by the newly created splice acceptor site and a relatively small amount of wildtype mRNA produced by the normal splice acceptor site. Furthermore, most of the aberrant mRNA was shown to undergo nonsense mediated decay (NMD) and, if a small amount of aberrant mRNA may have escaped NMD, such mRNA was predicted to generate a truncated AR protein missing some functional domains. These findings imply that the deep intronic mutation creating an alternative splice acceptor site resulted in the production of a relatively small amount of wildtype AR mRNA, leading to PAIS.

  15. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming

    PubMed Central

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A.

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina HiSeq 2500 instrument. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs. PMID:26656830

  16. Congruent Deep Relationships in the Grape Family (Vitaceae) Based on Sequences of Chloroplast Genomes and Mitochondrial Genes via Genome Skimming.

    PubMed

    Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A

    2015-01-01

    Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.

  17. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  18. Hear it, See it, Explore it: Visualizations and Sonifications of Seismic Signals

    NASA Astrophysics Data System (ADS)

    Fisher, M.; Peng, Z.; Simpson, D. W.; Kilb, D. L.

    2010-12-01

    Sonification of seismic data is an innovative way to represent seismic data in the audible range (Simpson, 2005). Seismic waves with different frequency and temporal characteristics, such as those from teleseismic earthquakes, deep “non-volcanic” tremor and local earthquakes, can be easily discriminated when time-compressed to the audio range. Hence, sonification is particularly useful for presenting complicated seismic signals with multiple sources, such as aftershocks within the coda of large earthquakes, and remote triggering of earthquakes and tremor by large teleseismic earthquakes. Previous studies mostly focused on converting the seismic data into audible files by simple time compression or frequency modulation (Simpson et al., 2009). Here we generate animations of the seismic data together with the sounds. We first read seismic data in the SAC format into Matlab, and generate a sequence of image files and an associated WAV sound file. Next, we use a third party video editor, such as the QuickTime Pro, to combine the image sequences and the sound file into an animation. We have applied this simple procedure to generate animations of remotely triggered earthquakes, tremor and low-frequency earthquakes in California, and mainshock-aftershock sequences in Japan and California. These animations clearly demonstrate the interactions of earthquake sequences and the richness of the seismic data. The tool developed in this study can be easily adapted for use in other research applications and to create sonification/animation of seismic data for education and outreach purpose.

  19. Identification of a novel MYO7A mutation in Usher syndrome type 1.

    PubMed

    Cheng, Ling; Yu, Hongsong; Jiang, Yan; He, Juan; Pu, Sisi; Li, Xin; Zhang, Li

    2018-01-05

    Usher syndrome (USH) is an autosomal recessive disease characterized by deafness and retinitis pigmentosa. In view of the high phenotypic and genetic heterogeneity in USH, performing genetic screening with traditional methods is impractical. In the present study, we carried out targeted next-generation sequencing (NGS) to uncover the underlying gene in an USH family (2 USH patients and 15 unaffected relatives). One hundred and thirty-five genes associated with inherited retinal degeneration were selected for deep exome sequencing. Subsequently, variant analysis, Sanger validation and segregation tests were utilized to identify the disease-causing mutations in this family. All affected individuals had a classic USH type I (USH1) phenotype which included deafness, vestibular dysfunction and retinitis pigmentosa. Targeted NGS and Sanger sequencing validation suggested that USH1 patients carried an unreported splice site mutation, c.5168+1G>A, as a compound heterozygous mutation with c.6070C>T (p.R2024X) in the MYO7A gene. A functional study revealed decreased expression of the MYO7A gene in the individuals carrying heterozygous mutations. In conclusion, targeted next-generation sequencing provided a comprehensive and efficient diagnosis for USH1. This study revealed the genetic defects in the MYO7A gene and expanded the spectrum of clinical phenotypes associated with USH1 mutations.

  20. Deep sequencing of the Trypanosoma cruzi GP63 surface proteases reveals diversity and diversifying selection among chronic and congenital Chagas disease patients.

    PubMed

    Llewellyn, Martin S; Messenger, Louisa A; Luquetti, Alejandro O; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B N; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A

    2015-04-01

    Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. A 450bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target--ND5--was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43, Goias congenital n = 2, Bolivia chronic, n = 27; Bolivia congenital, n = 20), Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I gene family suggests a link between genetic diversity within this gene family and survival in the mammalian host.

  1. Simultaneous identification of DNA and RNA viruses present in pig faeces using process-controlled deep sequencing.

    PubMed

    Sachsenröder, Jana; Twardziok, Sven; Hammerl, Jens A; Janczyk, Pawel; Wrede, Paul; Hertwig, Stefan; Johne, Reimar

    2012-01-01

    Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. The implemented process control serves as quality control, ensures comparability of the method and may be used for further method optimization.

  2. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and singlecell studies have made the deep whole-genome (DNA-seq), whole epigenome and whole-transcriptome sequencing (RNA-seq) at single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput. And it does so with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  3. An Efficient Strategy of Screening for Pathogens in Wild-Caught Ticks and Mosquitoes by Reusing Small RNA Deep Sequencing Data

    PubMed Central

    An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. Specially, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated the reuse of sRNA deep sequencing data would have the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575

  4. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus

    PubMed Central

    Kinoti, Wycliff M.; Constable, Fiona E.; Nancarrow, Narelle; Plummer, Kim M.; Rodoni, Brendan

    2017-01-01

    The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus, occurring in 48 of the 61 Ilarvirus-positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus-like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus-like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus-like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the need for a standardized approach to accurately determine what constitutes an active, viable virus infection after detection by molecular based methods. PMID:28713347

  5. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus.

    PubMed

    Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan

    2017-01-01

    The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus , occurring in 48 of the 61 Ilarvirus -positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus -like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus -like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus -like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the need for a standardized approach to accurately determine what constitutes an active, viable virus infection after detection by molecular based methods.

  6. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed

    Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J

    2017-02-02

    The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.

  7. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  8. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  10. Deep sequencing with intronic capture enables identification of an APC exon 10 inversion in a patient with polyposis.

    PubMed

    Shirts, Brian H; Salipante, Stephen J; Casadei, Silvia; Ryan, Shawnia; Martin, Judith; Jacobson, Angela; Vlaskin, Tatyana; Koehler, Karen; Livingston, Robert J; King, Mary-Claire; Walsh, Tom; Pritchard, Colin C

    2014-10-01

    Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count and who had a complex inversion spanning the entire exon 10 in APC (the gene encoding for adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. In this study, we employed complete APC gene sequencing using high-coverage next-generation sequencing by ColoSeq, analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in translational skipping of exon 10 in the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with high enough depth of coverage-as well as the use of informatics tools-to enable detection of small pathogenic structural rearrangements.

  11. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.

    PubMed

    Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed

    2017-11-01

    Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.

  12. Coupling Deep Transcriptome Analysis with Untargeted Metabolic Profiling in Ophiorrhiza pumila to Further the Understanding of the Biosynthesis of the Anti-Cancer Alkaloid Camptothecin and Anthraquinones

    PubMed Central

    Yamazaki, Mami; Mochida, Keiichi; Asano, Takashi; Nakabayashi, Ryo; Chiba, Motoaki; Udomson, Nirin; Yamazaki, Yasuyo; Goodenowe, Dayan B.; Sankawa, Ushio; Yoshida, Takuhiro; Toyoda, Atsushi; Totoki, Yasushi; Sakaki, Yoshiyuki; Góngora-Castillo, Elsa; Buell, C. Robin; Sakurai, Tetsuya; Saito, Kazuki

    2013-01-01

    The Rubiaceae species, Ophiorrhiza pumila, accumulates camptothecin, an anti-cancer alkaloid with a potent DNA topoisomerase I inhibitory activity, as well as anthraquinones that are derived from the combination of the isochorismate and hemiterpenoid pathways. The biosynthesis of these secondary products is active in O. pumila hairy roots yet very low in cell suspension culture. Deep transcriptome analysis was conducted in O. pumila hairy roots and cell suspension cultures using the Illumina platform, yielding a total of 2 Gb of sequence for each sample. We generated a hybrid transcriptome assembly of O. pumila using the Illumina-derived short read sequences and conventional Sanger-derived expressed sequence tag clones derived from a full-length cDNA library constructed using RNA from hairy roots. Among 35,608 non-redundant unigenes, 3,649 were preferentially expressed in hairy roots compared with cell suspension culture. Candidate genes involved in the biosynthetic pathway for the monoterpenoid indole alkaloid camptothecin were identified; specifically, genes involved in post-strictosamide biosynthetic events and genes involved in the biosynthesis of anthraquinones and chlorogenic acid. Untargeted metabolomic analysis by Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) indicated that most of the proposed intermediates in the camptothecin biosynthetic pathway accumulated in hairy roots in a preferential manner compared with cell suspension culture. In addition, a number of anthraquinones and chlorogenic acid preferentially accumulated in hairy roots compared with cell suspension culture. These results suggest that deep transcriptome and metabolome data sets can facilitate the identification of genes and intermediates involved in the biosynthesis of secondary products including camptothecin in O. pumila. PMID:23503598

  13. Genetic diversity among pandemic 2009 influenza viruses isolated from a transmission chain

    PubMed Central

    2013-01-01

    Background Influenza viruses such as swine-origin influenza A(H1N1) virus (A(H1N1)pdm09) generate genetic diversity due to the high error rate of their RNA polymerase, often resulting in mixed genotype populations (intra-host variants) within a single infection. This variation helps influenza to rapidly respond to selection pressures, such as those imposed by the immunological host response and antiviral therapy. We have applied deep sequencing to characterize influenza intra-host variation in a transmission chain consisting of three cases due to oseltamivir-sensitive viruses, and one derived oseltamivir-resistant case. Methods Following detection of the A(H1N1)pdm09 infections, we deep-sequenced the complete NA gene from two of the oseltamivir-sensitive virus-infected cases, and all eight gene segments of the viruses causing the remaining two cases. Results No evidence for the resistance-causing mutation (resulting in NA H275Y substitution) was observed in the oseltamivir-sensitive cases. Furthermore, deep sequencing revealed a subpopulation of oseltamivir-sensitive viruses in the case carrying resistant viruses. We detected higher levels of intra-host variation in the case carrying oseltamivir-resistant viruses than in those infected with oseltamivir-sensitive viruses. Conclusions Oseltamivir-resistance was only detected after prophylaxis with oseltamivir, suggesting that the mutation was selected for as a result of antiviral intervention. The persisting oseltamivir-sensitive virus population in the case carrying resistant viruses suggests either that a small proportion survive the treatment, or that the oseltamivir-sensitive virus rapidly re-establishes itself in the virus population after the bottleneck. Moreover, the increased intra-host variation in the oseltamivir-resistant case is consistent with the hypothesis that the population diversity of a RNA virus can increase rapidly following a population bottleneck. PMID:23587185

  14. Methods, Tools and Current Perspectives in Proteogenomics *

    PubMed Central

    Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.

    2017-01-01

    With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies have identified novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches and in this article, we systematically classify published methods and tools into four major categories, (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751

  15. Prediction of novel pre-microRNAs with high accuracy through boosting and SVM.

    PubMed

    Zhang, Yuanwei; Yang, Yifan; Zhang, Huan; Jiang, Xiaohua; Xu, Bo; Xue, Yu; Cao, Yunxia; Zhai, Qian; Zhai, Yong; Xu, Mingqing; Cooke, Howard J; Shi, Qinghua

    2011-05-15

    High-throughput deep-sequencing technology has generated an unprecedented number of expressed short sequence reads, presenting not only an opportunity but also a challenge for prediction of novel microRNAs. To verify the existence of candidate microRNAs, we have to show that these short sequences can be processed from candidate pre-microRNAs. However, it is laborious and time consuming to verify these using existing experimental techniques. Therefore, here, we describe a new method, miRD, which is constructed using two feature selection strategies based on support vector machines (SVMs) and boosting method. It is a high-efficiency tool for novel pre-microRNA prediction with accuracy up to 94.0% among different species. miRD is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/rpg/mird/mird.php.

  16. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium

    PubMed Central

    2018-01-01

    ABSTRACT Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573

  17. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  18. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980

  19. Aftershock occurrence rate decay for individual sequences and catalogs

    NASA Astrophysics Data System (ADS)

    Nyffenegger, Paul A.

    One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precedes major earthquakes. Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.

  20. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  1. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  2. Deep learning methods for protein torsion angle prediction.

    PubMed

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures including deep neural network (DNN) and deep restricted Boltzmann machine (DRBN), deep recurrent neural network (DRNN) and deep recurrent restricted Boltzmann machine (DReRBM) since the protein torsion angle prediction is a sequence related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved the performance better than or comparable to a state-of-the art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.

  3. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.

  4. Ancient genomics

    PubMed Central

    Der Sarkissian, Clio; Allentoft, Morten E.; Ávila-Arcos, María C.; Barnett, Ross; Campos, Paula F.; Cappellini, Enrico; Ermini, Luca; Fernández, Ruth; da Fonseca, Rute; Ginolhac, Aurélien; Hansen, Anders J.; Jónsson, Hákon; Korneliussen, Thorfinn; Margaryan, Ashot; Martin, Michael D.; Moreno-Mayar, J. Víctor; Raghavan, Maanasa; Rasmussen, Morten; Velasco, Marcela Sandoval; Schroeder, Hannes; Schubert, Mikkel; Seguin-Orlando, Andaine; Wales, Nathan; Gilbert, M. Thomas P.; Willerslev, Eske; Orlando, Ludovic

    2015-01-01

    The past decade has witnessed a revolution in ancient DNA (aDNA) research. Although the field's focus was previously limited to mitochondrial DNA and a few nuclear markers, whole genome sequences from the deep past can now be retrieved. This breakthrough is tightly connected to the massive sequence throughput of next generation sequencing platforms and the ability to target short and degraded DNA molecules. Many ancient specimens previously unsuitable for DNA analyses because of extensive degradation can now successfully be used as source materials. Additionally, the analytical power obtained by increasing the number of sequence reads to billions effectively means that contamination issues that have haunted aDNA research for decades, particularly in human studies, can now be efficiently and confidently quantified. At present, whole genomes have been sequenced from ancient anatomically modern humans, archaic hominins, ancient pathogens and megafaunal species. Those have revealed important functional and phenotypic information, as well as unexpected adaptation, migration and admixture patterns. As such, the field of aDNA has entered the new era of genomics and has provided valuable information when testing specific hypotheses related to the past. PMID:25487338

  5. De Novo Peptide Sequencing: Deep Mining of High-Resolution Mass Spectrometry Data.

    PubMed

    Islam, Mohammad Tawhidul; Mohamedali, Abidali; Fernandes, Criselda Santan; Baker, Mark S; Ranganathan, Shoba

    2017-01-01

    High resolution mass spectrometry has revolutionized proteomics over the past decade, resulting in tremendous amounts of data in the form of mass spectra, being generated in a relatively short span of time. The mining of this spectral data for analysis and interpretation though has lagged behind such that potentially valuable data is being overlooked because it does not fit into the mold of traditional database searching methodologies. Although the analysis of spectra by de novo sequences removes such biases and has been available for a long period of time, its uptake has been slow or almost nonexistent within the scientific community. In this chapter, we propose a methodology to integrate de novo peptide sequencing using three commonly available software solutions in tandem, complemented by homology searching, and manual validation of spectra. This simplified method would allow greater use of de novo sequencing approaches and potentially greatly increase proteome coverage leading to the unearthing of valuable insights into protein biology, especially of organisms whose genomes have been recently sequenced or are poorly annotated.

  6. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.

  7. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  8. A deep intronic CLRN1 (USH3A) founder mutation generates an aberrant exon and underlies severe Usher syndrome on the Arabian Peninsula.

    PubMed

    Khan, Arif O; Becirovic, Elvir; Betz, Christian; Neuhaus, Christine; Altmüller, Janine; Maria Riedmayr, Lisa; Motameny, Susanne; Nürnberg, Gudrun; Nürnberg, Peter; Bolz, Hanno J

    2017-05-03

    Deafblindness is mostly due to Usher syndrome caused by recessive mutations in the known genes. Mutation-negative patients therefore either have distinct diseases, mutations in yet unknown Usher genes or in extra-exonic parts of the known genes - to date a largely unexplored possibility. In a consanguineous Saudi family segregating Usher syndrome type 1 (USH1), NGS of genes for Usher syndrome, deafness and retinal dystrophy and subsequent whole-exome sequencing each failed to identify a mutation. Genome-wide linkage analysis revealed two small candidate regions on chromosome 3, one containing the USH3A gene CLRN1, which has never been associated with Usher syndrome in Saudi Arabia. Whole-genome sequencing (WGS) identified a homozygous deep intronic mutation, c.254-649T > G, predicted to generate a novel donor splice site. CLRN1 minigene-based analysis confirmed the splicing of an aberrant exon due to usage of this novel motif, resulting in a frameshift and a premature termination codon. We identified this mutation in an additional two of seven unrelated mutation-negative Saudi USH1 patients. Locus-specific markers indicated that c.254-649T > G CLRN1 represents a founder allele that may significantly contribute to deafblindness in this population. Our finding underlines the potential of WGS to uncover atypically localized, hidden mutations in patients who lack exonic mutations in the known disease genes.

  9. Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction.

    PubMed

    Ravì, Daniele; Szczotka, Agnieszka Barbara; Shakir, Dzhoshkun Ismail; Pereira, Stephen P; Vercauteren, Tom

    2018-06-01

    Probe-based confocal laser endomicroscopy (pCLE) is a recent imaging modality that allows performing in vivo optical biopsies. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality with a few tens of thousands fibres, each acting as the equivalent of a single-pixel detector, assembled into a single fibre bundle. Video registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by the models trained on pairs of estimated HR images (generated by the video registration algorithm) and realistic synthetic LR images. Performance of three different state-of-the-art DNNs techniques were analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive image quality assessment that takes into account different quality scores, including a Mean Opinion Score (MOS). Results indicate that the proposed solution produces an effective improvement in the quality of the obtained reconstructed image. The proposed training strategy and associated DNNs allows us to perform convincing super-resolution of pCLE images.

  10. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

    PubMed

    Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

    2013-02-19

    Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.

  11. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    PubMed Central

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic approaches provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity, that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length; with 35% of > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes were detected in parasitized larvae with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization, DsIV genes expressed in the host and also improves our current understanding of this host-parasitoid interaction. PMID:21906285

  12. Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets

    PubMed Central

    2012-01-01

    Background Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773

  13. Within-Host Variations of Human Papillomavirus Reveal APOBEC Signature Mutagenesis in the Viral Genome.

    PubMed

    Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

    2018-06-15

    Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.

  14. Living microbial ecosystems within the active zone of catagenesis: Implications for feeding the deep biosphere

    NASA Astrophysics Data System (ADS)

    Horsfield, B.; Schenk, H. J.; Zink, K.; Ondrak, R.; Dieckmann, V.; Kallmeyer, J.; Mangelsdorf, K.; di Primio, R.; Wilkes, H.; Parkes, R. J.; Fry, J.; Cragg, B.

    2006-06-01

    Earth's largest reactive carbon pool, marine sedimentary organic matter, becomes increasingly recalcitrant during burial, making it almost inaccessible as a substrate for microorganisms, and thereby limiting metabolic activity in the deep biosphere. Because elevated temperature acting over geological time leads to the massive thermal breakdown of the organic matter into volatiles, including petroleum, the question arises whether microorganisms can directly utilize these maturation products as a substrate. While migrated thermogenic fluids are known to sustain microbial consortia in shallow sediments, an in situ coupling of abiotic generation and microbial utilization has not been demonstrated. Here we show, using a combination of basin modelling, kinetic modelling, geomicrobiology and biogeochemistry, that microorganisms inhabit the active generation zone in the Nankai Trough, offshore Japan. Three sites from ODP Leg 190 have been evaluated, namely 1173, 1174 and 1177, drilled in nearly undeformed Quaternary and Tertiary sedimentary sequences seaward of the Nankai Trough itself. Paleotemperatures were reconstructed based on subsidence profiles, compaction modelling, present-day heat flow, downhole temperature measurements and organic maturity parameters. Today's heat flow distribution can be considered mainly conductive, and is extremely high in places, reaching 180 mW/m 2. The kinetic parameters describing total hydrocarbon generation, determined by laboratory pyrolysis experiments, were utilized by the model in order to predict the timing of generation in time and space. The model predicts that the onset of present day generation lies between 300 and 500 m below sea floor (5100-5300 m below mean sea level), depending on well location. In the case of Site 1174, 5-10% conversion has taken place by a present day temperature of ca. 85 °C. Predictions were largely validated by on-site hydrocarbon gas measurements. Viable organisms in the same depth range have been proven using 14C-radiolabelled substrates for methanogenesis, bacterial cell counts and intact phospholipids. Altogether, these results point to an overlap of abiotic thermal degradation reactions going on in the same part of the sedimentary column as where a deep biosphere exists. The organic matter preserved in Nankai Trough sediments is of the type that generates putative feedstocks for microbial activity, namely oxygenated compounds and hydrocarbons. Furthermore, the rates of thermal degradation calculated from the kinetic model closely resemble rates of respiration and electron donor consumption independently measured in other deep biosphere environments. We deduce that abiotically driven degradation reactions have provided substrates for microbial activity in deep sediments at this convergent continental margin.

  15. Iterative free-energy optimization for recurrent neural networks (INFERNO).

    PubMed

    Pitti, Alexandre; Gaussier, Philippe; Quoy, Mathias

    2017-01-01

    The intra-parietal lobe coupled with the Basal Ganglia forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models however, often fails to control long range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to see the problem of spikes' synchrony as an optimization problem of the neurons sub-threshold activity for the generation of long neuronal chains. Using a stochastic gradient descent, a reinforcement signal (presumably dopaminergic) evaluates the quality of one input vector to move the recurrent neural network to a desired activity; depending on the error made, this input vector is strengthened to hill-climb the gradient or elicited to search for another solution. This vector can be learned then by one associative memory as a model of the basal-ganglia to control the recurrent neural network. Experiments on habit learning and on sequence retrieving demonstrate the capabilities of the dual system to generate very long and precise spatio-temporal sequences, above two hundred iterations. Its features are applied then to the sequential planning of arm movements. In line with neurobiological theories, we discuss its relevance for modeling the cortico-basal working memory to initiate flexible goal-directed neuronal chains of causation and its relation to novel architectures such as Deep Networks, Neural Turing Machines and the Free-Energy Principle.

  16. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  17. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  18. Deep Learning and Its Applications in Biomedicine.

    PubMed

    Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi

    2018-02-01

    Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.

  19. Comparative genomic analysis of oil spill impacts on deep water shipwreck microbiomes in the northern Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Hamdan, L. J.; Damour, M.; McGown, C.; Figan, C.; Kassahun, Z.; Blackwell, K.; Horrell, C.; Gillevet, P.

    2014-12-01

    Shipwrecks serve as artificial reefs in the deep ocean. Because of their inherent diversity compared to their surrounding environment and their random distribution, shipwrecks are ideal ecosystems to study pollution impacts and microbial distribution patterns in the deep biosphere. This study provides a comparative assessment of Deepwater Horizon spill impacts on shipwreck and local sedimentary microbiomes and the synergistic effects of contaminants on these communities and the physical structures that support them. For this study, microbiomes associated with wooden 19th century shipwrecks and World War II era steel shipwrecks in the northern Gulf of Mexico were investigated using next generation sequencing. Samples derived from in situ biofilm monitoring platforms deployed adjacent to 5 shipwrecks for 4 months, and sediment collected from distances ranging from 2-200m from each shipwreck were evaluated for shifts in microbiome structure and gene function relative to proximity to the spill, and oil spill related contaminants in the local environment. The goals of the investigation are to determine impacts to recruitment and community structure at sites located within and outside of areas impacted by the spill. Taxonomic classification of dominant and rare members of shipwreck microbiomes and metabolic information extracted from sequence data yield new understanding of microbial processes associated with site formation. The study provides information on the identity of microbial inhabitants of shipwrecks, their role in site preservation, and impacts of the Deepwater Horizon spill on the primary colonizers of artificial reefs in the deep ocean. This approach could inform about the role of microorganisms in establishment and maintenance of the artificial reef environment, while providing information about ecosystem feedbacks resulting from spills.

  20. Recalcitrant deep and shallow nodes in Aristolochia (Aristolochiaceae) illuminated using anchored hybrid enrichment.

    PubMed

    Wanke, Stefan; Granados Mendoza, Carolina; Müller, Sebastian; Paizanni Guillén, Anna; Neinhuis, Christoph; Lemmon, Alan R; Lemmon, Emily Moriarty; Samain, Marie-Stéphanie

    2017-12-01

    Recalcitrant relationships are characterized by very short internodes that can be found among shallow and deep phylogenetic scales all over the tree of life. Adding large amounts of presumably informative sequences, while decreasing systematic error, has been suggested as a possible approach to increase phylogenetic resolution. The development of enrichment strategies, coupled with next generation sequencing, resulted in a cost-effective way to facilitate the reconstruction of recalcitrant relationships. By applying the anchored hybrid enrichment (AHE) genome partitioning strategy to Aristolochia using an universal angiosperm probe set, we obtained 231-233 out of 517 single or low copy nuclear loci originally contained in the enrichment kit, resulting in a total alignment length of 154,756bp to 160,150bp. Since Aristolochia (Piperales; magnoliids) is distantly related to any angiosperm species whose genome has been used for the plant AHE probe design (Amborella trichopoda being the closest), it serves as a proof of universality for this probe set. Aristolochia comprises approximately 500 species grouped in several clades (OTUs), whose relationships to each other are partially unknown. Previous phylogenetic studies have shown that these lineages branched deep in time and in quick succession, seen as short-deep internodes. Short-shallow internodes are also characteristic of some Aristolochia lineages such as Aristolochia subsection Pentandrae, a clade of presumably recent diversification. This subsection is here included to test the performance of AHE at species level. Filtering and subsampling loci using the phylogenetic informativeness method resolves several recalcitrant phylogenetic relationships within Aristolochia. By assuming different ploidy levels during bioinformatics processing of raw data, first hints are obtained that polyploidization contributed to the evolution of Aristolochia. Phylogenetic results are discussed in the light of current systematics and morphology. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  1. A Follow-Up of the Multicenter Collaborative Study on HIV-1 Drug Resistance and Tropism Testing Using 454 Ultra Deep Pyrosequencing

    PubMed Central

    St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin

    2016-01-01

    Background Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). 10 preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ±0.5, 4.8 ±0.4, 4.9 ±0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilutions series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. Conclusions This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901

  2. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study

    PubMed Central

    Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.

    2015-01-01

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199

  3. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study.

    PubMed

    Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L

    2015-12-07

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.

  4. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    USDA-ARS?s Scientific Manuscript database

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  5. Natural Variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on Natural variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  6. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646

  7. Deep Recurrent Neural Networks for Human Activity Recognition

    PubMed Central

    Murad, Abdulmajid

    2017-01-01

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs. PMID:29113103

  8. Deep Recurrent Neural Networks for Human Activity Recognition.

    PubMed

    Murad, Abdulmajid; Pyun, Jae-Young

    2017-11-06

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs.

  9. Distribution of long-lived radioactive iodine isotope (I-129) in pore waters from the gas hydrate fields on the continental margins: Indication for methane source of gas hydrate deposits

    NASA Astrophysics Data System (ADS)

    Tomaru, H.; Lu, Z.; Fehn, U.

    2011-12-01

    Because iodine has a strong association with organic matters in marine environments, pore waters in high methane potential region, in particular gas hydrate occurrences on the continental margins, are enriched significantly in iodine compared with seawater. Natural iodine system is composed of stable and radioactive species, I-129 (half-life of 15.7 Myr) has been used for estimating the age of source formations both for methane and iodine, because iodine can be liberated into pore water during the degradation of organic matter to methane in deep sediments. Here we present I-129 age data in pore waters collected from variety of gas hydrate occurrences on the continental margins. The I-129 ages in pore waters from these locations are significantly older than those of host sediments, indicating long-term transport and accumulation from deep/old sediments. The I-129 ages in the Japan Sea and Okhotsk Sea along the plate boundary between the North American and Amurian Plates correspond to the ages of initial spreading of these marginal seas, pointing to the massive deposition of organic matter for methane generation in deep sediments within limited periods. On the Pacific side of these areas, organic matter-rich back stop is responsible for methane in deep-seated gas hydrate deposits along the Nankai Trough. Deep coaly sequences responsible for deep conventional natural gas deposits are also responsible for overlying gas hydrate deposits off Shimokita Peninsula, NE Japan. Those in the Gulf of Mexico are correlative to the ages of sediments where the top of salt diapirs intrude. Marine sediments on the Pacific Plate subducting beneath the Australian Plate are likely responsible for the methane and iodine in the Hikurangi Trough, New Zealand. These ages reflect well the regional geological settings responsible for generation, transport, and accumulation of methane, I-129 is a key to understand the geological history of gas hydrate deposition.

  10. Cratering motions and structural deformation in the rim of the Prairie Flat multiring explosion crater

    NASA Technical Reports Server (NTRS)

    Roddy, D. J.; Ullrich, G. W.; Sauer, F. M.; Jones, G. H. S.

    1977-01-01

    Cratering motions and structural deformation are described for the rim of the Prairie Flat multiring crater, 85.5 m across and 5.3 m deep, which was formed by the detonation of a 500-ton TNT surface-tangent sphere. The terminal displacement and motion data are derived from marker cans and velocity gages emplaced in drill holes in a three-dimensional matrix radial to the crater. The integration of this data with a detailed geologic cross section, mapped from deep trench excavations through the rim, provides a composite view of the general sequence of motions that formed a transiently uplifted rim, overturned flap, inverted stratigraphy, downfolded rim, and deformed strata in the crater walls. Preliminary comparisons with laboratory experimental cratering and with numerical simulations indicate that explosion craters of the Prairie Flat-type generated by surface and near-surface energy sources tend to follow predictable motion sequences and produce comparable structural deformation. More specifically, central uplift and multiring impact craters with morphologies and structures comparable to Prairie Flat are inferred to have experienced similar deformational histories of the rim, such as uplift, overturning, terracing, and downfolding.

  11. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.

    PubMed

    García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge

    2018-04-12

    Pseudomonas oceani DSM 100277 T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani , which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.

  12. IgM Repertoire Biodiversity is Reduced in HIV-1 Infection and Systemic Lupus Erythematosus.

    PubMed

    Yin, Li; Hou, Wei; Liu, Li; Cai, Yunpeng; Wallet, Mark Andrew; Gardner, Brent Paul; Chang, Kaifen; Lowe, Amanda Catherine; Rodriguez, Carina Adriana; Sriaroon, Panida; Farmerie, William George; Sleasman, John William; Goodenow, Maureen Michels

    2013-01-01

    HIV-1 infection or systemic lupus erythematosus (SLE) disrupt B cell homeostasis, reduce memory B cells, and impair function of IgG and IgM antibodies. To determine how disturbances in B cell populations producing polyclonal antibodies relate to the IgM repertoire, the IgM transcriptome in health and disease was explored at the complementarity determining region 3 (CDRH3) sequence level. 454-deep pyrosequencing in combination with a novel analysis pipeline was applied to define populations of IGHM CDRH3 sequences based on absence or presence of somatic hypermutations (SHM) in peripheral blood B cells. HIV or SLE subjects have reduced biodiversity within their IGHM transcriptome compared to healthy subjects, mainly due to a significant decrease in the number of unique combinations of alleles, although recombination machinery was intact. While major differences between sequences without or with SHM occurred among all groups, IGHD and IGHJ allele use, CDRH3 length distribution, or generation of SHM were similar among study cohorts. Antiretroviral therapy failed to normalize IGHM biodiversity in HIV-infected individuals. All subjects had a low frequency of allelic combinations within the IGHM repertoire similar to known broadly neutralizing HIV-1 antibodies. Polyclonal expansion would decrease overall IgM biodiversity independent of other mechanisms for development of the B cell repertoire. Applying deep sequencing as a strategy to follow development of the IgM repertoire in health and disease provides a novel molecular assessment of multiple points along the B cell differentiation pathway that is highly sensitive for detecting perturbations within the repertoire at the population level.

  13. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  14. Molecular characterization of oral squamous cell carcinoma using targeted next-generation sequencing.

    PubMed

    Er, Tze-Kiong; Wang, Yen-Yun; Chen, Chih-Chieh; Herreros-Villanueva, Marta; Liu, Ta-Chih; Yuan, Shyng-Shiou F

    2015-10-01

    Many genetic factors play an important role in the development of oral squamous cell carcinoma. The aim of this study was to assess the mutational profile in oral squamous cell carcinoma using formalin-fixed, paraffin-embedded tumors from a Taiwanese population by performing targeted sequencing of 26 cancer-associated genes that are frequently mutated in solid tumors. Next-generation sequencing was performed in 50 formalin-fixed, paraffin-embedded tumor specimens obtained from patients with oral squamous cell carcinoma. Genetic alterations in the 26 cancer-associated genes were detected using a deep sequencing (>1000X) approach. TP53, PIK3CA, MET, APC, CDH1, and FBXW7 were most frequently mutated genes. Most remarkably, TP53 mutations and PIK3CA mutations, which accounted for 68% and 18% of tumors, respectively, were more prevalent in a Taiwanese population. Other genes including MET (4%), APC (4%), CDH1 (2%), and FBXW7 (2%) were identified in our population. In summary, our study shows the feasibility of performing targeted sequencing using formalin-fixed, paraffin-embedded samples. Additionally, this study also reports the mutational landscape of oral squamous cell carcinoma in the Taiwanese population. We believe that this study will shed new light on fundamental aspects in understanding the molecular pathogenesis of oral squamous cell carcinoma and may aid in the development of new targeted therapies. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.

    PubMed

    Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario

    2011-04-01

    Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.

  16. Diverse, rare microbial taxa responded to the Deepwater Horizon deep-sea hydrocarbon plume.

    PubMed

    Kleindienst, Sara; Grim, Sharon; Sogin, Mitchell; Bracco, Annalisa; Crespo-Medina, Melitza; Joye, Samantha B

    2016-02-01

    The Deepwater Horizon (DWH) oil well blowout generated an enormous plume of dispersed hydrocarbons that substantially altered the Gulf of Mexico's deep-sea microbial community. A significant enrichment of distinct microbial populations was observed, yet, little is known about the abundance and richness of specific microbial ecotypes involved in gas, oil and dispersant biodegradation in the wake of oil spills. Here, we document a previously unrecognized diversity of closely related taxa affiliating with Cycloclasticus, Colwellia and Oceanospirillaceae and describe their spatio-temporal distribution in the Gulf's deepwater, in close proximity to the discharge site and at increasing distance from it, before, during and after the discharge. A highly sensitive, computational method (oligotyping) applied to a data set generated from 454-tag pyrosequencing of bacterial 16S ribosomal RNA gene V4-V6 regions, enabled the detection of population dynamics at the sub-operational taxonomic unit level (0.2% sequence similarity). The biogeochemical signature of the deep-sea samples was assessed via total cell counts, concentrations of short-chain alkanes (C1-C5), nutrients, (colored) dissolved organic and inorganic carbon, as well as methane oxidation rates. Statistical analysis elucidated environmental factors that shaped ecologically relevant dynamics of oligotypes, which likely represent distinct ecotypes. Major hydrocarbon degraders, adapted to the slow-diffusive natural hydrocarbon seepage in the Gulf of Mexico, appeared unable to cope with the conditions encountered during the DWH spill or were outcompeted. In contrast, diverse, rare taxa increased rapidly in abundance, underscoring the importance of specialized sub-populations and potential ecotypes during massive deep-sea oil discharges and perhaps other large-scale perturbations.

  17. Zero-Echo-Time and Dixon Deep Pseudo-CT (ZeDD CT): Direct Generation of Pseudo-CT Images for Pelvic PET/MRI Attenuation Correction Using Deep Convolutional Neural Networks with Multiparametric MRI.

    PubMed

    Leynes, Andrew P; Yang, Jaewon; Wiesinger, Florian; Kaushik, Sandeep S; Shanbhag, Dattesh D; Seo, Youngho; Hope, Thomas A; Larson, Peder E Z

    2018-05-01

    Accurate quantification of uptake on PET images depends on accurate attenuation correction in reconstruction. Current MR-based attenuation correction methods for body PET use a fat and water map derived from a 2-echo Dixon MRI sequence in which bone is neglected. Ultrashort-echo-time or zero-echo-time (ZTE) pulse sequences can capture bone information. We propose the use of patient-specific multiparametric MRI consisting of Dixon MRI and proton-density-weighted ZTE MRI to directly synthesize pseudo-CT images with a deep learning model: we call this method ZTE and Dixon deep pseudo-CT (ZeDD CT). Methods: Twenty-six patients were scanned using an integrated 3-T time-of-flight PET/MRI system. Helical CT images of the patients were acquired separately. A deep convolutional neural network was trained to transform ZTE and Dixon MR images into pseudo-CT images. Ten patients were used for model training, and 16 patients were used for evaluation. Bone and soft-tissue lesions were identified, and the SUV max was measured. The root-mean-squared error (RMSE) was used to compare the MR-based attenuation correction with the ground-truth CT attenuation correction. Results: In total, 30 bone lesions and 60 soft-tissue lesions were evaluated. The RMSE in PET quantification was reduced by a factor of 4 for bone lesions (10.24% for Dixon PET and 2.68% for ZeDD PET) and by a factor of 1.5 for soft-tissue lesions (6.24% for Dixon PET and 4.07% for ZeDD PET). Conclusion: ZeDD CT produces natural-looking and quantitatively accurate pseudo-CT images and reduces error in pelvic PET/MRI attenuation correction compared with standard methods. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.

  18. Thermovibrio ammonificans sp. nov., a thermophilic, chemolithotrophic, nitrate-ammonifying bacterium from deep-sea hydrothermal vents.

    PubMed

    Vetriani, Costantino; Speck, Mark D; Ellor, Susan V; Lutz, Richard A; Starovoytov, Valentin

    2004-01-01

    A thermophilic, anaerobic, chemolithoautotrophic bacterium was isolated from the walls of an active deep-sea hydrothermal vent chimney on the East Pacific Rise at 9 degrees 50' N. Cells of the organism were Gram-negative, motile rods that were about 1.0 microm in length and 0.6 microm in width. Growth occurred between 60 and 80 degrees C (optimum at 75 degrees C), 0.5 and 4.5% (w/v) NaCl (optimum at 2%) and pH 5 and 7 (optimum at 5.5). Generation time under optimal conditions was 1.57 h. Growth occurred under chemolithoautotrophic conditions in the presence of H2 and CO2, with nitrate or sulfur as the electron acceptor and with concomitant formation of ammonium or hydrogen sulfide, respectively. Thiosulfate, sulfite and oxygen were not used as electron acceptors. Acetate, formate, lactate and yeast extract inhibited growth. No chemoorganoheterotrophic growth was observed on peptone, tryptone or Casamino acids. The genomic DNA G+C content was 54.6 mol%. Phylogenetic analyses of the 16S rRNA gene sequence indicated that the organism was a member of the domain Bacteria and formed a deep branch within the phylum Aquificae, with Thermovibrio ruber as its closest relative (94.4% sequence similarity). On the basis of phylogenetic, physiological and genetic considerations, it is proposed that the organism represents a novel species within the newly described genus Thermovibrio. The type strain is Thermovibrio ammonificans HB-1T (=DSM 15698T=JCM 12110T).

  19. RaptorX-Property: a web server for protein structure property prediction.

    PubMed

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum.

    PubMed

    Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-07-01

    The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org : nshomron@post.tau.ac.il Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  1. Genomic and Transcriptomic Analyses to Identify Pathways Involved in Nanoparticle Generation in the Ubiquitous Marine Bacterium Alteromonas macleodii Under Elevated Copper Conditions

    NASA Astrophysics Data System (ADS)

    Cusick, K. D.; Dale, J.; Little, B.; Cockrell, A.; Biffinger, J.

    2016-02-01

    Alteromonas macleodii is a ubiquitous marine bacterium that clusters by molecular analyses into two ecotypes: surface and deep-water. Our group isolated a marine bacterium from copper coupons that generates nanoparticles (NPs) at elevated copper concentrations. Sequencing of the 16S rRNA gene identified it as an A. macleodii strain. In phylogenetic analyses based on the gyrB gene, it clustered with other surface isolates; however, it formed a unique cluster separate from that of other surface isolates based on rpoB gene sequences. Copper is commonly employed as an antifouling agent on the hulls of ships, and so copper tolerance and NP generation is under investigation in this strain. The overall goals of this study were: (1) to determine if copper tolerance is the result of changes at the genetic or transcriptional level and (2) to identify the genes involved in NP formation. Sub-cultures were established from the initial isolate in which copper concentrations were increased in .25 mM increments through multiple generations. These sub-cultures were assayed for NP formation in seawater medium supplemented with 3-4 mM copper. Scanning electron microscopy revealed large aggregates of NPs on the exterior surface of all sub-cultures. Additionally, a portion of the cells in all sub-cultures displayed an elongated morphology in comparison to the wild-type. No NPs were observed in wild-type controls grown without the addition of increased copper. Metagenomic sequencing of natural populations of A. macleodii revealed extreme divergence in several large genomic regions whose content includes genes coding for exopolysaccharide production and metal resistance. High-throughput sequencing is being used to determine whether copper tolerance and NP generation is the result of genetic or transcriptional changes. These results will be extended to natural communities to gain insights into the role of bacterial NPs during conditions of elevated metal concentrations in coastal systems.

  2. Identification of ribonucleotide reductase mutation causing temperature-sensitivity of herpes simplex virus isolates from whitlow by deep sequencing.

    PubMed

    Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu

    2015-06-01

    Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.

  3. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.

    PubMed

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-05-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  4. The LAM-PCR Method to Sequence LV Integration Sites.

    PubMed

    Wang, Wei; Bartholomae, Cynthia C; Gabriel, Richard; Deichmann, Annette; Schmidt, Manfred

    2016-01-01

    Integrating viral gene transfer vectors are commonly used gene delivery tools in clinical gene therapy trials providing stable integration and continuous gene expression of the transgene in the treated host cell. However, integration of the reverse-transcribed vector DNA into the host genome is a potentially mutagenic event that may directly contribute to unwanted side effects. A comprehensive and accurate analysis of the integration site (IS) repertoire is indispensable to study clonality in transduced cells obtained from patients undergoing gene therapy and to identify potential in vivo selection of affected cell clones. To date, next-generation sequencing (NGS) of vector-genome junctions allows sophisticated studies on the integration repertoire in vitro and in vivo. We have explored the use of the Illumina MiSeq Personal Sequencer platform to sequence vector ISs amplified by non-restrictive linear amplification-mediated PCR (nrLAM-PCR) and LAM-PCR. MiSeq-based high-quality IS sequence retrieval is accomplished by the introduction of a double-barcode strategy that substantially minimizes the frequency of IS sequence collisions compared to the conventionally used single-barcode protocol. Here, we present an updated protocol of (nr)LAM-PCR for the analysis of lentiviral IS using a double-barcode system and followed by deep sequencing using the MiSeq device.

  5. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derivedmore » from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.« less

  6. Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome

    PubMed Central

    2011-01-01

    Background One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents and, the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak. PMID:21645357

  7. Cough event classification by pretrained deep neural network.

    PubMed

    Liu, Jia-Ming; You, Mingyu; Wang, Zheng; Li, Guo-Zheng; Xu, Xianghuai; Qiu, Zhongmin

    2015-01-01

    Cough is an essential symptom in respiratory diseases. In the measurement of cough severity, an accurate and objective cough monitor is expected by respiratory disease society. This paper aims to introduce a better performed algorithm, pretrained deep neural network (DNN), to the cough classification problem, which is a key step in the cough monitor. The deep neural network models are built from two steps, pretrain and fine-tuning, followed by a Hidden Markov Model (HMM) decoder to capture tamporal information of the audio signals. By unsupervised pretraining a deep belief network, a good initialization for a deep neural network is learned. Then the fine-tuning step is a back propogation tuning the neural network so that it can predict the observation probability associated with each HMM states, where the HMM states are originally achieved by force-alignment with a Gaussian Mixture Model Hidden Markov Model (GMM-HMM) on the training samples. Three cough HMMs and one noncough HMM are employed to model coughs and noncoughs respectively. The final decision is made based on viterbi decoding algorihtm that generates the most likely HMM sequence for each sample. A sample is labeled as cough if a cough HMM is found in the sequence. The experiments were conducted on a dataset that was collected from 22 patients with respiratory diseases. Patient dependent (PD) and patient independent (PI) experimental settings were used to evaluate the models. Five criteria, sensitivity, specificity, F1, macro average and micro average are shown to depict different aspects of the models. From overall evaluation criteria, the DNN based methods are superior to traditional GMM-HMM based method on F1 and micro average with maximal 14% and 11% error reduction in PD and 7% and 10% in PI, meanwhile keep similar performances on macro average. They also surpass GMM-HMM model on specificity with maximal 14% error reduction on both PD and PI. In this paper, we tried pretrained deep neural network in cough classification problem. Our results showed that comparing with the conventional GMM-HMM framework, the HMM-DNN could get better overall performance on cough classification task.

  8. De novo Transcriptome Analysis of Portunus trituberculatus Ovary and Testis by RNA-Seq: Identification of Genes Involved in Gonadal Development

    PubMed Central

    Meng, Xian-liang; Liu, Ping; Jia, Fu-long; Li, Jian; Gao, Bao-Quan

    2015-01-01

    The swimming crab Portunus trituberculatus is a commercially important crab species in East Asia countries. Gonadal development is a physiological process of great significance to the reproduction as well as commercial seed production for P. trituberculatus. However, little is currently known about the molecular mechanisms governing the developmental processes of gonads in this species. To open avenues of molecular research on P. trituberculatus gonadal development, Illumina paired-end sequencing technology was employed to develop deep-coverage transcriptome sequencing data for its gonads. Illumina sequencing generated 58,429,148 and 70,474,978 high-quality reads from the ovary and testis cDNA library, respectively. All these reads were assembled into 54,960 unigenes with an average sequence length of 879 bp, of which 12,340 unigenes (22.45% of the total) matched sequences in GenBank non-redundant database. Based on our transcriptome analysis as well as published literature, a number of candidate genes potentially involved in the regulation of gonadal development of P. trituberculatus were identified, such as FAOMeT, mPRγ, PGMRC1, PGDS, PGER4, 3β-HSD and 17β-HSDs. Differential expression analysis generated 5,919 differentially expressed genes between ovary and testis, among which many genes related to gametogenesis and several genes previously reported to be critical in differentiation and development of gonads were found, including Foxl2, Wnt4, Fst, Fem-1 and Sox9. Furthermore, 28,534 SSRs and 111,646 high-quality SNPs were identified in this transcriptome dataset. This work represents the first transcriptome analysis of P. trituberculatus gonads using the next generation sequencing technology and provides a valuable dataset for understanding molecular mechanisms controlling development of gonads and facilitating future investigation of reproductive biology in this species. The molecular markers obtained in this study will provide a fundamental basis for population genetics and functional genomics in P. trituberculatus and other closely related species. PMID:26042806

  9. The deep, hot biosphere: Twenty-five years of retrospection.

    PubMed

    Colman, Daniel R; Poudel, Saroj; Stamps, Blake W; Boyd, Eric S; Spear, John R

    2017-07-03

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a "deep, hot biosphere" in the Earth's crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H 2 , CH 4 , and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth's subsurface in the form of a deep subsurface microbiome initiative.

  10. Deep Illumina-Based Shotgun Sequencing Reveals Dietary Effects on the Structure and Function of the Fecal Microbiome of Growing Kittens

    PubMed Central

    Deusch, Oliver; O’Flynn, Ciaran; Colyer, Alison; Morris, Penelope; Allaway, David; Jones, Paul G.; Swanson, Kelly S.

    2014-01-01

    Background Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Methodology & Principal Findings Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high–protein, low–carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified) demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups were different (p<0.000007) between the two dietary groups and were associated to 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism linking changes in dietary protein with functional differences of the gut microbiome. Conclusions These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary protein:carbohydrate ratio and highlight the impact of diet early in life. PMID:25010839

  11. From clinical sample to complete genome: Comparing methods for the extraction of HIV-1 RNA for high-throughput deep sequencing.

    PubMed

    Cornelissen, Marion; Gall, Astrid; Vink, Monique; Zorgdrager, Fokla; Binter, Špela; Edwards, Stephanie; Jurriaans, Suzanne; Bakker, Margreet; Ong, Swee Hoe; Gras, Luuk; van Sighem, Ard; Bezemer, Daniela; de Wolf, Frank; Reiss, Peter; Kellam, Paul; Berkhout, Ben; Fraser, Christophe; van der Kuyl, Antoinette C

    2017-07-15

    The BEEHIVE (Bridging the Evolution and Epidemiology of HIV in Europe) project aims to analyse nearly-complete viral genomes from >3000 HIV-1 infected Europeans using high-throughput deep sequencing techniques to investigate the virus genetic contribution to virulence. Following the development of a computational pipeline, including a new de novo assembler for RNA virus genomes, to generate larger contiguous sequences (contigs) from the abundance of short sequence reads that characterise the data, another area that determines genome sequencing success is the quality and quantity of the input RNA. A pilot experiment with 125 patient plasma samples was performed to investigate the optimal method for isolation of HIV-1 viral RNA for long amplicon genome sequencing. Manual isolation with the QIAamp Viral RNA Mini Kit (Qiagen) was superior over robotically extracted RNA using either the QIAcube robotic system, the mSample Preparation Systems RNA kit with automated extraction by the m2000sp system (Abbott Molecular), or the MagNA Pure 96 System in combination with the MagNA Pure 96 Instrument (Roche Diagnostics). We scored amplification of a set of four HIV-1 amplicons of ∼1.9, 3.6, 3.0 and 3.5kb, and subsequent recovery of near-complete viral genomes. Subsequently, 616 BEEHIVE patient samples were analysed to determine factors that influence successful amplification of the genome in four overlapping amplicons using the QIAamp Viral RNA Kit for viral RNA isolation. Both low plasma viral load and high sample age (stored before 1999) negatively influenced the amplification of viral amplicons >3kb. A plasma viral load of >100,000 copies/ml resulted in successful amplification of all four amplicons for 86% of the samples, this value dropped to only 46% for samples with viral loads of <20,000 copies/ml. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Deep reefs are not universal refuges: Reseeding potential varies among coral species

    PubMed Central

    Bongaerts, Pim; Riginos, Cynthia; Brunner, Ramona; Englebert, Norbert; Smith, Struan R.; Hoegh-Guldberg, Ove

    2017-01-01

    Deep coral reefs (that is, mesophotic coral ecosystems) can act as refuges against major disturbances affecting shallow reefs. It has been proposed that, through the provision of coral propagules, such deep refuges may aid in shallow reef recovery; however, this “reseeding” hypothesis remains largely untested. We conducted a genome-wide assessment of two scleractinian coral species with contrasting reproductive modes, to assess the potential for connectivity between mesophotic (40 m) and shallow (12 m) depths on an isolated reef system in the Western Atlantic (Bermuda). To overcome the pervasive issue of endosymbiont contamination associated with de novo sequencing of corals, we used a novel subtraction reference approach. We have demonstrated that strong depth-associated selection has led to genome-wide divergence in the brooding species Agaricia fragilis (with divergence by depth exceeding divergence by location). Despite introgression from shallow into deep populations, a lack of first-generation migrants indicates that effective connectivity over ecological time scales is extremely limited for this species and thus precludes reseeding of shallow reefs from deep refuges. In contrast, no genetic structuring between depths (or locations) was observed for the broadcasting species Stephanocoenia intersepta, indicating substantial potential for vertical connectivity. Our findings demonstrate that vertical connectivity within the same reef system can differ greatly between species and that the reseeding potential of deep reefs in Bermuda may apply to only a small number of scleractinian species. Overall, we argue that the “deep reef refuge hypothesis” holds for individual coral species during episodic disturbances but should not be assumed as a broader ecosystem-wide phenomenon. PMID:28246645

  13. Utility of next-generation RNA-sequencing in identifying chimeric transcription involving human endogenous retroviruses.

    PubMed

    Sokol, Martin; Jessen, Karen Margrethe; Pedersen, Finn Skou

    2016-01-01

    Several studies have shown that human endogenous retroviruses and endogenous retrovirus-like repeats (here collectively HERVs) impose direct regulation on human genes through enhancer and promoter motifs present in their long terminal repeats (LTRs). Although chimeric transcription in which novel gene isoforms containing retroviral and human sequence are transcribed from viral promoters are commonly associated with disease, regulation by HERVs is beneficial in other settings; for example, in human testis chimeric isoforms of TP63 induced by an ERV9 LTR protect the male germ line upon DNA damage by inducing apoptosis, whereas in the human globin locus the γ- and β-globin switch during normal hematopoiesis is mediated by complex interactions of an ERV9 LTR and surrounding human sequence. The advent of deep sequencing or next-generation sequencing (NGS) has revolutionized the way researchers solve important scientific questions and develop novel hypotheses in relation to human genome regulation. We recently applied next-generation paired-end RNA-sequencing (RNA-seq) together with chromatin immunoprecipitation with sequencing (ChIP-seq) to examine ERV9 chimeric transcription in human reference cell lines from Encyclopedia of DNA Elements (ENCODE). This led to the discovery of advanced regulation mechanisms by ERV9s and other HERVs across numerous human loci including transcription of large gene-unannotated genomic regions, as well as cooperative regulation by multiple HERVs and non-LTR repeats such as Alu elements. In this article, well-established examples of human gene regulation by HERVs are reviewed followed by a description of paired-end RNA-seq, and its application in identifying chimeric transcription genome-widely. Based on integrative analyses of RNA-seq and ChIP-seq, data we then present novel examples of regulation by ERV9s of tumor suppressor genes CADM2 and SEMA3A, as well as transcription of an unannotated region. Taken together, this article highlights the high suitability of contemporary sequencing methods in future analyses of human biology in relation to evolutionary acquired retroviruses in the human genome. © 2016 APMIS. Published by John Wiley & Sons Ltd.

  14. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

    PubMed Central

    Avsec, Žiga; Cheng, Jun; Gagneur, Julien

    2018-01-01

    Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928

  15. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  16. Deep Sequencing of the Trypanosoma cruzi GP63 Surface Proteases Reveals Diversity and Diversifying Selection among Chronic and Congenital Chagas Disease Patients

    PubMed Central

    Llewellyn, Martin S.; Messenger, Louisa A.; Luquetti, Alejandro O.; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B. N.; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A.

    2015-01-01

    Background Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. Methodology/ Principal Findings A 450bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target—ND5—was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43, Goias congenital n = 2, Bolivia chronic, n = 27; Bolivia congenital, n = 20), Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Conclusions/Significance Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I gene family suggests a link between genetic diversity within this gene family and survival in the mammalian host. PMID:25849488

  17. Computer Aided Process Planning for Non-Axisymmetric Deep Drawing Products

    NASA Astrophysics Data System (ADS)

    Park, Dong Hwan; Yarlagadda, Prasad K. D. V.

    2004-06-01

    In general, deep drawing products have various cross-section shapes such as cylindrical, rectangular and non-axisymmetric shapes. The application of the surface area calculation to non-axisymmetric deep drawing process has not been published yet. In this research, a surface area calculation for non-axisymmetric deep drawing products with elliptical shape was constructed for a design of blank shape of deep drawing products by using an AutoLISP function of AutoCAD software. A computer-aided process planning (CAPP) system for rotationally symmetric deep drawing products has been developed. However, the application of the system to non-axisymmetric components has not been reported yet. Thus, the CAPP system for non-axisymmetric deep drawing products with elliptical shape was constructed by using process sequence design. The system developed in this work consists of four modules. The first is recognition of shape module to recognize non-axisymmetric products. The second is a three-dimensional (3-D) modeling module to calculate the surface area for non-axisymmetric products. The third is a blank design module to create an oval-shaped blank with the identical surface area. The forth is a process planning module based on the production rules that play the best important role in an expert system for manufacturing. The production rules are generated and upgraded by interviewing field engineers. Especially, the drawing coefficient, the punch and die radii for elliptical shape products are considered as main design parameters. The suitability of this system was verified by applying to a real deep drawing product. This CAPP system constructed would be very useful to reduce lead-time for manufacturing and improve an accuracy of products.

  18. Visualization of Whole-Night Sleep EEG From 2-Channel Mobile Recording Device Reveals Distinct Deep Sleep Stages with Differential Electrodermal Activity.

    PubMed

    Onton, Julie A; Kang, Dae Y; Coleman, Todd P

    2016-01-01

    Brain activity during sleep is a powerful marker of overall health, but sleep lab testing is prohibitively expensive and only indicated for major sleep disorders. This report demonstrates that mobile 2-channel in-home electroencephalogram (EEG) recording devices provided sufficient information to detect and visualize sleep EEG. Displaying whole-night sleep EEG in a spectral display allowed for quick assessment of general sleep stability, cycle lengths, stage lengths, dominant frequencies and other indices of sleep quality. By visualizing spectral data down to 0.1 Hz, a differentiation emerged between slow-wave sleep with dominant frequency between 0.1-1 Hz or 1-3 Hz, but rarely both. Thus, we present here the new designations, Hi and Lo Deep sleep, according to the frequency range with dominant power. Simultaneously recorded electrodermal activity (EDA) was primarily associated with Lo Deep and very rarely with Hi Deep or any other stage. Therefore, Hi and Lo Deep sleep appear to be physiologically distinct states that may serve unique functions during sleep. We developed an algorithm to classify five stages (Awake, Light, Hi Deep, Lo Deep and rapid eye movement (REM)) using a Hidden Markov Model (HMM), model fitting with the expectation-maximization (EM) algorithm, and estimation of the most likely sleep state sequence by the Viterbi algorithm. The resulting automatically generated sleep hypnogram can help clinicians interpret the spectral display and help researchers computationally quantify sleep stages across participants. In conclusion, this study demonstrates the feasibility of in-home sleep EEG collection, a rapid and informative sleep report format, and novel deep sleep designations accounting for spectral and physiological differences.

  19. The Deep Space Network. An instrument for radio navigation of deep space probes

    NASA Technical Reports Server (NTRS)

    Renzetti, N. A.; Jordan, J. F.; Berman, A. L.; Wackley, J. A.; Yunck, T. P.

    1982-01-01

    The Deep Space Network (DSN) network configurations used to generate the navigation observables and the basic process of deep space spacecraft navigation, from data generation through flight path determination and correction are described. Special emphasis is placed on the DSN Systems which generate the navigation data: the DSN Tracking and VLBI Systems. In addition, auxiliary navigational support functions are described.

  20. Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge

    NASA Astrophysics Data System (ADS)

    Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua

    2018-01-01

    Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data of the fungal diversity and biogeography, and a glimpse to the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.

  1. A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based upon Targeted Sequencing of Ultraconserved Elements (UCEs)

    PubMed Central

    Sorenson, Laurie; Santini, Francesco

    2013-01-01

    Ray-finned fishes constitute the dominant radiation of vertebrates with over 32,000 species. Although molecular phylogenetics has begun to disentangle major evolutionary relationships within this vast section of the Tree of Life, there is no widely available approach for efficiently collecting phylogenomic data within fishes, leaving much of the enormous potential of massively parallel sequencing technologies for resolving major radiations in ray-finned fishes unrealized. Here, we provide a genomic perspective on longstanding questions regarding the diversification of major groups of ray-finned fishes through targeted enrichment of ultraconserved nuclear DNA elements (UCEs) and their flanking sequence. Our workflow efficiently and economically generates data sets that are orders of magnitude larger than those produced by traditional approaches and is well-suited to working with museum specimens. Analysis of the UCE data set recovers a well-supported phylogeny at both shallow and deep time-scales that supports a monophyletic relationship between Amia and Lepisosteus (Holostei) and reveals elopomorphs and then osteoglossomorphs to be the earliest diverging teleost lineages. Our approach additionally reveals that sequence capture of UCE regions and their flanking sequence offers enormous potential for resolving phylogenetic relationships within ray-finned fishes. PMID:23824177

  2. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences

    PubMed Central

    Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg

    2017-01-01

    Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404

  3. Experimental demonstration of multiuser communication in deep water using time reversal.

    PubMed

    Shimura, T; Ochi, H; Song, H C

    2013-10-01

    Multiuser communication is demonstrated using experimental data (450-550 Hz) collected in deep water, south of Japan. The multiple users are spatially distributed either in depth or range while a 114-m long, 20-element vertical array (i.e., base station) is deployed to around the sound channel axis (~1000 m). First, signals received separately from ranges of 150 km and 180 km at various depths are combined asynchronously to generate multiuser communication sequences for subsequent processing, achieving an aggregate data rate of 300 bits/s for up to three users. Adaptive time reversal is employed to separate collided packets at the base station, followed by a single channel decision feedback equalizer. Then it is demonstrated that two users separated by 3 km in range at ~1000 m depth can transmit information simultaneously to the base station at ~500 km range with an aggregate data rate of 200 bits/s.

  4. Cross-shore and Vertical Distributions of Invertebrate Larvae Using Autonomous Sampling Coupled with Genetic Analysis

    NASA Astrophysics Data System (ADS)

    Govindarajan, A.; Pineda, J.; Purcell, M.; Tradd, K.; Packard, G.; Girard, A.; Dennett, M.; Breier, J. A., Jr.

    2016-02-01

    We present a new method to estimate the distribution of invertebrate larvae relative to environmental variables such as temperature, salinity, and circulation. A large volume in situ filtering system developed for discrete biogeochemical sampling in the deep-sea (the Suspended Particulate Rosette "SUPR" multisampler) was mounted to the autonomous underwater vehicle REMUS 600 for coastal larval and environmental sampling. We describe the results of SUPR-REMUS deployments conducted in Buzzards Bay, Massachusetts (2014) and west of Martha's Vineyard, Massachusetts (2015). We collected discrete samples cross-shore and from surface, middle, and bottom layers of the water column. Samples were preserved for DNA analysis. Our Buzzards Bay deployment targeted barnacle larvae, which are abundant in late winter and early spring. For these samples, we used morphological analysis and DNA barcodes generated by Sanger sequencing to obtain stage and species-specific cross-shore and vertical distributions. We targeted bivalve larvae in our 2015 deployments, and genetic analysis of larvae from these samples is underway. For these samples, we are comparing species barcode data derived from traditional Sanger sequencing of individuals to those obtained from next generation sequencing (NGS) of bulk plankton samples. Our results demonstrate the utility of autonomous sampling combined with DNA barcoding for studying larval distributions and transport dynamics.

  5. ToTem: a tool for variant calling pipeline optimization.

    PubMed

    Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka

    2018-06-26

    High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at  https://totem.software .

  6. Unveiling the metabolic potential of two soil-derived microbial consortia selected on wheat straw

    PubMed Central

    Jiménez, Diego Javier; Chaves-Moreno, Diego; van Elsas, Jan Dirk

    2015-01-01

    Based on the premise that plant biomass can be efficiently degraded by mixed microbial cultures and/or enzymes, we here applied a targeted metagenomics-based approach to explore the metabolic potential of two forest soil-derived lignocellulolytic microbial consortia, denoted RWS and TWS (bred on wheat straw). Using the metagenomes of three selected batches of two experimental systems, about 1.2 Gb of sequence was generated. Comparative analyses revealed an overrepresentation of predicted carbohydrate transporters (ABC, TonB and phosphotransferases), two-component sensing systems and β-glucosidases/galactosidases in the two consortia as compared to the forest soil inoculum. Additionally, “profiling” of carbohydrate-active enzymes showed significant enrichments of several genes encoding glycosyl hydrolases of families GH2, GH43, GH92 and GH95. Sequence analyses revealed these to be most strongly affiliated to genes present on the genomes of Sphingobacterium, Bacteroides, Flavobacterium and Pedobacter spp. Assembly of the RWS and TWS metagenomes generated 16,536 and 15,902 contigs of ≥10 Kb, respectively. Thirteen contigs, containing 39 glycosyl hydrolase genes, constitute novel (hemi)cellulose utilization loci with affiliation to sequences primarily found in the Bacteroidetes. Overall, this study provides deep insight in the plant polysaccharide degrading capabilities of microbial consortia bred from forest soil, highlighting their biotechnological potential. PMID:26343383

  7. Next-generation sequencing offers new insights into the resistance of Candida spp. to echinocandins and azoles.

    PubMed

    Garnaud, Cécile; Botterel, Françoise; Sertour, Natacha; Bougnoux, Marie-Elisabeth; Dannaoui, Eric; Larrat, Sylvie; Hennequin, Christophe; Guinea, Jesus; Cornet, Muriel; Maubon, Danièle

    2015-09-01

    MDR Candida strains are emerging. Next-generation sequencing (NGS), which enables extensive and deep genome analysis, was used to investigate echinocandin and azole resistance in clinical Candida isolates. Six genes commonly involved in antifungal resistance (ERG11, ERG3, TAC1, CgPDR1, FKS1 and FKS2) were analysed using NGS in 40 Candida isolates (18 Candida albicans, 15 Candida glabrata and 7 Candida parapsilosis). The strategy was validated using strains with known sequences. Then, 8 clinical strains displaying antifungal resistance and 23 sequential isolates collected from 10 patients receiving antifungal therapy were analysed. A total of 391 SNPs were detected, among which 6 coding SNPs were reported for the first time. Novel genetic alterations were detected in both azole and echinocandin resistance genes. A C. glabrata strain, which was resistant to echinocandins but highly susceptible to azoles, harboured an FKS2 S663P mutation plus a novel presumed loss-of-function CgPDR1 mutation. This isolate was from a patient with deep-seated and urinary candidiasis. Another C. glabrata isolate, with an MDR phenotype, carried a new FKS2 S663A mutation and a new putative gain-of-function CgPDR1 mutation (T370I); this isolate showed mutated (80%) and WT (20%) populations and was collected after 75 days of exposure to caspofungin from a patient who underwent complicated abdominal surgery. This study shows that NGS can be used for extensive assessment of genetic mutations involved in antifungal resistance. This type of wide genome approach will become very valuable for detecting mechanisms of resistance in clinical strains subjected to multidrug pressure. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Hairpin RNA Targeting Multiple Viral Genes Confers Strong Resistance to Rice Black-Streaked Dwarf Virus.

    PubMed

    Wang, Fangquan; Li, Wenqi; Zhu, Jinyan; Fan, Fangjun; Wang, Jun; Zhong, Weigong; Wang, Ming-Bo; Liu, Qing; Zhu, Qian-Hao; Zhou, Tong; Lan, Ying; Zhou, Yijun; Yang, Jie

    2016-05-11

    Rice black-streaked dwarf virus (RBSDV) belongs to the genus Fijivirus in the family of Reoviridae and causes severe yield loss in rice-producing areas in Asia. RNA silencing, as a natural defence mechanism against plant viruses, has been successfully exploited for engineering virus resistance in plants, including rice. In this study, we generated transgenic rice lines harbouring a hairpin RNA (hpRNA) construct targeting four RBSDV genes, S1, S2, S6 and S10, encoding the RNA-dependent RNA polymerase, the putative core protein, the RNA silencing suppressor and the outer capsid protein, respectively. Both field nursery and artificial inoculation assays of three generations of the transgenic lines showed that they had strong resistance to RBSDV infection. The RBSDV resistance in the segregating transgenic populations correlated perfectly with the presence of the hpRNA transgene. Furthermore, the hpRNA transgene was expressed in the highly resistant transgenic lines, giving rise to abundant levels of 21-24 nt small interfering RNA (siRNA). By small RNA deep sequencing, the RBSDV-resistant transgenic lines detected siRNAs from all four viral gene sequences in the hpRNA transgene, indicating that the whole chimeric fusion sequence can be efficiently processed by Dicer into siRNAs. Taken together, our results suggest that long hpRNA targeting multiple viral genes can be used to generate stable and durable virus resistance in rice, as well as other plant species.

  9. Targeted next-generation sequencing appoints c16orf57 as clericuzio-type poikiloderma with neutropenia gene.

    PubMed

    Volpi, Ludovica; Roversi, Gaia; Colombo, Elisa Adele; Leijsten, Nico; Concolino, Daniela; Calabria, Andrea; Mencarelli, Maria Antonietta; Fimiani, Michele; Macciardi, Fabio; Pfundt, Rolph; Schoenmakers, Eric F P M; Larizza, Lidia

    2010-01-01

    Next-generation sequencing is a straightforward tool for the identification of disease genes in extended genomic regions. Autozygosity mapping was performed on a five-generation inbred Italian family with three siblings affected with Clericuzio-type poikiloderma with neutropenia (PN [MIM %604173]), a rare autosomal-recessive genodermatosis characterised by poikiloderma, pachyonychia, and chronic neutropenia. The siblings were initially diagnosed as affected with Rothmund-Thomson syndrome (RTS [MIM #268400]), with which PN shows phenotypic overlap. Linkage analysis on all living subjects of the family identified a large 16q region inherited identically by descent (IBD) in all affected family members. Deep sequencing of this 3.4 Mb region previously enriched with array capture revealed a homozygous c.504-2 A>C mismatch in all affected siblings. The mutation destroys the invariant AG acceptor site of intron 4 of the evolutionarily conserved C16orf57 gene. Two distinct deleterious mutations (c.502A>G and c.666_676+1del12) identified in an unrelated PN patient confirmed that the C16orf57 gene is responsible for PN. The function of the predicted C16orf57 gene is unknown, but its product has been shown to be interconnected to RECQL4 protein via SMAD4 proteins. The unravelled clinical and genetic identity of PN allows patients to undergo genetic testing and follow-up. 2010 The American Society of Human Genetics. Published by Elsevier Inc.

  10. Transcript and metabolite profiling for the evaluation of tobacco tree and poplar as feedstock for the bio-based industry.

    PubMed

    Ruprecht, Colin; Tohge, Takayuki; Fernie, Alisdair; Mortimer, Cara L; Kozlo, Amanda; Fraser, Paul D; Funke, Norma; Cesarino, Igor; Vanholme, Ruben; Boerjan, Wout; Morreel, Kris; Burgert, Ingo; Gierlinger, Notburga; Bulone, Vincent; Schneider, Vera; Stockero, Andrea; Navarro-Aviñó, Juan; Pudel, Frank; Tambuyser, Bart; Hygate, James; Bumstead, Jon; Notley, Louis; Persson, Staffan

    2014-05-16

    The global demand for food, feed, energy and water poses extraordinary challenges for future generations. It is evident that robust platforms for the exploration of renewable resources are necessary to overcome these challenges. Within the multinational framework MultiBioPro we are developing biorefinery pipelines to maximize the use of plant biomass. More specifically, we use poplar and tobacco tree (Nicotiana glauca) as target crop species for improving saccharification, isoprenoid, long chain hydrocarbon contents, fiber quality, and suberin and lignin contents. The methods used to obtain these outputs include GC-MS, LC-MS and RNA sequencing platforms. The metabolite pipelines are well established tools to generate these types of data, but also have the limitations in that only well characterized metabolites can be used. The deep sequencing will allow us to include all transcripts present during the developmental stages of the tobacco tree leaf, but has to be mapped back to the sequence of Nicotiana tabacum. With these set-ups, we aim at a basic understanding for underlying processes and at establishing an industrial framework to exploit the outcomes. In a more long term perspective, we believe that data generated here will provide means for a sustainable biorefinery process using poplar and tobacco tree as raw material. To date the basal level of metabolites in the samples have been analyzed and the protocols utilized are provided in this article.

  11. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  12. Identification and Validation of Expressed Sequence Tags from Pigeonpea (Cajanus cajan L.) Root

    PubMed Central

    Kumar, Ravi Ranjan; Yadav, Shailesh; Joshi, Shourabh; Bhandare, Prithviraj P.; Patil, Vinod Kumar; Kulkarni, Pramod B.; Sonkawade, Swati; Naik, G. R.

    2014-01-01

    Pigeonpea (Cajanus cajan (L) Millsp.) is an important food legume crop of rain fed agriculture in the arid and semiarid tropics of the world. It has deep and extensive root system which serves a number of important physiological and metabolic functions in plant development and growth. In order to identify genes associated with pigeonpea root, ESTs were generated from the root tissues of pigeonpea (GRG-295 genotype) by normalized cDNA library. A total of 105 high quality ESTs were generated by sequencing of 250 random clones which resulted in 72 unigenes comprising 25 contigs and 47 singlets. The ESTs were assigned to 9 functional categories on the basis of their putative function. In order to validate the possible expression of transcripts, four genes, namely, S-adenosylmethionine synthetase, phosphoglycerate kinase, serine carboxypeptidase, and methionine aminopeptidase, were further analyzed by reverse transcriptase PCR. The possible role of the identified transcripts and their functions associated with root will also be a valuable resource for the functional genomics study in legume crop. PMID:24895494

  13. Vertical water mass structure in the North Atlantic influences the bathymetric distribution of species in the deep-sea coral genus Paramuricea

    NASA Astrophysics Data System (ADS)

    Radice, Veronica Z.; Quattrini, Andrea M.; Wareham, Vonda E.; Edinger, Evan N.; Cordes, Erik E.

    2016-10-01

    Deep-sea corals are the structural foundation of their ecosystems along continental margins worldwide, yet the factors driving their broad distribution are poorly understood. Environmental factors, especially depth-related variables including water mass properties, are thought to considerably affect the realized distribution of deep-sea corals. These factors are governed by local and regional oceanographic conditions that directly influence the dispersal of larvae, and therefore affect the ultimate distribution of adult corals. We used molecular barcoding of mitochondrial and nuclear sequences to identify species of octocorals in the genus Paramuricea collected from the Labrador Sea to the Grand Banks of Newfoundland, Canada at depths of 150-1500 m. The results of this study revealed overlapping bathymetric distributions of the Paramuricea species present off the eastern Canadian coast, including the presence of a few cryptic species previously designated as Paramuricea placomus. The distribution of Paramuricea species in the western North Atlantic differs from the Gulf of Mexico, where five Paramuricea species exhibit strong segregation by depth. The different patterns of Paramuricea species in these contrasting biogeographic regions provide insight into how water mass structure may shape species distribution. Investigating Paramuricea prevalence and distribution in conjunction with oceanographic conditions can help demonstrate the factors that generate and maintain deep-sea biodiversity.

  14. Biogeography and ecology of the rare and abundant microbial lineages in deep-sea hydrothermal vents.

    PubMed

    Anderson, Rika E; Sogin, Mitchell L; Baross, John A

    2015-01-01

    Environmental gradients generate countless ecological niches in deep-sea hydrothermal vent systems, which foster diverse microbial communities. The majority of distinct microbial lineages in these communities occur in very low abundance. However, the ecological role and distribution of rare and abundant lineages, particularly in deep, hot subsurface environments, remain unclear. Here, we use 16S rRNA tag sequencing to describe biogeographic patterning and microbial community structure of both rare and abundant archaea and bacteria in hydrothermal vent systems. We show that while rare archaeal lineages and almost all bacterial lineages displayed geographically restricted community structuring patterns, the abundant lineages of archaeal communities displayed a much more cosmopolitan distribution. Finally, analysis of one high-volume, high-temperature fluid sample representative of the deep hot biosphere described a unique microbial community that differed from microbial populations in diffuse flow fluid or sulfide samples, yet the rare thermophilic archaeal groups showed similarities to those that occur in sulfides. These results suggest that while most archaeal and bacterial lineages in vents are rare and display a highly regional distribution, a small percentage of lineages, particularly within the archaeal domain, are successful at widespread dispersal and colonization. © FEMS 2014. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Molecular diversity of bacterial communities from subseafloor rock samples in a deep-water production basin in Brazil.

    PubMed

    von der Weid, Irene; Korenblum, Elisa; Jurelevicius, Diogo; Rosado, Alexandre Soares; Dino, Rodolfo; Sebastian, Gina Vasquez; Seldin, Lucy

    2008-01-01

    The deep subseafloor rock in oil reservoirs represents a unique environment in which a high oilcontamination and very low biomass can be observed. Sampling this environment has been a challenge owing to the techniques used for drilling and coring. In this study, the facilities developed by the Brazilian oil company PETROBRAS for accessing deep subsurface oil reservoirs were used to obtain rock samples at 2,822-2,828 m below the ocean floor surface from a virgin field located in the Atlantic Ocean, Rio de Janeiro. To address the bacterial diversity of these rock samples, PCR amplicons were obtained using the DNA from four core sections and universal primers for 16S rRNA and for APS reductase (aps) genes. Clone libraries were generated from these PCR fragments and 87 clones were sequenced. The phylogenetic analyses of the 16S rDNA clone libraries showed a wide distribution of types in the domain bacteria in the four core samples, and the majority of the clones were identified as belonging to Betaproteobacteria. The sulfate-reducing bacteria community could only be amplified by PCR in one sample, and all clones were identified as belonging to Gammaproteobacteria. For the first time, the bacterial community was assessed in such deep subsurface environment.

  16. Sensitive Deep-Sequencing-Based HIV-1 Genotyping Assay To Simultaneously Determine Susceptibility to Protease, Reverse Transcriptase, Integrase, and Maturation Inhibitors, as Well as HIV-1 Coreceptor Tropism

    PubMed Central

    Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.

    2014-01-01

    With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782

  17. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.

    PubMed

    Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan

    2018-05-24

    High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.

  18. Revealing the unexplored fungal communities in deep groundwater of crystalline bedrock fracture zones in Olkiluoto, Finland.

    PubMed

    Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja

    2015-01-01

    The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.

  19. Analysis of the Gull Fecal Microbial Community Reveals the Dominance of Catellicoccus marimammalium in Relation to Culturable Enterococci

    PubMed Central

    Koskey, Amber M.; Fisher, Jenny C.; Traudt, Mary F.; Newton, Ryan J.

    2014-01-01

    Gulls are prevalent in beach environments and can be a major source of fecal contamination. Gulls have been shown to harbor a high abundance of fecal indicator bacteria (FIB), such as Escherichia coli and enterococci, which can be readily detected as part of routine beach monitoring. Despite the ubiquitous presence of gull fecal material in beach environments, the associated microbial community is relatively poorly characterized. We generated comprehensive microbial community profiles of gull fecal samples using Roche 454 and Illumina MiSeq platforms to investigate the composition and variability of the gull fecal microbial community and to measure the proportion of FIB. Enterococcaceae and Enterobacteriaceae were the two most abundant families in our gull samples. Sequence comparisons between short-read data and nearly full-length 16S rRNA gene clones generated from the same samples revealed Catellicoccus marimammalium as the most numerous taxon among all samples. The identification of bacteria from gull fecal pellets cultured on membrane-Enterococcus indoxyl-β-d-glucoside (mEI) plates showed that the dominant sequences recovered in our sequence libraries did not represent organisms culturable on mEI. Based on 16S rRNA gene sequencing of gull fecal isolates cultured on mEI plates, 98.8% were identified as Enterococcus spp., 1.2% were identified as Streptococcus spp., and none were identified as C. marimammalium. Illumina deep sequencing indicated that gull fecal samples harbor significantly higher proportions of C. marimammalium 16S rRNA gene sequences (>50-fold) relative to typical mEI culturable Enterococcus spp. C. marimammalium therefore can be confidently utilized as a genetic marker to identify gull fecal pollution in the beach environment. PMID:24242244

  20. Application of Genomic Technologies to the Breeding of Trees

    PubMed Central

    Badenes, Maria L.; Fernández i Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J.

    2016-01-01

    The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species. PMID:27895664

  1. Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

    PubMed

    Galián, José A; Rosato, Marcela; Rosselló, Josep A

    2014-03-01

    Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.

  2. Application of Genomic Technologies to the Breeding of Trees.

    PubMed

    Badenes, Maria L; Fernández I Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J

    2016-01-01

    The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species.

  3. Validation of a next-generation sequencing assay for clinical molecular oncology.

    PubMed

    Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D

    2014-01-01

    Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  4. Strain-Level Diversity of Secondary Metabolism in Streptomyces albus

    PubMed Central

    Seipke, Ryan F.

    2015-01-01

    Streptomyces spp. are robust producers of medicinally-, industrially- and agriculturally-important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive high quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced, consequently this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are contained by one or more strain and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty. PMID:25635820

  5. Zygotic amplification of secondary piRNAs during silkworm embryogenesis

    PubMed Central

    Kawaoka, Shinpei; Arai, Yuji; Kadota, Koji; Suzuki, Yutaka; Hara, Kahori; Sugano, Sumio; Shimizu, Kentaro; Tomari, Yukihide; Shimada, Toru; Katsuma, Susumu

    2011-01-01

    PIWI-interacting RNAs (piRNAs) are 23–30-nucleotide-long small RNAs that act as sequence-specific silencers of transposable elements in animal gonads. In flies, genetics and deep sequencing data have led to a hypothesis for piRNA biogenesis called the ping-pong cycle, where antisense primary piRNAs initiate an amplification loop to generate sense secondary piRNAs. However, to date, the process of the ping-pong cycle has never been monitored at work. Here, by large-scale profiling of piRNAs from silkworm ovary and embryos of different developmental stages, we demonstrate that maternally inherited antisense-biased piRNAs trigger acute amplification of secondary sense piRNA production in zygotes, at a time coinciding with zygotic transcription of sense transposon mRNAs. These results provide on-site evidence for the ping-pong cycle. PMID:21628432

  6. Diverse, rare microbial taxa responded to the Deepwater Horizon deep-sea hydrocarbon plume

    PubMed Central

    Kleindienst, Sara; Grim, Sharon; Sogin, Mitchell; Bracco, Annalisa; Crespo-Medina, Melitza; Joye, Samantha B

    2016-01-01

    The Deepwater Horizon (DWH) oil well blowout generated an enormous plume of dispersed hydrocarbons that substantially altered the Gulf of Mexico's deep-sea microbial community. A significant enrichment of distinct microbial populations was observed, yet, little is known about the abundance and richness of specific microbial ecotypes involved in gas, oil and dispersant biodegradation in the wake of oil spills. Here, we document a previously unrecognized diversity of closely related taxa affiliating with Cycloclasticus, Colwellia and Oceanospirillaceae and describe their spatio-temporal distribution in the Gulf's deepwater, in close proximity to the discharge site and at increasing distance from it, before, during and after the discharge. A highly sensitive, computational method (oligotyping) applied to a data set generated from 454-tag pyrosequencing of bacterial 16S ribosomal RNA gene V4–V6 regions, enabled the detection of population dynamics at the sub-operational taxonomic unit level (0.2% sequence similarity). The biogeochemical signature of the deep-sea samples was assessed via total cell counts, concentrations of short-chain alkanes (C1–C5), nutrients, (colored) dissolved organic and inorganic carbon, as well as methane oxidation rates. Statistical analysis elucidated environmental factors that shaped ecologically relevant dynamics of oligotypes, which likely represent distinct ecotypes. Major hydrocarbon degraders, adapted to the slow-diffusive natural hydrocarbon seepage in the Gulf of Mexico, appeared unable to cope with the conditions encountered during the DWH spill or were outcompeted. In contrast, diverse, rare taxa increased rapidly in abundance, underscoring the importance of specialized sub-populations and potential ecotypes during massive deep-sea oil discharges and perhaps other large-scale perturbations. PMID:26230048

  7. Resolving Relationships among the Megadiverse Butterflies and Moths with a Novel Pipeline for Anchored Phylogenomics.

    PubMed

    Breinholt, Jesse W; Earl, Chandra; Lemmon, Alan R; Lemmon, Emily Moriarty; Xiao, Lei; Kawahara, Akito Y

    2018-01-01

    The advent of next-generation sequencing technology has allowed for thecollection of large portions of the genome for phylogenetic analysis. Hybrid enrichment and transcriptomics are two techniques that leverage next-generation sequencing and have shown much promise. However, methods for processing hybrid enrichment data are still limited. We developed a pipeline for anchored hybrid enrichment (AHE) read assembly, orthology determination, contamination screening, and data processing for sequences flanking the target "probe" region. We apply this approach to study the phylogeny of butterflies and moths (Lepidoptera), a megadiverse group of more than 157,000 described species with poorly understood deep-level phylogenetic relationships. We introduce a new, 855 locus AHE kit for Lepidoptera phylogenetics and compare resulting trees to those from transcriptomes. The enrichment kit was designed from existing genomes, transcriptomes, and expressed sequence tags and was used to capture sequence data from 54 species from 23 lepidopteran families. Phylogenies estimated from AHE data were largely congruent with trees generated from transcriptomes, with strong support for relationships at all but the deepest taxonomic levels. We combine AHE and transcriptomic data to generate a new Lepidoptera phylogeny, representing 76 exemplar species in 42 families. The tree provides robust support for many relationships, including those among the seven butterfly families. The addition of AHE data to an existing transcriptomic dataset lowers node support along the Lepidoptera backbone, but firmly places taxa with AHE data on the phylogeny. Combining taxa sequenced for AHE with existing transcriptomes and genomes resulted in a tree with strong support for (Calliduloidea $+$ Gelechioidea $+$ Thyridoidea) $+$ (Papilionoidea $+$ Pyraloidea $+$ Macroheterocera). To examine the efficacy of AHE at a shallow taxonomic level, phylogenetic analyses were also conducted on a sister group representing a more recent divergence, the Saturniidae and Sphingidae. These analyses utilized sequences from the probe region and data flanking it, nearly doubled the size of the dataset; resulting trees supported new phylogenetics relationships, especially within the Saturniidae and Sphingidae (e.g., Hemarina derived in the latter). We hope that our data processing pipeline, hybrid enrichment gene set, and approach of combining AHE data with transcriptomes will be useful for the broader systematics community. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Deep RNNs for video denoising

    NASA Astrophysics Data System (ADS)

    Chen, Xinyuan; Song, Li; Yang, Xiaokang

    2016-09-01

    Video denoising can be described as the problem of mapping from a specific length of noisy frames to clean one. We propose a deep architecture based on Recurrent Neural Network (RNN) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as the input and outputs the clean one. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers where each layer receives the hidden state of the previous layer as input. Experiment shows (i) the recurrent architecture through temporal domain extracts motion information and does favor to video denoising, and (ii) deep architecture have large enough capacity for expressing mapping relation between corrupted videos as input and clean videos as output, furthermore, (iii) the model has generality to learned different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.

  9. Genome-Wide Identification of miRNAs Responsive to Drought in Peach (Prunus persica) by High-Throughput Deep Sequencing

    PubMed Central

    Eldem, Vahap; Çelikkol Akçay, Ufuk; Ozhuner, Esma; Bakır, Yakup; Uranbey, Serkan; Unver, Turgay

    2012-01-01

    Peach (Prunus persica L.) is one of the most important worldwide fresh fruits. Since fruit growth largely depends on adequate water supply, drought stress is considered as the most important abiotic stress limiting fleshy fruit production and quality in peach. Plant responses to drought stress are regulated both at transcriptional and post-transcriptional level. As post-transcriptional gene regulators, miRNAs (miRNAs) are small (19–25 nucleotides in length), endogenous, non-coding RNAs. Recent studies indicate that miRNAs are involved in plant responses to drought. Therefore, Illumina deep sequencing technology was used for genome-wide identification of miRNAs and their expression profile in response to drought in peach. In this study, four sRNA libraries were constructed from leaf control (LC), leaf stress (LS), root control (RC) and root stress (RS) samples. We identified a total of 531, 471, 535 and 487 known mature miRNAs in LC, LS, RC and RS libraries, respectively. The expression level of 262 (104 up-regulated, 158 down-regulated) of the 453 miRNAs changed significantly in leaf tissue, whereas 368 (221 up-regulated, 147 down-regulated) of the 465 miRNAs had expression levels that changed significantly in root tissue upon drought stress. Additionally, a total of 197, 221, 238 and 265 novel miRNA precursor candidates were identified from LC, LS, RC and RS libraries, respectively. Target transcripts (137 for LC, 133 for LS, 148 for RC and 153 for RS) generated significant Gene Ontology (GO) terms related to DNA binding and catalytic activites. Genome-wide miRNA expression analysis of peach by deep sequencing approach helped to expand our understanding of miRNA function in response to drought stress in peach and Rosaceae. A set of differentially expressed miRNAs could pave the way for developing new strategies to alleviate the adverse effects of drought stress on plant growth and development. PMID:23227166

  10. Deep ART Neural Model for Biologically Inspired Episodic Memory and Its Application to Task Performance of Robots.

    PubMed

    Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan; Gyeong-Moon Park; Yong-Ho Yoo; Deok-Hwa Kim; Jong-Hwan Kim; Yoo, Yong-Ho; Park, Gyeong-Moon; Kim, Jong-Hwan; Kim, Deok-Hwa

    2018-06-01

    Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.

  11. Generation of sedimentary fabrics and facies by repetitive excavation and storm infilling of burrow networks Holocene of south Florida and Caicos Platform, B. W. I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tedesco, L.P.; Wanless, H.R.

    Excavation of deep, open burrow networks and subsequent infilling with sediment from the surface produces an entirely new sedimentary deposit and results in obliteration to severe diagenetic transformation of precursor depositional facies. Repetitive excavation and infilling is responsible for creating the preserved depositional facies of many marine deposits. Excavating burrowers occur from intertidal to abyssal depths, and are important throughout the Phanerozoic. The repetitive coupling of deep, open burrow excavation with subsequent storm sediment infilling of open burrow networks is a gradual process that ultimately results in the loss of the original deposit and the generation of new lithologies, fabricsmore » and facies. The new lithologies are produced in the subsurface and possess fabrics, textures and skeletal assemblages that are not a direct reflection of either precursor facies or the surficial depositional conditions. Sedimentary facies generated by repetitive burrow excavation and infilling commonly are massively bedded and generally are mottled skeletal packstones. Skeletal grains usually are well-preserved and coarser components are concentrated in burrow networks, pockets and patches. The coarse skeletal components of burrow-generated facies are a mixture of coarse bioclasts from the precursor facies and both the coarse and fine skeletal material introduced from the sediment surface. Many so-called bioturbated or massive facies may, in fact, be primary depositional facies generated in the subsurface and represent severe diagenetic transformation of originally deposited sequences. In addition, mudstones and wackestones mottled with packstone patches may record storm sedimentation.« less

  12. Ecogenomic characterization of a marine microorganism belonging to a Firmicutes lineage that is widespread in both terrestrial and oceanic subsurface environments

    NASA Astrophysics Data System (ADS)

    Jungbluth, S.; Glavina del Rio, T.; Tringe, S. G.; Stepanauskas, R.; Rappe, M. S.

    2015-12-01

    Large-volumes of basalt-hosted fluids from the sediment-covered subseafloor were collected in July 2011 from a horizon extending 29-117 meters below the sediment-rock interface at borehole 1362B, as well as from a deep horizon extending 193-292 meters below the sediment-rock interface at borehole 1362A, which are two of the latest generation of borehole observatories on the Juan de Fuca Ridge flank in the Northeast Pacific Ocean. Environmental DNA was sequenced from one fluid sample collected from each borehole, and a genomic bin related to the terrestrial Candidatus Desulforudis audaxviator lineage within the Firmicutes phylum of bacteria was identified. The near-complete bacterial genome, herein named Candidatus Desulfopertinax inferamarinus, is composed of six scaffolds totaling 1.78 Mbp in length. Despite vast differences in geography and environment of origin, phylogenomic analysis indicate that D. inferamarinus and D. audaxviator form a monophyletic clade to the exclusion of all other sequenced genomes. Similar to its terrestrial relative, the draft genome of the marine D. inferamarinus revealed a motile, sporulating, sulfate-reducing, chemoautotrophic thermophile that is capable of synthesizing all amino acids and fixing inorganic carbon via the Wood-Ljungdahl pathway. Unlike the terrestrial clade, relatively few integrases and transposases were identified. The marine genome described here provides evidence that a life-style adapted to the isolated deep subsurface environment is a general feature of the broader, globally-distributed Desulforudis/Desulfopertinax lineage and provides insight into the adaptations required for microbial life in the marine versus terrestrial deep biospheres.

  13. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    PubMed

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.

  14. The complete mitochondrial genome of the deep-sea sponge Poecillastra laminaris (Astrophorida, Vulcanellidae).

    PubMed

    Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A

    2016-05-01

    The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GeneBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.

  15. Unraveling the Tangled Skein: The Evolution of Transcriptional Regulatory Networks in Development.

    PubMed

    Rebeiz, Mark; Patel, Nipam H; Hinman, Veronica F

    2015-01-01

    The molecular and genetic basis for the evolution of anatomical diversity is a major question that has inspired evolutionary and developmental biologists for decades. Because morphology takes form during development, a true comprehension of how anatomical structures evolve requires an understanding of the evolutionary events that alter developmental genetic programs. Vast gene regulatory networks (GRNs) that connect transcription factors to their target regulatory sequences control gene expression in time and space and therefore determine the tissue-specific genetic programs that shape morphological structures. In recent years, many new examples have greatly advanced our understanding of the genetic alterations that modify GRNs to generate newly evolved morphologies. Here, we review several aspects of GRN evolution, including their deep preservation, their mechanisms of alteration, and how they originate to generate novel developmental programs.

  16. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction.

    PubMed

    Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong

    2017-12-15

    Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  17. Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau

    NASA Astrophysics Data System (ADS)

    Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori

    2017-12-01

    We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not associated) collected from two different sites at the Sao Paulo Plateau off Brazil by Ion Torrent PGM targeting ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity assigned into 9 classes and 31 genus) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, it was revealed that only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high amount of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity, in comparison with pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge on fungi and their role in deep-sea environments is still limited and scarce, this study increases our understanding of fungal diversity and community structure in deep-sea environments.

  18. LookSeq: a browser-based viewer for deep sequencing data.

    PubMed

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  19. The deep, hot biosphere: Twenty-five years of retrospection

    PubMed Central

    Colman, Daniel R.; Poudel, Saroj; Stamps, Blake W.; Boyd, Eric S.; Spear, John R.

    2017-01-01

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a “deep, hot biosphere” in the Earth’s crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth’s subsurface in the form of a deep subsurface microbiome initiative. PMID:28674200

  20. STRUCTURAL GEOMETRY OF AN EXHUMED UHP TERRANE IN THE EASTERN SULU OROGEN, CHINA: IMPLICATIONS FOR CONTINENTAL COLLISIONAL PROCESSES

    NASA Astrophysics Data System (ADS)

    Wang, L.; Kusky, T.

    2009-12-01

    High-precision 1:1,000 mapping of Yangkou Bay, eastern Sulu orogen, defines the structural geometry and history of the world’s most significant UHP (Ultrahigh Pressure) rock exposures. Four stages of folds are recognized in the UHP rocks and associated quartzo-feldspathic gneiss. Eclogite facies rootless F1 and isoclinal F2 folds are preserved locally in coesite-eclogite. Mylonitic to ultramylonitic cosesit-eclogite shear zones separate 5-10-meter-thick nappes of ultramafic-mafic UHP rocks from banded quartzo-feldspathic gneiss. These shear zones are folded, and progressively overprinted by amphibolite and greenschist facies shear zones that become wider with lower grade. The deformation sequences is explained by deep subduction of offscraped thrust slices of oceanic or lower continental crust, caught between the colliding North and South China cratons in the Mesozoic. After these slices were structurally isolated along the plate interface, they were rolled like ball-bearings, in the subduction channel during their exhumation, forming several generations of folds, sequentially lower-grade foliations and lineations, and intruded by several generations of in situ and exotically derived melts. The shear zones formed during different generations of deformation are wider with lower grades, suggesting that deep-crustal/upper mantle deformation operates efficiently (perhaps with more active crystallographic slip systems) than deformation at mid to upper crustal levels.

  1. Deep Sequencing of Influenza A Virus from a Human Challenge Study Reveals a Selective Bottleneck and Only Limited Intrahost Genetic Diversification.

    PubMed

    Sobel Leonard, Ashley; McClain, Micah T; Smith, Gavin J D; Wentworth, David E; Halpin, Rebecca A; Lin, Xudong; Ransier, Amy; Stockwell, Timothy B; Das, Suman R; Gilbert, Anthony S; Lambkin-Williams, Robert; Ginsburg, Geoffrey S; Woods, Christopher W; Koelle, Katia

    2016-12-15

    Knowledge of influenza virus evolution at the point of transmission and at the intrahost level remains limited, particularly for human hosts. Here, we analyze a unique viral data set of next-generation sequencing (NGS) samples generated from a human influenza challenge study wherein 17 healthy subjects were inoculated with cell- and egg-passaged virus. Nasal wash samples collected from 7 of these subjects were successfully deep sequenced. From these, we characterized changes in the subjects' viral populations during infection and identified differences between the virus in these samples and the viral stock used to inoculate the subjects. We first calculated pairwise genetic distances between the subjects' nasal wash samples, the viral stock, and the influenza virus A/Wisconsin/67/2005 (H3N2) reference strain used to generate the stock virus. These distances revealed that considerable viral evolution occurred at various points in the human challenge study. Further quantitative analyses indicated that (i) the viral stock contained genetic variants that originated and likely were selected for during the passaging process, (ii) direct intranasal inoculation with the viral stock resulted in a selective bottleneck that reduced nonsynonymous genetic diversity in the viral hemagglutinin and nucleoprotein, and (iii) intrahost viral evolution continued over the course of infection. These intrahost evolutionary dynamics were dominated by purifying selection. Our findings indicate that rapid viral evolution can occur during acute influenza infection in otherwise healthy human hosts when the founding population size of the virus is large, as is the case with direct intranasal inoculation. Influenza viruses circulating among humans are known to rapidly evolve over time. However, little is known about how influenza virus evolves across single transmission events and over the course of a single infection. To address these issues, we analyze influenza virus sequences from a human challenge experiment that initiated infection with a cell- and egg-passaged viral stock, which appeared to have adapted during its preparation. We find that the subjects' viral populations differ genetically from the viral stock, with subjects' viral populations having lower representation of the amino-acid-changing variants that arose during viral preparation. We also find that most of the viral evolution occurring over single infections is characterized by further decreases in the frequencies of these amino-acid-changing variants and that only limited intrahost genetic diversification through new mutations is apparent. Our findings indicate that influenza virus populations can undergo rapid genetic changes during acute human infections. Copyright © 2016 Sobel Leonard et al.

  2. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed Central

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112

  3. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed

    Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity.

  4. PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

    PubMed

    Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C

    2016-01-01

    Background. Metagenomics is an approach for characterizing environmental microbial communities in situ, it allows their functional and taxonomic characterization and to recover sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows to analyze Gb-sized metagenomes with inexpensive hardware, and to recover species or genera-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.

  5. Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

    PubMed Central

    Takai, Ken; Horikoshi, Koki

    1999-01-01

    Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021

  6. A first insight into the occurrence and expression of functional amoA and accA genes of autotrophic and ammonia-oxidizing bathypelagic Crenarchaeota of Tyrrhenian Sea

    NASA Astrophysics Data System (ADS)

    Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata

    2009-05-01

    The autotrophic and ammonia-oxidizing crenarchaeal assemblage at offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan ® methodolgy. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for residual 36% of clones, except of those recovered from Eastern Mediterranean, was found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into Deep Marine Group I cluster that contain the vast majority of known sequences recovered from global deep-sea environment. Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of crenarchaeal amoA gene was monitored by direct RNA retrieval and following analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in DNA-based clone library, obviously, due to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts, related to Deep Marine Group I of Crenarchaeota, was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. As far as all seawater samples were treated on-board under atmospheric pressure conditions and sunlight, the decompression and/or photoinhibition likely affected their metabolic activity, followed by the strong decay of gene expression.

  7. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

    PubMed Central

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

    2018-01-01

    Abstract PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441

  8. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.

    PubMed

    Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M

    2010-09-01

    MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm versus 19.9 cm(3), P=0.74) and prevalence of >or=1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.

  9. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)?

    PubMed

    Barrett, Craig F; Specht, Chelsea D; Leebens-Mack, Jim; Stevenson, Dennis Wm; Zomlefer, Wendy B; Davis, Jerrold I

    2014-01-01

    Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life. Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support. Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies. Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. Whole plastomes will continue to be highly useful in plant phylogenetics, but the current study adds to a growing body of literature suggesting that they may not provide enough character information for resolving ancient, rapid radiations.

  10. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)?

    PubMed Central

    Barrett, Craig F.; Specht, Chelsea D.; Leebens-Mack, Jim; Stevenson, Dennis Wm.; Zomlefer, Wendy B.; Davis, Jerrold I.

    2014-01-01

    Background and Aims Zingiberales comprise a clade of eight tropical monocot families including approx. 2500 species and are hypothesized to have undergone an ancient, rapid radiation during the Cretaceous. Zingiberales display substantial variation in floral morphology, and several members are ecologically and economically important. Deep phylogenetic relationships among primary lineages of Zingiberales have proved difficult to resolve in previous studies, representing a key region of uncertainty in the monocot tree of life. Methods Next-generation sequencing was used to construct complete plastid gene sets for nine taxa of Zingiberales, which were added to five previously sequenced sets in an attempt to resolve deep relationships among families in the order. Variation in taxon sampling, process partition inclusion and partition model parameters were examined to assess their effects on topology and support. Key Results Codon-based likelihood analysis identified a strongly supported clade of ((Cannaceae, Marantaceae), (Costaceae, Zingiberaceae)), sister to (Musaceae, (Lowiaceae, Strelitziaceae)), collectively sister to Heliconiaceae. However, the deepest divergences in this phylogenetic analysis comprised short branches with weak support. Additionally, manipulation of matrices resulted in differing deep topologies in an unpredictable fashion. Alternative topology testing allowed statistical rejection of some of the topologies. Saturation fails to explain observed topological uncertainty and low support at the base of Zingiberales. Evidence for conflict among the plastid data was based on a support metric that accounts for conflicting resampled topologies. Conclusions Many relationships were resolved with robust support, but the paucity of character information supporting the deepest nodes and the existence of conflict suggest that plastid coding regions are insufficient to resolve and support the earliest divergences among families of Zingiberales. Whole plastomes will continue to be highly useful in plant phylogenetics, but the current study adds to a growing body of literature suggesting that they may not provide enough character information for resolving ancient, rapid radiations. PMID:24280362

  11. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.

    PubMed

    James, Katherine; Cockell, Simon J; Zenkin, Nikolay

    2017-05-01

    The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Insertion sequences enrichment in extreme Red sea brine pool vent.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2017-03-01

    Mobile genetic elements are major agents of genome diversification and evolution. Limited studies addressed their characteristics, including abundance, and role in extreme habitats. One of the rare natural habitats exposed to multiple-extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments, suggest that insertion sequences predominantly contribute to polyextremophiles genome plasticity.

  13. Measurements of RF heating during 3.0-T MRI of a pig implanted with deep brain stimulator.

    PubMed

    Gorny, Krzysztof R; Presti, Michael F; Goerss, Stephan J; Hwang, Sun C; Jang, Dong-Pyo; Kim, Inyong; Min, Hoon-Ki; Shu, Yunhong; Favazza, Christopher P; Lee, Kendall H; Bernstein, Matt A

    2013-06-01

    To present preliminary, in vivo temperature measurements during MRI of a pig implanted with a deep brain stimulation (DBS) system. DBS system (Medtronic Inc., Minneapolis, MN) was implanted in the brain of an anesthetized pig. 3.0-T MRI was performed with a T/R head coil using the low-SAR GRE EPI and IR-prepped GRE sequences (SAR: 0.42 and 0.39 W/kg, respectively), and the high-SAR 4-echo RF spin echo (SAR: 2.9 W/kg). Fluoroptic thermometry was used to directly measure RF-related heating at the DBS electrodes, and at the implantable pulse generator (IPG). For reference the measurements were repeated in the same pig at 1.5 T and, at both field strengths, in a phantom. At 3.0T, the maximal temperature elevations at DBS electrodes were 0.46 °C and 2.3 °C, for the low- and high-SAR sequences, respectively. No heating was observed on the implanted IPG during any of the measurements. Measurements of in vivo heating differed from those obtained in the phantom. The 3.0-T MRI using GRE EPI and IR-prepped GRE sequences resulted in local temperature elevations at DBS electrodes of no more than 0.46 °C. Although no extrapolation should be made to human exams and much further study will be needed, these preliminary data are encouraging for the future use 3.0-T MRI in patients with DBS. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus.

    PubMed

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-05-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various time post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results-validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent)--will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system.

  15. Measurements of RF Heating during 3.0T MRI of a Pig Implanted with Deep Brain Stimulator

    PubMed Central

    Gorny, Krzysztof R; Presti, Michael F; Goerss, Stephan J; Hwang, Sun C; Jang, Dong-Pyo; Kim, Inyong; Shu, Yunhong; Favazza, Christopher P; Lee, Kendall H; Bernstein, Matt A

    2012-01-01

    Purpose To present preliminary, in vivo temperature measurements during MRI of a pig implanted with a deep brain stimulation (DBS) system. Materials and Methods DBS system (Medtronic Inc., Minneapolis, MN) was implanted in the brain of an anesthetized pig. 3.0T MRI was performed with a T/R head coil using the low-SAR GRE EPI and IR-prepped GRE sequences (SAR: 0.42 W/kg and 0.39 W/kg, respectively), and the high-SAR 4-echo RF spin echo (SAR: 2.9 W/kg). Fluoroptic thermometry was used to directly measure RF-related heating at the DBS electrodes, and at the implantable pulse generator (IPG). For reference the measurements were repeated in the same pig at 1.5T and, at both field strengths, in a phantom. Results At 3.0T, the maximal temperature elevations at DBS electrodes were 0.46 °C and 2.3 °C, for the low- and high-SAR sequences, respectively. No heating was observed on the implanted IPG during any of the measurements. Measurements of in-vivo heating differed from those obtained in the phantom. Conclusion The 3.0T MRI using GRE EPI and IR-prepped GRE sequences resulted in local temperature elevations at DBS electrodes of no more than 0.46°C. Although no extrapolation should be made to human exams and much further study will be needed, these preliminary data are encouraging for the future use 3.0T MRI in patients with DBS. PMID:23228310

  16. Analysis of Variability in HIV-1 Subtype A Strains in Russia Suggests a Combination of Deep Sequencing and Multitarget RNA Interference for Silencing of the Virus.

    PubMed

    Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A

    2017-02-01

    Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.

  17. CASSIUS: The Cassini Uplink Scheduler

    NASA Technical Reports Server (NTRS)

    Bellinger, Earl

    2012-01-01

    The Cassini Uplink Scheduler (CASSIUS) is cross-platform software used to generate a radiation sequence plan for commands being sent to the Cassini spacecraft. Because signals must travel through varying amounts of Earth's atmosphere, several different modes of constant telemetry rates have been devised. These modes guarantee that the spacecraft and the Deep Space Network agree with respect to the data transmission rate. However, the memory readout of a command will be lost if it occurs on a telemetry mode boundary. Given a list of spacecraft message files as well as the available telemetry modes, CASSIUS can find an uplink sequence that ensures safe transmission of each file. In addition, it can predict when the two on-board solid state recorders will swap. CASSIUS prevents data corruption by making sure that commands are not planned for memory readout during telemetry rate changes or a solid state recorder swap.

  18. Simultaneous detection of human mitochondrial DNA and nuclear-inserted mitochondrial-origin sequences (NumtS) using forensic mtDNA amplification strategies and pyrosequencing technology.

    PubMed

    Bintz, Brittania J; Dixon, Groves B; Wilson, Mark R

    2014-07-01

    Next-generation sequencing technologies enable the identification of minor mitochondrial DNA variants with higher sensitivity than Sanger methods, allowing for enhanced identification of minor variants. In this study, mixtures of human mtDNA control region amplicons were subjected to pyrosequencing to determine the detection threshold of the Roche GS Junior(®) instrument (Roche Applied Science, Indianapolis, IN). In addition to expected variants, a set of reproducible variants was consistently found in reads from one particular amplicon. A BLASTn search of the variant sequence revealed identity to a segment of a 611-bp nuclear insertion of the mitochondrial control region (NumtS) spanning the primer-binding sites of this amplicon (Nature 1995;378:489). Primers (Hum Genet 2012;131:757; Hum Biol 1996;68:847) flanking the insertion were used to confirm the presence or absence of the NumtS in buccal DNA extracts from twenty donors. These results further our understanding of human mtDNA variation and are expected to have a positive impact on the interpretation of mtDNA profiles using deep-sequencing methods in casework. © 2014 American Academy of Forensic Sciences.

  19. TCRmodel: high resolution modeling of T cell receptors from sequence.

    PubMed

    Gowthaman, Ragul; Pierce, Brian G

    2018-05-22

    T cell receptors (TCRs), along with antibodies, are responsible for specific antigen recognition in the adaptive immune response, and millions of unique TCRs are estimated to be present in each individual. Understanding the structural basis of TCR targeting has implications in vaccine design, autoimmunity, as well as T cell therapies for cancer. Given advances in deep sequencing leading to immune repertoire-level TCR sequence data, fast and accurate modeling methods are needed to elucidate shared and unique 3D structural features of these molecules which lead to their antigen targeting and cross-reactivity. We developed a new algorithm in the program Rosetta to model TCRs from sequence, and implemented this functionality in a web server, TCRmodel. This web server provides an easy to use interface, and models are generated quickly that users can investigate in the browser and download. Benchmarking of this method using a set of nonredundant recently released TCR crystal structures shows that models are accurate and compare favorably to models from another available modeling method. This server enables the community to obtain insights into TCRs of interest, and can be combined with methods to model and design TCR recognition of antigens. The TCRmodel server is available at: http://tcrmodel.ibbr.umd.edu/.

  20. Selective ribosome profiling as a tool to study the interaction of chaperones and targeting factors with nascent polypeptide chains and ribosomes

    PubMed Central

    Becker, Annemarie H.; Oh, Eugene; Weissman, Jonathan S.; Kramer, Günter; Bukau, Bernd

    2014-01-01

    A plethora of factors is involved in the maturation of newly synthesized proteins, including chaperones, membrane targeting factors, and enzymes. Many factors act cotranslationally through association with ribosome-nascent chain complexes (RNCs), but their target specificities and modes of action remain poorly understood. We developed selective ribosome profiling (SeRP) to identify substrate pools and points of RNC engagement of these factors. SeRP is based on sequencing mRNA fragments covered by translating ribosomes (general ribosome profiling, RP), combined with a procedure to selectively isolate RNCs whose nascent polypeptides are associated with the factor of interest. Factor–RNC interactions are stabilized by crosslinking, the resulting factor–RNC adducts are then nuclease-treated to generate monosomes, and affinity-purified. The ribosome-extracted mRNA footprints are converted to DNA libraries for deep sequencing. The protocol is specified for general RP and SeRP in bacteria. It was first applied to the chaperone trigger factor and is readily adaptable to other cotranslationally acting factors, including eukaryotic factors. Factor–RNC purification and sequencing library preparation takes 7–8 days, sequencing and data analysis can be completed in 5–6 days. PMID:24136347

  1. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-08-17

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.

  2. DeepLoc: prediction of protein subcellular localization using deep learning.

    PubMed

    Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole

    2017-11-01

    The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  3. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progresses in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfaction. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  4. Musculoskeletal MRI findings of juvenile localized scleroderma.

    PubMed

    Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W

    2017-04-01

    Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.

  5. Deep nirS amplicon sequencing of San Francisco Bay sediments enables prediction of geography and environmental conditions from denitrifying community composition.

    PubMed

    Lee, Jessica A; Francis, Christopher A

    2017-12-01

    Denitrification is a dominant nitrogen loss process in the sediments of San Francisco Bay. In this study, we sought to understand the ecology of denitrifying bacteria by using next-generation sequencing (NGS) to survey the diversity of a denitrification functional gene, nirS (encoding cytchrome-cd 1 nitrite reductase), along the salinity gradient of San Francisco Bay over the course of a year. We compared our dataset to a library of nirS sequences obtained previously from the same samples by standard PCR cloning and Sanger sequencing, and showed that both methods similarly demonstrated geography, salinity and, to a lesser extent, nitrogen, to be strong determinants of community composition. Furthermore, the depth afforded by NGS enabled novel techniques for measuring the association between environment and community composition. We used Random Forests modelling to demonstrate that the site and salinity of a sample could be predicted from its nirS sequences, and to identify indicator taxa associated with those environmental characteristics. This work contributes significantly to our understanding of the distribution and dynamics of denitrifying communities in San Francisco Bay, and provides valuable tools for the further study of this key N-cycling guild in all estuarine systems. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.

  6. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  7. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  8. Application of viromics: a new approach to the understanding of viral infections in humans.

    PubMed

    Ramamurthy, Mageshbabu; Sankar, Sathish; Kannangai, Rajesh; Nandagopal, Balaji; Sridharan, Gopalan

    2017-12-01

    This review is focused at exploring the strengths of modern technology driven data compiled in the areas of virus gene sequencing, virus protein structures and their implication to viral diagnosis and therapy. The information for virome analysis (viromics) is generated by the study of viral genomes (entire nucleotide sequence) and viral genes (coding for protein). Presently, the study of viral infectious diseases in terms of etiopathogenesis and development of newer therapeutics is undergoing rapid changes. Currently, viromics relies on deep sequencing, next generation sequencing (NGS) data and public domain databases like GenBank and unique virus specific databases. Two commonly used NGS platforms: Illumina and Ion Torrent, recommend maximum fragment lengths of about 300 and 400 nucleotides for analysis respectively. Direct detection of viruses in clinical samples is now evolving using these methods. Presently, there are a considerable number of good treatment options for HBV/HIV/HCV. These viruses however show development of drug resistance. The drug susceptibility regions of the genomes are sequenced and the prediction of drug resistance is now possible from 3 public domains available on the web. This has been made possible through advances in the technology with the advent of high throughput sequencing and meta-analysis through sophisticated and easy to use software and the use of high speed computers for bioinformatics. More recently NGS technology has been improved with single-molecule real-time sequencing. Here complete long reads can be obtained with less error overcoming a limitation of the NGS which is inherently prone to software anomalies that arise in the hands of personnel without adequate training. The development in understanding the viruses in terms of their genome, pathobiology, transcriptomics and molecular epidemiology constitutes viromics. It could be stated that these developments will bring about radical changes and advancement especially in the field of antiviral therapy and diagnostic virology.

  9. Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.

    PubMed

    Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E

    2012-01-01

    HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.

  10. The Vigna unguiculata Gene Expression Atlas (VuGEA) from de novo assembly and quantification of RNA-seq data provides insights into seed maturation mechanisms.

    PubMed

    Yao, Shaolun; Jiang, Chuan; Huang, Ziyue; Torres-Jerez, Ivone; Chang, Junil; Zhang, Heng; Udvardi, Michael; Liu, Renyi; Verdier, Jerome

    2016-10-01

    Legume research and cultivar development are important for sustainable food production, especially of high-protein seed. Thanks to the development of deep-sequencing technologies, crop species have been taken to the front line, even without completion of their genome sequences. Black-eyed pea (Vigna unguiculata) is a legume species widely grown in semi-arid regions, which has high potential to provide stable seed protein production in a broad range of environments, including drought conditions. The black-eyed pea reference genotype has been used to generate a gene expression atlas of the major plant tissues (i.e. leaf, root, stem, flower, pod and seed), with a developmental time series for pods and seeds. From these various organs, 27 cDNA libraries were generated and sequenced, resulting in more than one billion reads. Following filtering, these reads were de novo assembled into 36 529 transcript sequences that were annotated and quantified across the different tissues. A set of 24 866 unique transcript sequences, called Unigenes, was identified. All the information related to transcript identification, annotation and quantification were stored into a gene expression atlas webserver (http://vugea.noble.org), providing a user-friendly interface and necessary tools to analyse transcript expression in black-eyed pea organs and to compare data with other legume species. Using this gene expression atlas, we inferred details of molecular processes that are active during seed development, and identified key putative regulators of seed maturation. Additionally, we found evidence for conservation of regulatory mechanisms involving miRNA in plant tissues subjected to drought and seeds undergoing desiccation. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  11. Deep mRNA Sequencing of the Tritonia diomedea Brain Transcriptome Provides Access to Gene Homologues for Neuronal Excitability, Synaptic Transmission and Peptidergic Signalling

    PubMed Central

    Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.

    2015-01-01

    Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197

  12. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2016-09-01

    Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.

  13. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling

    PubMed Central

    Wang, Sheng; Sun, Siqi

    2017-01-01

    Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168

  14. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873

  15. EHR Big Data Deep Phenotyping

    PubMed Central

    Lenert, L.; Lopez-Campos, G.

    2014-01-01

    Summary Objectives Given the quickening speed of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide methodology using big data technology to support the definition of deep phenotypes in medical records. Methods As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and adoption of Electronic Health Records (EHR) in medicine provides a unique opportunity to integrate phenotype and genotype data into medical records. The method by which collections of clinical findings and other health related data are leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct phenotypes of patients. We focus on a practical problem around data integration for deep phenotype identification within EHR data. The use of big data approaches are described that enable scalable markup of EHR events that can be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships. Conclusions Stead and colleagues’ 2005 concept of using light standards to increase the productivity of software systems by riding on the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research. PMID:25123744

  16. Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.

    PubMed

    Zhang, Buzhong; Li, Linqing; Lü, Qiang

    2018-05-25

    Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.

  17. Technical adequacy of bisulfite sequencing and pyrosequencing for detection of mitochondrial DNA methylation: Sources and avoidance of false-positive detection.

    PubMed

    Owa, Chie; Poulin, Matthew; Yan, Liying; Shioda, Toshi

    2018-01-01

    The existence of cytosine methylation in mammalian mitochondrial DNA (mtDNA) is a controversial subject. Because detection of DNA methylation depends on resistance of 5'-modified cytosines to bisulfite-catalyzed conversion to uracil, examined parameters that affect technical adequacy of mtDNA methylation analysis. Negative control amplicons (NCAs) devoid of cytosine methylation were amplified to cover the entire human or mouse mtDNA by long-range PCR. When the pyrosequencing template amplicons were gel-purified after bisulfite conversion, bisulfite pyrosequencing of NCAs did not detect significant levels of bisulfite-resistant cytosines (brCs) at ND1 (7 CpG sites) or CYTB (8 CpG sites) genes (CI95 = 0%-0.94%); without gel-purification, significant false-positive brCs were detected from NCAs (CI95 = 4.2%-6.8%). Bisulfite pyrosequencing of highly purified, linearized mtDNA isolated from human iPS cells or mouse liver detected significant brCs (~30%) in human ND1 gene when the sequencing primer was not selective in bisulfite-converted and unconverted templates. However, repeated experiments using a sequencing primer selective in bisulfite-converted templates almost completely (< 0.8%) suppressed brC detection, supporting the false-positive nature of brCs detected using the non-selective primer. Bisulfite-seq deep sequencing of linearized, gel-purified human mtDNA detected 9.4%-14.8% brCs for 9 CpG sites in ND1 gene. However, because all these brCs were associated with adjacent non-CpG brCs showing the same degrees of bisulfite resistance, DNA methylation in this mtDNA-encoded gene was not confirmed. Without linearization, data generated by bisulfite pyrosequencing or deep sequencing of purified mtDNA templates did not pass the quality control criteria. Shotgun bisulfite sequencing of human mtDNA detected extremely low levels of CpG methylation (<0.65%) over non-CpG methylation (<0.55%). Taken together, our study demonstrates that adequacy of mtDNA methylation analysis using methods dependent on bisulfite conversion needs to be established for each experiment, taking effects of incomplete bisulfite conversion and template impurity or topology into consideration.

  18. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification.

    PubMed

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc'h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies wich is a complex mixture of genetically distinct virus's but closely related namely variants. To identify intra-individual diversity and investigate their functional properties in vitro, it is necessary to define their quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. The detection of non-clonal amplicons is necessary for excluding them to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with De Novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliability select clonal amplicons for further functional studies.

  19. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification

    PubMed Central

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc’h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies wich is a complex mixture of genetically distinct virus’s but closely related namely variants. To identify intra-individual diversity and investigate their functional properties in vitro, it is necessary to define their quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. The detection of non-clonal amplicons is necessary for excluding them to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with De Novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliability select clonal amplicons for further functional studies. PMID:28362878

  20. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10−3 27 months, MRD 10−3 to 10−5 48 months, and MRD <10−5 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471

  1. Deep sequencing and flow cytometric characterization of expanded effector memory CD8+CD57+ T cells frequently reveals T-cell receptor Vβ oligoclonality and CDR3 homology in acquired aplastic anemia.

    PubMed

    Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S

    2018-05-01

    Oligoclonal expansion of CD8 + CD28 - lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8 + CD28 - cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8 + CD28 - CD57 + cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8 + CD57 + cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8 + cells from aplastic anemia patients with CD8 + CD57 + cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8 + CD57 + cells, which also showed decreased diversity compared to total CD4 + and CD8 + cell pools. From analysis of complementarity-determining region 3 sequences in the CD8 + cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8 + T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064 ). Copyright © 2018 Ferrata Storti Foundation.

  2. Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental nicotianan benthamiana plants in up-state New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...

  3. High-Throughput Sequencing of Campylobacter jejuni Insertion Mutant Libraries Reveals mapA as a Fitness Factor for Chicken Colonization

    PubMed Central

    Johnson, Jeremiah G.; Livny, Jonathan

    2014-01-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture. PMID:24633877

  4. High-throughput sequencing of Campylobacter jejuni insertion mutant libraries reveals mapA as a fitness factor for chicken colonization.

    PubMed

    Johnson, Jeremiah G; Livny, Jonathan; Dirita, Victor J

    2014-06-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture.

  5. Reproducibility and Consistency of In Vitro Nucleosome Reconstitutions Demonstrated by Invitrosome Isolation and Sequencing

    PubMed Central

    Kempton, Colton E.; Heninger, Justin R.; Johnson, Steven M.

    2014-01-01

    Nucleosomes and their positions in the eukaryotic genome play an important role in regulating gene expression by influencing accessibility to DNA. Many factors influence a nucleosome's final position in the chromatin landscape including the underlying genomic sequence. One of the primary reasons for performing in vitro nucleosome reconstitution experiments is to identify how the underlying DNA sequence will influence a nucleosome's position in the absence of other compounding cellular factors. However, concerns have been raised about the reproducibility of data generated from these kinds of experiments. Here we present data for in vitro nucleosome reconstitution experiments performed on linear plasmid DNA that demonstrate that, when coverage is deep enough, these reconstitution experiments are exquisitely reproducible and highly consistent. Our data also suggests that a coverage depth of 35X be maintained for maximal confidence when assaying nucleosome positions, but lower coverage levels may be generally sufficient. These coverage depth recommendations are sufficient in the experimental system and conditions used in this study, but may vary depending on the exact parameters used in other systems. PMID:25093869

  6. The Cosmic Skidmark: witnessing galaxy transformation at z = 0.19

    NASA Astrophysics Data System (ADS)

    Murphy, David N. A.

    2015-02-01

    We present an early-look analysis of the ``Cosmic Skidmark''. Discovered following visual inspection of the Geach, Murphy & Bower (2011) SDSS Stripe 82 cluster catalogue generated by ORCA (an automated cluster algorithm searching for red-sequences; Murphy, Geach & Bower 2012), this z = 0.19 1.4L* galaxy appears to have been caught in the rare act of transformation while accreting onto an estimated 1013-1014 h -1 M⊙-mass galaxy group. SDSS spectroscopy reveals clear signatures of star formation whilst deep optical imaging reveals a pronounced 50 kpc cometary tail. Pending completion of our ALMA Cycle 2 and IFU observations, we show here preliminary analysis of this target.

  7. Estimating time of HIV-1 infection from next-generation sequence diversity

    PubMed Central

    2017-01-01

    Estimating the time since infection (TI) in newly diagnosed HIV-1 patients is challenging, but important to understand the epidemiology of the infection. Here we explore the utility of virus diversity estimated by next-generation sequencing (NGS) as novel biomarker by using a recent genome-wide longitudinal dataset obtained from 11 untreated HIV-1-infected patients with known dates of infection. The results were validated on a second dataset from 31 patients. Virus diversity increased linearly with time, particularly at 3rd codon positions, with little inter-patient variation. The precision of the TI estimate improved with increasing sequencing depth, showing that diversity in NGS data yields superior estimates to the number of ambiguous sites in Sanger sequences, which is one of the alternative biomarkers. The full advantage of deep NGS was utilized with continuous diversity measures such as average pairwise distance or site entropy, rather than the fraction of polymorphic sites. The precision depended on the genomic region and codon position and was highest when 3rd codon positions in the entire pol gene were used. For these data, TI estimates had a mean absolute error of around 1 year. The error increased only slightly from around 0.6 years at a TI of 6 months to around 1.1 years at 6 years. Our results show that virus diversity determined by NGS can be used to estimate time since HIV-1 infection many years after the infection, in contrast to most alternative biomarkers. We provide the regression coefficients as well as web tool for TI estimation. PMID:28968389

  8. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  9. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life

    PubMed Central

    Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.

    2018-01-01

    Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. While the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369

  10. Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use.

    PubMed

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J

    2009-06-29

    It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.

  11. Low-Abundance HIV Drug-Resistant Viral Variants in Treatment-Experienced Persons Correlate with Historical Antiretroviral Use

    PubMed Central

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.

    2009-01-01

    Background It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031

  12. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    PubMed

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium , Aquabacterium , Ralstonia , and Acinetobacter . While the top five most frequently observed genera were Pseudomonas , Propionibacterium , Acinetobacter , Ralstonia , and Sphingomonas . The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset.

  13. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10−6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957

  14. Graphical classification of DNA sequences of HLA alleles by deep learning.

    PubMed

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.

  15. 3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.

    PubMed

    Goldfarb, Katherine C; Cech, Thomas R

    2013-09-21

    Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.

  16. Deep targeted sequencing in pediatric acute lymphoblastic leukemia unveils distinct mutational patterns between genetic subtypes and novel relapse-associated genes.

    PubMed

    Lindqvist, C Mårten; Lundmark, Anders; Nordlund, Jessica; Freyhult, Eva; Ekman, Diana; Carlsson Almlöf, Jonas; Raine, Amanda; Övernäs, Elin; Abrahamsson, Jonas; Frost, Britt-Marie; Grandér, Dan; Heyman, Mats; Palle, Josefine; Forestier, Erik; Lönnerholm, Gudmar; Berglund, Eva C; Syvänen, Ann-Christine

    2016-09-27

    To characterize the mutational patterns of acute lymphoblastic leukemia (ALL) we performed deep next generation sequencing of 872 cancer genes in 172 diagnostic and 24 relapse samples from 172 pediatric ALL patients. We found an overall greater mutational burden and more driver mutations in T-cell ALL (T-ALL) patients compared to B-cell precursor ALL (BCP-ALL) patients. In addition, the majority of the mutations in T-ALL had occurred in the original leukemic clone, while most of the mutations in BCP-ALL were subclonal. BCP-ALL patients carrying any of the recurrent translocations ETV6-RUNX1, BCR-ABL or TCF3-PBX1 harbored few mutations in driver genes compared to other BCP-ALL patients. Specifically in BCP-ALL, we identified ATRX as a novel putative driver gene and uncovered an association between somatic mutations in the Notch signaling pathway at ALL diagnosis and increased risk of relapse. Furthermore, we identified EP300, ARID1A and SH2B3 as relapse-associated genes. The genes highlighted in our study were frequently involved in epigenetic regulation, associated with germline susceptibility to ALL, and present in minor subclones at diagnosis that became dominant at relapse. We observed a high degree of clonal heterogeneity and evolution between diagnosis and relapse in both BCP-ALL and T-ALL, which could have implications for the treatment efficiency.

  17. Molecular Technique to Understand Deep Microbial Diversity

    NASA Technical Reports Server (NTRS)

    Vaishampayan, Parag A.; Venkateswaran, Kasthuri J.

    2012-01-01

    Current sequencing-based and DNA microarray techniques to study microbial diversity are based on an initial PCR (polymerase chain reaction) amplification step. However, a number of factors are known to bias PCR amplification and jeopardize the true representation of bacterial diversity. PCR amplification of the minor template appears to be suppressed by the exponential amplification of the more abundant template. It is widely acknowledged among environmental molecular microbiologists that genetic biosignatures identified from an environment only represent the most dominant populations. The technological bottleneck has overlooked the presence of the less abundant minority population, and underestimated their role in the ecosystem maintenance. To generate PCR amplicons for subsequent diversity analysis, bacterial l6S rRNA genes are amplified by PCR using universal primers. Two distinct PCR regimes are employed in parallel: one using normal and the other using biotinlabeled universal primers. PCR products obtained with biotin-labeled primers are mixed with streptavidin-labeled magnetic beads and selectively captured in the presence of a magnetic field. Less-abundant DNA templates that fail to amplify in this first round of PCR amplification are subjected to a second round of PCR using normal universal primers. These PCR products are then subjected to downstream diversity analyses such as conventional cloning and sequencing. A second round of PCR amplified the minority population and completed the deep diversity picture of the environmental sample.

  18. Detection of bovine viral diarrhoea virus nucleic acid, but not infectious virus, in bovine serum used for human vaccine manufacture.

    PubMed

    Laassri, Majid; Mee, Edward T; Connaughton, Sarah M; Manukyan, Hasmik; Gruber, Marion; Rodriguez-Hernandez, Carmen; Minor, Philip D; Schepelmann, Silke; Chumakov, Konstantin; Wood, David J

    2018-06-22

    Bovine viral diarrhoea virus (BVDV) is a cattle pathogen that has previously been reported to be present in bovine raw materials used in the manufacture of biological products for human use. Seven lots of trivalent measles, mumps and rubella (MMR) vaccine and 1 lot of measles vaccine from the same manufacturer, together with 17 lots of foetal bovine serum (FBS) from different vendors, 4 lots of horse serum, 2 lots of bovine trypsin and 5 lots of porcine trypsin were analysed for BVDV using recently developed techniques, including PCR assays for BVDV detection, a qRT-PCR and immunofluorescence-based virus replication assays, and deep sequencing to identify and genotype BVDV genomes. All FBS lots and one lot of bovine-derived trypsin were PCR-positive for the presence of BVDV genome; in contrast all vaccine lots and the other samples were negative. qRT-PCR based virus replication assay and immunofluorescence-based infection assay detected no infectious BVDV in the PCR-positive samples. Complete BVDV genomes were generated from FBS samples by deep sequencing, and all were BVDV type 1. These data confirmed that BVDV nucleic acid may be present in bovine-derived raw materials, but no infectious virus or genomic RNA was detected in the final vaccine products. Copyright © 2018 International Alliance for Biological Standardization. All rights reserved.

  19. Global impact of RNA splicing on transcriptome remodeling in the heart.

    PubMed

    Gao, Chen; Wang, Yibin

    2012-08-01

    In the eukaryotic transcriptome, both the numbers of genes and different RNA species produced by each gene contribute to the overall complexity. These RNA species are generated by the utilization of different transcriptional initiation or termination sites, or more commonly, from different messenger RNA (mRNA) splicing events. Among the 30,000+ genes in human genome, it is estimated that more than 95% of them can generate more than one gene product via alternative RNA splicing. The protein products generated from different RNA splicing variants can have different intracellular localization, activity, or tissue-distribution. Therefore, alternative RNA splicing is an important molecular process that contributes to the overall complexity of the genome and the functional specificity and diversity among different cell types. In this review, we will discuss current efforts to unravel the full complexity of the cardiac transcriptome using a deep-sequencing approach, and highlight the potential of this technology to uncover the global impact of RNA splicing on the transcriptome during development and diseases of the heart.

  20. Next-generation libraries for robust RNA interference-based genome-wide screens

    PubMed Central

    Kampmann, Martin; Horlbeck, Max A.; Chen, Yuwen; Tsai, Jordan C.; Bassik, Michael C.; Gilbert, Luke A.; Villalta, Jacqueline E.; Kwon, S. Chul; Chang, Hyeshik; Kim, V. Narry; Weissman, Jonathan S.

    2015-01-01

    Genetic screening based on loss-of-function phenotypes is a powerful discovery tool in biology. Although the recent development of clustered regularly interspaced short palindromic repeats (CRISPR)-based screening approaches in mammalian cell culture has enormous potential, RNA interference (RNAi)-based screening remains the method of choice in several biological contexts. We previously demonstrated that ultracomplex pooled short-hairpin RNA (shRNA) libraries can largely overcome the problem of RNAi off-target effects in genome-wide screens. Here, we systematically optimize several aspects of our shRNA library, including the promoter and microRNA context for shRNA expression, selection of guide strands, and features relevant for postscreen sample preparation for deep sequencing. We present next-generation high-complexity libraries targeting human and mouse protein-coding genes, which we grouped into 12 sublibraries based on biological function. A pilot screen suggests that our next-generation RNAi library performs comparably to current CRISPR interference (CRISPRi)-based approaches and can yield complementary results with high sensitivity and high specificity. PMID:26080438

  1. Geology and origin of epigenetic lode gold deposits, Tintina Gold Province, Alaska and Yukon: Chapter A in Recent U.S. Geological Survey studies in the Tintina Gold Province, Alaska, United States, and Yukon, Canada--results of a 5-year project

    USGS Publications Warehouse

    Goldfarb, Richard J.; Marsh, Erin E.; Hart, Craig J.R.; Mair, John L.; Miller, Marti L.; Johnson, Craig; Gough, Larry P.; Day, Warren C.

    2007-01-01

    -rich and 18O-rich crustal fluids, most commonly of low salinity. The older group of ores includes the low-grade intrusion-related gold systems at Fort Knox near Fairbanks and those in Yukon, with fluids exsolved from fractionating melts at depths of 3 to 9 kilometers and forming a zoned sequence of auriferous mineralization styles extending outward to the surrounding metasedimentary country rocks. The causative plutons are products of potassic mafic magmas generated in the subcontinental lithospheric mantle that interacted with overlying lower to middle crust to generate the more felsic ore-related intrusions. In addition, the older ores include spatially associated, high-grade, shear-zonerelated orogenic gold deposits formed at the same depths from upward-migrating metamorphic fluids; the Pogo deposit is a relatively deep-seated example of such. The younger gold ores, restricted to southwestern Alaska, formed in unmetamorphosed sedimentary rocks of the Kuskokwim basin within 1 to 2 kilometers of the surface. Most of these deposits formed via fluid exsolution from shallowly emplaced, highly evolved igneous complexes generated mainly as mantle melts. However, the giant Donlin Creek orogenic gold deposit is a product of either metamorphic devolatilization deep in the basin or of a gold-bearing fluid released from a flysch-melt igneous body.

  2. Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome.

    PubMed

    Pedersen, Jakob Skou; Valen, Eivind; Velazquez, Amhed M Vargas; Parker, Brian J; Rasmussen, Morten; Lindgreen, Stinus; Lilje, Berit; Tobin, Desmond J; Kelly, Theresa K; Vang, Søren; Andersson, Robin; Jones, Peter A; Hoover, Cindi A; Tikhonov, Alexei; Prokhortchouk, Egor; Rubin, Edward M; Sandelin, Albin; Gilbert, M Thomas P; Krogh, Anders; Willerslev, Eske; Orlando, Ludovic

    2014-03-01

    Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both nucleosome map and methylation levels were confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics.

  3. Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome

    PubMed Central

    Pedersen, Jakob Skou; Valen, Eivind; Velazquez, Amhed M. Vargas; Parker, Brian J.; Rasmussen, Morten; Lindgreen, Stinus; Lilje, Berit; Tobin, Desmond J.; Kelly, Theresa K.; Vang, Søren; Andersson, Robin; Jones, Peter A.; Hoover, Cindi A.; Tikhonov, Alexei; Prokhortchouk, Egor; Rubin, Edward M.; Sandelin, Albin; Gilbert, M. Thomas P.; Krogh, Anders; Willerslev, Eske; Orlando, Ludovic

    2014-01-01

    Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both nucleosome map and methylation levels were confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics. PMID:24299735

  4. Genotyping of single spore isolates of a Pasteuria penetrans population occurring in Florida using SNP-based markers.

    PubMed

    Joseph, S; Schmidt, L M; Danquah, W B; Timper, P; Mekete, T

    2017-02-01

    To generate single spore lines of a population of bacterial parasite of root-knot nematode (RKN), Pasteuria penetrans, isolated from Florida and examine genotypic variation and virulence characteristics exist within the population. Six single spore lines (SSP), 16SSP, 17SSP, 18SSP, 25SSP, 26SSP and 30SSP were generated. Genetic variability was evaluated by comparing single-nucleotide polymorphisms (SNPs) in six protein-coding genes and the 16S rRNA gene. An average of one SNP was observed for every 69 bp in the 16S rRNA, whereas no SNPs were observed in the protein-coding sequences. Hierarchical cluster analysis of 16S rRNA sequences placed the clones into three distinct clades. Bio-efficacy analysis revealed significant heterogeneity in the level virulence and host specificity between the individual clones. The SNP markers developed to the 5' hypervariable region of the 16S rRNA gene may be useful in biotype differentiation within a population of P. penetrans. This study demonstrates an efficient method for generating single spore lines of P. penetrans and gives a deep insight into genetic heterogeneity and varying level of virulence exists within a population parasitizing a specific Meloidogyne sp. host. The results also suggest that the application of generalist spore lines in nematode management may achieve broad RKN control. © 2016 The Society for Applied Microbiology.

  5. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  6. First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...

  7. Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.

    PubMed

    Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A

    2017-11-01

    To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

  8. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  9. Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network

    NASA Astrophysics Data System (ADS)

    Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke

    2018-06-01

    Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. For this purpose, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, DRNN is constructed by the stacks of the recurrent hidden layer to automatically extract the features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.

  10. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  11. Insights about minority HIV-1 strains in transmitted drug resistance mutation dynamics and disease progression.

    PubMed

    Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie

    2018-04-19

    The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.

  12. RNA sequencing analysis to capture the transcriptome landscape during skin ulceration syndrome progression in sea cucumber Apostichopus japonicus.

    PubMed

    Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong

    2016-06-14

    Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. Elucidation of these changes is important in clarifying the molecular mechanisms associated with the development of SUS in sea cucumber.

  13. Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.

    PubMed

    Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M

    2012-06-01

    Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women, with transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or infertile underwent an MR 3.0T examination with the following protocol: T2 weighted FRFSE HR sequences, T2 weighted FRFSE HR CUBE 3D sequences, T1 w FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, adhesions presence, fluid effusion in Douglas pouch, uterus and kidney pathologies or anomalies associated and sacral nervous routes were considered by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI imaging diagnosed deep endometriosis in 22/46 patients, endometriomas not associated to deep implants in 9/46 patients, 15/46 patients resulted negative for endometriosis, 11 of 22 patients with deep endometriosis reported ovarian endometriosis cyst. We obtained high percentages of sensibility (96.97%), specificity (100.00%), VPP (100.00%), VPN (92.86%). Pelvic MRI performed with 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with a good pre-surgery mapping of the lesions involving both bowels and bladder surface and recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  14. Simultaneous and complete genome sequencing of influenza A and B with high coverage by Illumina MiSeq Platform.

    PubMed

    Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan

    2013-11-01

    Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and emergence of new strains, and improve the accuracy when designing preventive vaccines. This study investigated the use of deep sequencing by the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 respiratory acute clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and all were sequenced simultaneously. Total information of 2.6 Gbases was obtained from a 455±14 K/mm2 density with 95.76% (8,571,655/8,950,724 clusters) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high quality sequences that were ≥Q30, a base calling QC score standard. Alignments analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. The nearly entire genomes of all six virus isolates yielded equal or greater than 600-fold sequence coverage depth. MiSeq Platform identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1and influenza B in the DNA library mixtures efficiently. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Burrow-generated false facies and phantom sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wanless, H.R.; Tagett, M.

    Callianassa (=Ophiomorpha) and other burrowers deeply rework shallow marine sequences. Through in-situ reworking, they create false sedimentary facies and stratigraphic sequences. Callianassa's key to effectiveness is that it expels sand and mud from burrow excavations but concentrates coarse material at the base of the burrow complex. Coarse material can be derived by falling into the burrow entrance, by reworking the existing sediment sequence, or by a combination of both. Examples come from shallow marine carbonate environments of south Florida and the Turks and Caicos Islands, British West Indies. Many mudbanks in south Florida are formed as stacks of layered mudstonemore » units 20-100 cm thick. Between events, seagrasses may recolonize, and a burrowing benthic community may repopulate the substrate. The layered mudstone beneath older areas of mudbank flats can gradually be converted to a bioturbated skeletal wackestone by the deep burrowing community. Burrowing also causes mixing of faunal assemblages. On Caicos Bank, an extensive carbonate tidal flat (3-4 m thick) is slowly being transgressed. About 1 m of tidal-flat sequence is eroded at the shoreline. The remaining 2-3 m could be preserved as part of the transgressive sequence. Callianassa burrowing, however, quickly reworks the sequence, replacing tidal-flat sands and muds with marine peloidal and skeletal sediment. Within 100 m of the shoreline, the only evidence of the tidal-flat sequence is a concentration of high-spired gastropods in Calliannassa burrows at the base of the Holocene sequence and a few patches of tidal-flat sediment that burrowers missed. What looks like a basal transgressive lag is in fact a biogenic concentrate from in-situ reworking of a now phantom sequence.« less

  16. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complex and interact at residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  17. Mixed heterolobosean and novel gregarine lineage genes from culture ATCC 50646: Long-branch artefacts, not lateral gene transfer, distort α-tubulin phylogeny.

    PubMed

    Cavalier-Smith, Thomas

    2015-04-01

    Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.

  18. Development of deep-ultraviolet metal vapor lasers

    NASA Astrophysics Data System (ADS)

    Sabotinov, Nikola V.

    2004-06-01

    Deep ultraviolet laser generation is of great interest in connection with both the development of new industrial technologies and applications in medicine, biology, chemistry, etc. The development of metal vapor UV lasers oscillating in the pulsed mode with high pulse repetition frequencies and producing high average output powers is of particular interest for microprocessing of polymers, photolithography and fluorescence applications. At present, metal vapor lasers generate deep-UV radiation on the base of two methods. The first method is non-linear conversion of powerful laser generation from the visible region into the deep ultraviolet region. The second method is direct UV laser action on ion and atomic transitions of different metals.

  19. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  20. Model for turbidite-to-contourite continuum and multiple process transport in deep marine settings: examples in the rock record

    NASA Astrophysics Data System (ADS)

    Stanley, Daniel Jean

    1993-01-01

    Petrological analysis of geological sections in St. Croix in the Caribbean, the Niesenflysch in Switzerland and the Annot Sandstone in the French Maritime Alps sheds light on multiple process transport in deep marine settings. A model depicting a turbidite-to-contourite continuum of stratal types is applied to these three rock units. Recognition of a diverse suite of bedforms, coupled with analysis of paleocurrents, helps to better interpret depositional origin and basin paleogeography. The St. Croix strata record emplacement by gravity flows and, subsequently, by bottom currents flowing parallel to the base of slope; these sediments accumulated on a lower slope apron. A Niesenflysch section in the Swiss Alps west of Adelboden includes turbidites which were deposited at fairly regular intervals beyond the base of slope, in a setting more distal than that of the St. Croix sequences. Most of these turbidites appear to have been partially reworked by bottom currents related to basin circulation or to density flows from the basin margins. In the Annot Sandstone, reworked turbidites (termed transitional variants) and packets of entirely rippled strata are observed in submarine fan and slope sequences in the Peira-Cava area. In contrast to those in St. Croix and the Niesenflysch, the current-emplaced deposits of the Annot Sandstone are directly associated with fan-valley deposits. Such rippled strata in channels are deposits of gravity flow origin which were subsequently reworked downslope by currents generated by successive gravity flows; they also occur on levees by overbank flow. Consideration of multiple process transport is of special help to interpret sections which are poorly exposed, or which can be examined in cores, or which are located in sequences that have been highly deformed structurally.

  1. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus

    PubMed Central

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-01-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic “virion factory,” we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various time post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5′ and 3′ extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus “virophage.” These results—validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent)—will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system. PMID:20360389

  2. Molecular Definition of Vaginal Microbiota in East African Commercial Sex Workers ▿ †

    PubMed Central

    Schellenberg, John J.; Links, Matthew G.; Hill, Janet E.; Dumonceaux, Tim J.; Kimani, Joshua; Jaoko, Walter; Wachihi, Charles; Mungai, Jane Njeri; Peters, Geoffrey A.; Tyler, Shaun; Graham, Morag; Severini, Alberto; Fowke, Keith R.; Ball, T. Blake; Plummer, Francis A.

    2011-01-01

    Resistance to HIV infection in a cohort of commercial sex workers living in Nairobi, Kenya, is linked to mucosal and antiinflammatory factors that may be influenced by the vaginal microbiota. Since bacterial vaginosis (BV), a polymicrobial dysbiosis characterized by low levels of protective Lactobacillus organisms, is an established risk factor for HIV infection, we investigated whether vaginal microbiology was associated with HIV-exposed seronegative (HESN) or HIV-seropositive (HIV+) status in this cohort. A subset of 44 individuals was selected for deep-sequencing analysis based on the chaperonin 60 (cpn60) universal target (UT), including HESN individuals (n = 16), other HIV-seronegative controls (HIV-N, n = 16), and HIV+ individuals (n = 12). Our findings indicate exceptionally high phylogenetic resolution of the cpn60 UT using reads as short as 200 bp, with 54 species in 29 genera detected in this group. Contrary to our initial hypothesis, few differences between HESN and HIV-N women were observed. Several HIV+ women had distinct profiles dominated by Escherichia coli. The deep-sequencing phylogenetic profile of the vaginal microbiota corresponds closely to BV+ and BV− diagnoses by microscopy, elucidating BV at the molecular level. A cluster of samples with intermediate abundance of Lactobacillus and dominant Gardnerella was identified, defining a distinct BV phenotype that may represent a transitional stage between BV+ and BV−. Several alpha- and betaproteobacteria, including the recently described species Variovorax paradoxus, were found to correlate positively with increased Lactobacillus levels that define the BV− (“normal”) phenotype. We conclude that cpn60 UT is ideally suited to next-generation sequencing technologies for further investigation of microbial community dynamics and mucosal immunity underlying HIV resistance in this cohort. PMID:21531840

  3. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.

    PubMed

    Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry

    2018-03-01

    We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.

  4. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  5. Economic contribution of 'artificial upwelling' mariculture to sea-thermal power generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roels, O.A.

    1976-07-01

    Deep-sea water has two valuable properties: it is uniformly cold and, compared to surface water, it is rich in nutrients such as nitrate and phosphate which are necessary for plant growth. In tropical and subtropical areas, the temperature difference between the warm surface water and the cold deep water can be used for sea-thermal power generation or other cooling applications such as air-conditioning, ice-making, desalination, and cooling of refineries, power plants, etc. Once the deep water is brought to the surface, utilization of both the cold temperature and the nutrient content is likely to be more advantageous than the usemore » of only one of them. Claude demonstrated the technical feasibility of sea-thermal power generation in Cuba in 1930. The technical feasibility of artificial upwelling mariculture in the St. Croix installation has been demonstrated. Results to date demonstrate that the gross sales value of the potential mariculture yield from a given volume of deep-sea water is many times that of the sales value of the power which can be generated by the Claude process from the same volume of deep water. Utilizing both the nutrient content and the cold temperature of the deep water may therefore make sea-thermal power generation economically feasible.« less

  6. Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases.

    PubMed

    Pena, Loren D M; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C; Walley, Nicole; Stong, Nicholas; Rapisardo Horn, Sarah; Sullivan, Jennifer A; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C; El-Dairi, Mays; Bellet, Jane; Keels, Martha Ann; Jasien, Joan; Kranz, Peter G; Noel, Richard; Nagaraj, Shashi K; Lark, Robert K; Wechsler, Daniel S G; Del Gaudio, Daniela; Leung, Marco L; Hendon, Laura G; Parker, Collette C; Jones, Kelly L; Goldstein, David B; Shashi, Vandana

    2018-04-01

    PurposeTo describe examples of missed pathogenic variants on whole-exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing.MethodsGuided by phenotypic information, three children with negative WES underwent targeted single-gene testing.ResultsIndividual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and a next-generation sequencing (NGS)-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the noncoding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity, and magnetic resonance image changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, probably missed owing to failure of alignment.ConclusionThese cases illustrate potential pitfalls of WES/NGS testing and the importance of phenotype-guided molecular testing in yielding diagnoses.

  7. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623

  8. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.

  9. Evidence for thermal convection in the deep carbonate aquifer of the eastern sector of the Po Plain, Italy

    NASA Astrophysics Data System (ADS)

    Pasquale, V.; Chiozzi, P.; Verdoya, M.

    2013-05-01

    Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m- 1), while it is larger (53 mK m- 1) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m- 1). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 10- 15 m2. The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.

  10. De novo assembly of honey bee RNA viral genomes by tapping into the innate insect antiviral response pathway.

    PubMed

    Fung, Elisabeth; Hill, Kelly; Hogendoorn, Katja; Glatz, Richard V; Napier, Kathryn R; Bellgard, Matthew I; Barrero, Roberto A

    2018-02-01

    Bee pollination is critical for improving productivity of one third of all plants or plant products consumed by humans. The health of honey bees is in decline in many countries worldwide, and RNA viruses together with other biological, environmental and anthropogenic factors have been identified as the main causes. The rapid genetic variation of viruses represents a challenge for diagnosis. Thus, application of deep sequencing methods for detection and analysis of viruses has increased over the last years. In this study, we leverage from the innate Dicer-2 mediated antiviral response against viruses to reconstruct complete viral genomes using virus-derived small interfering RNAs (vsiRNAs). Symptomatic A. mellifera larvae collected from hives free of Colony Collapse Disorder (CCD) and the parasitic Varroa mite (Varroa destructor) were used to generate more than 107 million small RNA reads. We show that de novo assembly of insect viral sequences is less fragmented using only 22 nt long vsiRNAs rather than a combination of 21-22 nt small RNAs. Our results show that A. mellifera larvae activate the RNAi immune response in the presence of Sacbrood virus (SBV). We assembled three SBV genomes from three individual larvae from different hives in a single apiary, with 1-2% nucleotide sequence variability among them. We found 3-4% variability between SBV genomes generated in this study and earlier published Australian variants suggesting the presence of different SBV quasispecies within the country. Copyright © 2018. Published by Elsevier Inc.

  11. Ancient orphan crop joins modern era: gene-based SNP discovery and mapping in lentil.

    PubMed

    Sharpe, Andrew G; Ramsay, Larissa; Sanderson, Lacey-Anne; Fedoruk, Michael J; Clarke, Wayne E; Li, Rong; Kagale, Sateesh; Vijayan, Perumal; Vandenberg, Albert; Bett, Kirstin E

    2013-03-18

    The genus Lens comprises a range of closely related species within the galegoid clade of the Papilionoideae family. The clade includes other important crops (e.g. chickpea and pea) as well as a sequenced model legume (Medicago truncatula). Lentil is a global food crop increasing in importance in the Indian sub-continent and elsewhere due to its nutritional value and quick cooking time. Despite this importance there has been a dearth of genetic and genomic resources for the crop and this has limited the application of marker-assisted selection strategies in breeding. We describe here the development of a deep and diverse transcriptome resource for lentil using next generation sequencing technology. The generation of data in multiple cultivated (L. culinaris) and wild (L. ervoides) genotypes together with the utilization of a bioinformatics workflow enabled the identification of a large collection of SNPs and the subsequent development of a genotyping platform that was used to establish the first comprehensive genetic map of the L. culinaris genome. Extensive collinearity with M. truncatula was evident on the basis of sequence homology between mapped markers and the model genome and large translocations and inversions relative to M. truncatula were identified. An estimate for the time divergence of L. culinaris from L. ervoides and of both from M. truncatula was also calculated. The availability of the genomic and derived molecular marker resources presented here will help change lentil breeding strategies and lead to increased genetic gain in the future.

  12. Contrasting morphological and DNA barcode-suggested species boundaries among shallow-water amphipod fauna from the southern European Atlantic coast.

    PubMed

    Lobo, Jorge; Ferreira, Maria S; Antunes, Ilisa C; Teixeira, Marcos A L; Borges, Luisa M S; Sousa, Ronaldo; Gomes, Pedro A; Costa, Maria Helena; Cunha, Marina R; Costa, Filipe O

    2017-02-01

    In this study we compared DNA barcode-suggested species boundaries with morphology-based species identifications in the amphipod fauna of the southern European Atlantic coast. DNA sequences of the cytochrome c oxidase subunit I barcode region (COI-5P) were generated for 43 morphospecies (178 specimens) collected along the Portuguese coast which, together with publicly available COI-5P sequences, produced a final dataset comprising 68 morphospecies and 295 sequences. Seventy-five BINs (Barcode Index Numbers) were assigned to these morphospecies, of which 48 were concordant (i.e., 1 BIN = 1 species), 8 were taxonomically discordant, and 19 were singletons. Twelve species had matching sequences (<2% distance) with conspecifics from distant locations (e.g., North Sea). Seven morphospecies were assigned to multiple, and highly divergent, BINs, including specimens of Corophium multisetosum (18% divergence) and Dexamine spiniventris (16% divergence), which originated from sampling locations on the west coast of Portugal (only about 36 and 250 km apart, respectively). We also found deep divergence (4%-22%) among specimens of seven species from Portugal compared to those from the North Sea and Italy. The detection of evolutionarily meaningful divergence among populations of several amphipod species from southern Europe reinforces the need for a comprehensive re-assessment of the diversity of this faunal group.

  13. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms

    PubMed Central

    Gasc, Cyrielle; Peyretaillade, Eric

    2016-01-01

    Abstract The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. PMID:27105841

  14. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms.

    PubMed

    Gasc, Cyrielle; Peyretaillade, Eric; Peyret, Pierre

    2016-06-02

    The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Targeted exome sequencing reveals novel USH2A mutations in Chinese patients with simplex Usher syndrome.

    PubMed

    Shu, Hai-Rong; Bi, Huai; Pan, Yang-Chun; Xu, Hang-Yu; Song, Jian-Xin; Hu, Jie

    2015-09-16

    Usher syndrome (USH) is an autosomal recessive disorder characterized by hearing impairment and vision dysfunction due to retinitis pigmentosa. Phenotypic and genetic heterogeneities of this disease make it impractical to obtain a genetic diagnosis by conventional Sanger sequencing. In this study, we applied a next-generation sequencing approach to detect genetic abnormalities in patients with USH. Two unrelated Chinese families were recruited, consisting of two USH afflicted patients and four unaffected relatives. We selected 199 genes related to inherited retinal diseases as targets for deep exome sequencing. Through systematic data analysis using an established bioinformatics pipeline, all variants that passed filter criteria were validated by Sanger sequencing and co-segregation analysis. A homozygous frameshift mutation (c.4382delA, p.T1462Lfs*2) was revealed in exon20 of gene USH2A in the F1 family. Two compound heterozygous mutations, IVS47 + 1G > A and c.13156A > T (p.I4386F), located in intron 48 and exon 63 respectively, of USH2A, were identified as causative mutations for the F2 family. Of note, the missense mutation c.13156A > T has not been reported so far. In conclusion, targeted exome sequencing precisely and rapidly identified the genetic defects in two Chinese USH families and this technique can be applied as a routine examination for these disorders with significant clinical and genetic heterogeneity.

  16. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% were detected out of the 37 input Operational Taxonomic Units (OTUs), whether the reference barcode library was used or not, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  17. Analytical and Clinical Validation of a Digital Sequencing Panel for Quantitative, Highly Accurate Evaluation of Cell-Free Circulating Tumor DNA

    PubMed Central

    Zill, Oliver A.; Sebisanovic, Dragan; Lopez, Rene; Blau, Sibel; Collisson, Eric A.; Divers, Stephen G.; Hoon, Dave S. B.; Kopetz, E. Scott; Lee, Jeeyun; Nikolinakos, Petros G.; Baca, Arthur M.; Kermani, Bahram G.; Eltoukhy, Helmy; Talasaz, AmirAli

    2015-01-01

    Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital SequencingTM is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient’s cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing. PMID:26474073

  18. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645

  19. Comparison of magnetic resonance imaging sequences for depicting the subthalamic nucleus for deep brain stimulation.

    PubMed

    Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi

    2015-01-01

    Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. For ensuring correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores were significantly higher on SWAN images than on other images (p < 0.01) for STN visualization. Kappa value (0.74) on SWAN images was the highest in three sequences for visualizing the STN. SWAN is the sequence best suited for identifying the STN at the present time.

  20. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

    PubMed

    Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

    2011-01-20

    Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.

  1. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh

    PubMed Central

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263

  2. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.

  3. Sedimentation and basin-fill history of the Neogene clastic succession exposed in the southeastern fold belt of the Bengal Basin, Bangladesh: a high-resolution sequence stratigraphic approach

    NASA Astrophysics Data System (ADS)

    Royhan Gani, M.; Mustafa Alam, M.

    2003-02-01

    The Tertiary basin-fill history of the Bengal Basin suffers from oversimplification. The interpretation of the sedimentary history of the basin should be consistent with the evolution of its three geo-tectonic provinces, namely, western, northeastern and eastern. Each province has its own basin generation and sediment-fill history related mainly to the Indo-Burmese and subordinately to the Indo-Tibetan plate convergence. This paper is mainly concerned with facies and facies sequence analysis of the Neogene clastic succession within the subduction-related active margin setting (oblique convergence) in the southeastern fold belt of the Bengal Basin. Detailed fieldwork was carried out in the Sitapahar anticline of the Rangamati area and the Mirinja anticline of the Lama area. The study shows that the exposed Neogene succession represents an overall basinward progradation from deep marine through shallow marine to continental-fluvial environments. Based on regionally correlatable erosion surfaces the entire succession (3000+ m thick) has been grouped into three composite sequences C, B and A, from oldest to youngest. Composite sequence C begins with deep-water base-of-slope clastics overlain by thick slope mud that passes upward into shallow marine and nearshore clastics. Composite sequence B characteristically depicts tide-dominated open-marine to coastal depositional systems with evidence of cyclic marine regression and transgression. Repetitive occurrence of incised channel, tidal inlet, tidal ridge/shoal, tidal flat and other tidal deposits is separated by shelfal mudstone. Most of the sandbodies contain a full spectrum of tide-generated structures (e.g. herringbone cross-bedding, bundle structure, mud couplet, bipolar cross-lamination with reactivation surfaces, 'tidal' bedding). Storm activities appear to have played a subordinate role in the mid and inner shelf region. Rizocorallium, Rosselia, Planolites and Zoophycos are the dominant ichnofacies within the shelfal mudstone. This paralic sedimentation of Neogene succession in the study area can serve as a good point of reference for tide-dominated regressive shelf depositional systems. The top of the composite sequence B is marked by a pronounced erosion surface indicating the final phase of marine regression followed by the gradual establishment of continental-fluvial depositional systems represented by composite sequence A. In this composite sequence, stacked channel bars of low-sinuosity braided rivers gradually pass upsequence into high-sinuosity meandering river deposits. A sequence stratigraphic approach has been adopted to interpret the basin-fill history with respect to relative sea-level changes; and to subdivide the rock record into several sequences and units (systems tracts and parasequences) based on identified bounding discontinuities, such as transgressive erosion surface (TES), regressive erosion surface (RES), marine flooding surface (MFS), and incised valley floor (IVF). This approach provides new insight for both exploration and exploitation strategy for hydrocarbon plays that may prove vital to the oil companies engaged in exploration activities in the Bengal Basin. It is strongly recommended here that the traditional lithostratigraphic classification of this part of the basin, which is based on the Assam stratigraphy, be abandoned or at least revised. A tentative allostratigraphic scheme is presented, and it is suggested that to formalize the scheme further study, both surface and subsurface, is needed.

  4. Culturable prokaryotic diversity of deep, gas hydrate sediments: first use of a continuous high-pressure, anaerobic, enrichment and isolation system for subseafloor sediments (DeepIsoBUG)

    PubMed Central

    Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik

    2009-01-01

    Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787

  5. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  6. Biasing genome-editing events toward precise length deletions with an RNA-guided TevCas9 dual nuclease.

    PubMed

    Wolfs, Jason M; Hamilton, Thomas A; Lant, Jeremy T; Laforet, Marcon; Zhang, Jenny; Salemi, Louisa M; Gloor, Gregory B; Schild-Poulter, Caroline; Edgell, David R

    2016-12-27

    The CRISPR/Cas9 nuclease is commonly used to make gene knockouts. The blunt DNA ends generated by cleavage can be efficiently ligated by the classical nonhomologous end-joining repair pathway (c-NHEJ), regenerating the target site. This repair creates a cycle of cleavage, ligation, and target site regeneration that persists until sufficient modification of the DNA break by alternative NHEJ prevents further Cas9 cutting, generating a heterogeneous population of insertions and deletions typical of gene knockouts. Here, we develop a strategy to escape this cycle and bias events toward defined length deletions by creating an RNA-guided dual active site nuclease that generates two noncompatible DNA breaks at a target site, effectively deleting the majority of the target site such that it cannot be regenerated. The TevCas9 nuclease, a fusion of the I-TevI nuclease domain to Cas9, functions robustly in HEK293 cells and generates 33- to 36-bp deletions at frequencies up to 40%. Deep sequencing revealed minimal processing of TevCas9 products, consistent with protection of the DNA ends from exonucleolytic degradation and repair by the c-NHEJ pathway. Directed evolution experiments identified I-TevI variants with broadened targeting range, making TevCas9 an easy-to-use reagent. Our results highlight how the sequence-tolerant cleavage properties of the I-TevI homing endonuclease can be harnessed to enhance Cas9 applications, circumventing the cleavage and ligation cycle and biasing genome-editing events toward defined length deletions.

  7. Variant calling in low-coverage whole genome sequencing of a Native American population sample.

    PubMed

    Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2014-01-30

    The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.

  8. Optical Communications Channel Combiner

    NASA Technical Reports Server (NTRS)

    Quirk, Kevin J.; Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy

    2012-01-01

    NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the log-likelihood ratio (LLR) sequences of multiple receivers, and then combines these into a single LLR sequence for information decoding. The channel combiner synchronizes the LLR sequences of up to three receivers and then combines these into a single LLR sequence for output. The channel combiner has three channel inputs, each of which takes as input a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlation between the channels LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.

  9. The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region,California

    NASA Astrophysics Data System (ADS)

    Zhang, Q.; Wdowinski, S.; Lin, G.

    2011-12-01

    The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depth of 10-17 km. Over the past decade, an increasing seismic activity occurred in the Anza region, which included three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) and used an accurate 1-D velocity model derived from 3-D model of Lin et al (2007) and used In order to separate the aftershock sequence from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depth of 10-17 km and extended over a 70 km long segments of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical pattern, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. Reference 1.Lin, G.and Shearer,P.M.,2006, The COMPLOC earthquake location package,Seism. Res. Lett.77, pp.440-444. 2.Lin, G. and Shearer, P.M., Hauksson, E., and Thurber C.H.,2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method,J. Geophys.Res.112, B12306, doi: 10.1029/ 2007JB004977. 3.Wdowinski, S. ,2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci.,doi:10.1038/NGEO684.

  10. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    PubMed

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    As the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs stelo@cs.ucr.edu or timothy.close@ucr.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. Phylogeny of sipunculan worms: A combined analysis of four gene regions and morphology.

    PubMed

    Schulze, Anja; Cutler, Edward B; Giribet, Gonzalo

    2007-01-01

    The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions. For this we applied a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups but whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they are supported by 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.

  12. Unique microbial community in drilling fluids from Chinese continental scientific drilling

    USGS Publications Warehouse

    Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.

    2006-01-01

    Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.

  13. Aligning and synchronization of MIS5 proxy records from Lake Ohrid (FYROM) with independently dated Mediterranean archives: implications for DEEP core chronology

    NASA Astrophysics Data System (ADS)

    Zanchetta, Giovanni; Regattieri, Eleonora; Giaccio, Biagio; Wagner, Bernd; Sulpizio, Roberto; Francke, Alex; Vogel, Hendrik; Sadori, Laura; Masi, Alessia; Sinopoli, Gaia; Lacey, Jack H.; Leng, Melanie J.; Leicher, Niklas

    2016-05-01

    The DEEP site sediment sequence obtained during the ICDP SCOPSCO project at Lake Ohrid was dated using tephrostratigraphic information, cyclostratigraphy, and orbital tuning through the marine isotope stages (MIS) 15-1. Although this approach is suitable for the generation of a general chronological framework of the long succession, it is insufficient to resolve more detailed palaeoclimatological questions, such as leads and lags of climate events between marine and terrestrial records or between different regions. Here, we demonstrate how the use of different tie points can affect cyclostratigraphy and orbital tuning for the period between ca. 140 and 70 ka and how the results can be correlated with directly/indirectly radiometrically dated Mediterranean marine and continental proxy records. The alternative age model presented here shows consistent differences with that initially proposed by Francke et al. (2015) for the same interval, in particular at the level of the MIS6-5e transition. According to this new age model, different proxies from the DEEP site sediment record support an increase of temperatures between glacial to interglacial conditions, which is almost synchronous with a rapid increase in sea surface temperature observed in the western Mediterranean. The results show how a detailed study of independent chronological tie points is important to align different records and to highlight asynchronisms of climate events. Moreover, Francke et al. (2016) have incorporated the new chronology proposed for tephra OH-DP-0499 in the final DEEP age model. This has reduced substantially the chronological discrepancies between the DEEP site age model and the model proposed here for the last glacial-interglacial transition.

  14. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

    PubMed

    Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O

    2010-10-01

    Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.

  15. Maximum entropy methods for extracting the learned features of deep neural networks.

    PubMed

    Finnegan, Alex; Song, Jun S

    2017-10-01

    New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

  16. w4CSeq: software and web application to analyze 4C-seq data.

    PubMed

    Cai, Mingyang; Gao, Fan; Lu, Wange; Wang, Kai

    2016-11-01

    Circularized Chromosome Conformation Capture followed by deep sequencing (4C-Seq) is a powerful technique to identify genome-wide partners interacting with a pre-specified genomic locus. Here, we present a computational and statistical approach to analyze 4C-Seq data generated from both enzyme digestion and sonication fragmentation-based methods. We implemented a command line software tool and a web interface called w4CSeq, which takes in the raw 4C sequencing data (FASTQ files) as input, performs automated statistical analysis and presents results in a user-friendly manner. Besides providing users with the list of candidate interacting sites/regions, w4CSeq generates figures showing genome-wide distribution of interacting regions, and sketches the enrichment of key features such as TSSs, TTSs, CpG sites and DNA replication timing around 4C sites. Users can establish their own web server by downloading source codes at https://github.com/WGLab/w4CSeq Additionally, a demo web server is available at http://w4cseq.wglab.org CONTACT: kaiwang@usc.edu or wangelu@usc.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Revealing stable processing products from ribosome-associated small RNAs by deep-sequencing data analysis.

    PubMed

    Zywicki, Marek; Bakowska-Zywicka, Kamilla; Polacek, Norbert

    2012-05-01

    The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. Up to date no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products and annotation of known transcripts based on multiple sources of information. To disclose the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect and confirm by independent experimental methods multiple novel stable RNA molecules differentially processed from well known ncRNAs, like rRNAs, tRNAs or snoRNAs, in a stress-dependent manner.

  18. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data

    NASA Astrophysics Data System (ADS)

    Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen

    2016-11-01

    Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on the real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls were with multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase reliability of candidates, but narrows the list down to very limited range of tumor read depth and variant allele frequency. Calling SNV on UDT-Seq data, which were of much higher read-depth, discovered additional true-positive variations, despite an even more tremendous growth in false positive predictions. Our findings not only provide valuable benchmark for state-of-the-art SNV calling methods, but also shed light on the access to more accurate SNV identification in the future.

  19. omiRas: a Web server for differential expression analysis of miRNAs derived from small RNA-Seq data.

    PubMed

    Müller, Sören; Rycak, Lukas; Winter, Peter; Kahl, Günter; Koch, Ina; Rotter, Björn

    2013-10-15

    Small RNA deep sequencing is widely used to characterize non-coding RNAs (ncRNAs) differentially expressed between two conditions, e.g. healthy and diseased individuals and to reveal insights into molecular mechanisms underlying condition-specific phenotypic traits. The ncRNAome is composed of a multitude of RNAs, such as transfer RNA, small nucleolar RNA and microRNA (miRNA), to name few. Here we present omiRas, a Web server for the annotation, comparison and visualization of interaction networks of ncRNAs derived from next-generation sequencing experiments of two different conditions. The Web tool allows the user to submit raw sequencing data and results are presented as: (i) static annotation results including length distribution, mapping statistics, alignments and quantification tables for each library as well as lists of differentially expressed ncRNAs between conditions and (ii) an interactive network visualization of user-selected miRNAs and their target genes based on the combination of several miRNA-mRNA interaction databases. The omiRas Web server is implemented in Python, PostgreSQL, R and can be accessed at: http://tools.genxpro.net/omiras/.

  20. Short intronic repeat sequences facilitate circular RNA production

    PubMed Central

    Liang, Dongming

    2014-01-01

    Recent deep sequencing studies have revealed thousands of circular noncoding RNAs generated from protein-coding genes. These RNAs are produced when the precursor messenger RNA (pre-mRNA) splicing machinery “backsplices” and covalently joins, for example, the two ends of a single exon. However, the mechanism by which the spliceosome selects only certain exons to circularize is largely unknown. Using extensive mutagenesis of expression plasmids, we show that miniature introns containing the splice sites along with short (∼30- to 40-nucleotide) inverted repeats, such as Alu elements, are sufficient to allow the intervening exons to circularize in cells. The intronic repeats must base-pair to one another, thereby bringing the splice sites into close proximity to each other. More than simple thermodynamics is clearly at play, however, as not all repeats support circularization, and increasing the stability of the hairpin between the repeats can sometimes inhibit circular RNA biogenesis. The intronic repeats and exonic sequences must collaborate with one another, and a functional 3′ end processing signal is required, suggesting that circularization may occur post-transcriptionally. These results suggest detailed and generalizable models that explain how the splicing machinery determines whether to produce a circular noncoding RNA or a linear mRNA. PMID:25281217

  1. A comprehensive framework for functional diversity patterns of marine chromophytic phytoplankton using rbcL phylogeny

    PubMed Central

    Samanta, Brajogopal; Bhadury, Punyasloke

    2016-01-01

    Marine chromophytes are taxonomically diverse group of algae and contribute approximately half of the total oceanic primary production. To understand the global patterns of functional diversity of chromophytic phytoplankton, robust bioinformatics and statistical analyses including deep phylogeny based on 2476 form ID rbcL gene sequences representing seven ecologically significant oceanographic ecoregions were undertaken. In addition, 12 form ID rbcL clone libraries were generated and analyzed (148 sequences) from Sundarbans Biosphere Reserve representing the world’s largest mangrove ecosystem as part of this study. Global phylogenetic analyses recovered 11 major clades of chromophytic phytoplankton in varying proportions with several novel rbcL sequences in each of the seven targeted ecoregions. Majority of OTUs was found to be exclusive to each ecoregion, whereas some were shared by two or more ecoregions based on beta-diversity analysis. Present phylogenetic and bioinformatics analyses provide a strong statistical support for the hypothesis that different oceanographic regimes harbor distinct and coherent groups of chromophytic phytoplankton. It has been also shown as part of this study that varying natural selection pressure on form ID rbcL gene under different environmental conditions could lead to functional differences and overall fitness of chromophytic phytoplankton populations. PMID:26861415

  2. Systematic comparison of variant calling pipelines using gold standard personal exome variants

    PubMed Central

    Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

    2015-01-01

    The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confident variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners—BWA-MEM, Bowtie2, and Novoalign—and four variant callers—Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839

  3. Advances for Studying Clonal Evolution in Cancer

    PubMed Central

    Raphael, Benjamin J.; Chen, Feng; Wendl, Michael C.

    2013-01-01

    The “clonal evolution” model of cancer emerged and “evolved” amid ongoing advances in technology, especially in recent years during which next generation sequencing instruments have provided ever higher resolution pictures of the genetic changes in cancer cells and heterogeneity in tumors. It has become increasingly clear that clonal evolution is not a single sequential process, but instead frequently involves simultaneous evolution of multiple subclones that co-exist because they are of similar fitness or are spatially separated. Co-evolution of subclones also occurs when they complement each other’s survival advantages. Recent studies have also shown that clonal evolution is highly heterogeneous: different individual tumors of the same type may undergo very different paths of clonal evolution. New methodological advancements, including deep digital sequencing of a mixed tumor population, single cell sequencing, and the development of more sophisticated computational tools, will continue to shape and reshape the models of clonal evolution. In turn, these will provide both an improved framework for the understanding of cancer progression and a guide for treatment strategies aimed at the elimination of all, rather than just some, of the cancer cells within a patient. PMID:23353056

  4. Advances for studying clonal evolution in cancer.

    PubMed

    Ding, Li; Raphael, Benjamin J; Chen, Feng; Wendl, Michael C

    2013-11-01

    The "clonal evolution" model of cancer emerged and "evolved" amid ongoing advances in technology, especially in recent years during which next generation sequencing instruments have provided ever higher resolution pictures of the genetic changes in cancer cells and heterogeneity in tumors. It has become increasingly clear that clonal evolution is not a single sequential process, but instead frequently involves simultaneous evolution of multiple subclones that co-exist because they are of similar fitness or are spatially separated. Co-evolution of subclones also occurs when they complement each other's survival advantages. Recent studies have also shown that clonal evolution is highly heterogeneous: different individual tumors of the same type may undergo very different paths of clonal evolution. New methodological advancements, including deep digital sequencing of a mixed tumor population, single cell sequencing, and the development of more sophisticated computational tools, will continue to shape and reshape the models of clonal evolution. In turn, these will provide both an improved framework for the understanding of cancer progression and a guide for treatment strategies aimed at the elimination of all, rather than just some, of the cancer cells within a patient. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  5. Software for Automation of Real-Time Agents, Version 2

    NASA Technical Reports Server (NTRS)

    Fisher, Forest; Estlin, Tara; Gaines, Daniel; Schaffer, Steve; Chouinard, Caroline; Engelhardt, Barbara; Wilklow, Colette; Mutz, Darren; Knight, Russell; Rabideau, Gregg; hide

    2005-01-01

    Version 2 of Closed Loop Execution and Recovery (CLEaR) has been developed. CLEaR is an artificial intelligence computer program for use in planning and execution of actions of autonomous agents, including, for example, Deep Space Network (DSN) antenna ground stations, robotic exploratory ground vehicles (rovers), robotic aircraft (UAVs), and robotic spacecraft. CLEaR automates the generation and execution of command sequences, monitoring the sequence execution, and modifying the command sequence in response to execution deviations and failures as well as new goals for the agent to achieve. The development of CLEaR has focused on the unification of planning and execution to increase the ability of the autonomous agent to perform under tight resource and time constraints coupled with uncertainty in how much of resources and time will be required to perform a task. This unification is realized by extending the traditional three-tier robotic control architecture by increasing the interaction between the software components that perform deliberation and reactive functions. The increase in interaction reduces the need to replan, enables earlier detection of the need to replan, and enables replanning to occur before an agent enters a state of failure.

  6. Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

    PubMed

    Adriaens, M E; Bezzina, C R

    2018-06-22

    Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is all but straightforward. Genomic technologies based on next-generation sequencing technology nowadays offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.

  7. The mitochondrial genome of Ifremeria nautilei and the phylogenetic position of the enigmatic deep-sea Abyssochrysoidea (Mollusca: Gastropoda).

    PubMed

    Osca, David; Templado, José; Zardoya, Rafael

    2014-09-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double stranded circular molecule is 15,664 pb in length and encodes for the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to genome organization of caenogastropods and differs only on the relative position of the trnW gene. The deduced amino acid sequences of the mt protein coding genes of Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea in the Late Jurassic-Early Cretaceous indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled intomore » 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.« less

  9. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090

  10. Genomic Evidence that Methanotrophic Endosymbionts Likely Provide Deep-Sea Bathymodiolus Mussels with a Sterol Intermediate in Cholesterol Biosynthesis

    PubMed Central

    Takaki, Yoshihiro; Chikaraishi, Yoshito; Ikuta, Tetsuro; Ozawa, Genki; Yoshida, Takao; Ohkouchi, Naohiko; Fujikura, Katsunori

    2017-01-01

    Sterols are key cyclic triterpenoid lipid components of eukaryotic cellular membranes, which are synthesized through complex multi-enzyme pathways. Similar to most animals, Bathymodiolus mussels, which inhabit deep-sea chemosynthetic ecosystems and harbor methanotrophic and/or thiotrophic bacterial endosymbionts, possess cholesterol as their main sterol. Based on the stable carbon isotope analyses, it has been suggested that host Bathymodiolus mussels synthesize cholesterol using a sterol intermediate derived from the methanotrophic endosymbionts. To test this hypothesis, we sequenced the genome of the methanotrophic endosymbiont in Bathymodiolus platifrons. The genome sequence data demonstrated that the endosymbiont potentially generates up to 4,4-dimethyl-cholesta-8,14,24-trienol, a sterol intermediate in cholesterol biosynthesis, from methane. In addition, transcripts for a subset of the enzymes of the biosynthetic pathway to cholesterol downstream from a sterol intermediate derived from methanotroph endosymbionts were detected in our transcriptome data for B. platifrons. These findings suggest that this mussel can de novo synthesize cholesterol from methane in cooperation with the symbionts. By in situ hybridization analyses, we showed that genes associated with cholesterol biosynthesis from both host and endosymbionts were expressed exclusively in the gill epithelial bacteriocytes containing endosymbionts. Thus, cholesterol production is probably localized within these specialized cells of the gill. Considering that the host mussel cannot de novo synthesize cholesterol and depends largely on endosymbionts for nutrition, the capacity of endosymbionts to synthesize sterols may be important in establishing symbiont–host relationships in these chemosynthetic mussels. PMID:28453654

  11. Depositional model and seismic expression of turbidites in Campos basin, offshore Brazil

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guardado, L.R.; Peres, W.E.; Souza Cruz, C.E.

    1986-05-01

    The Campos basin, located at the southeastern coast of Brazil, is one of the most prolific hydrocarbon areas along the Brazilian margin. The producing reservoirs range from the Neocomian to the early Miocene and were deposited in various environments. However, on deep-water submarine fans (turbidites), these reservoirs reach their maximum expression. The ratio between subsidence and sediment influx, associated to halokinesis, generated two major depositional styles for the sections before and after the early Oligocene, which affect the geometry of the turbidite bodies. Albian, Campanian, and Eocene turbidites occur mainly under the present continental shelf and are confined to thatmore » area. However, the Oligocene-Miocene turbidites occur mainly under the present slope. These younger turbidites were deposited as large submarine fans in broad subbasins as blankets, at the base of deep-water prograding sequences, and they have the potential for large hydrocarbon accumulations. These submarine fans are well defined on the seismic data, and specific processing improves the definition of anomalous amplitudes related to the presence of oil in the reservoirs.« less

  12. Recent paleoseismicity record in Prince William Sound, Alaska, USA

    NASA Astrophysics Data System (ADS)

    Kuehl, Steven A.; Miller, Eric J.; Marshall, Nicole R.; Dellapenna, Timothy M.

    2017-12-01

    Sedimentological and geochemical investigation of sediment cores collected in the deep (>400 m) central basin of Prince William Sound, along with geochemical fingerprinting of sediment source areas, are used to identify earthquake-generated sediment gravity flows. Prince William Sound receives sediment from two distinct sources: from offshore (primarily Copper River) through Hinchinbrook Inlet, and from sources within the Sound (primarily Columbia Glacier). These sources are found to have diagnostic elemental ratios indicative of provenance; Copper River Basin sediments were significantly higher in Sr/Pb and Cu/Pb, whereas Prince William Sound sediments were significantly higher in K/Ca and Rb/Sr. Within the past century, sediment gravity flows deposited within the deep central channel of Prince William Sound have robust geochemical (provenance) signatures that can be correlated with known moderate to large earthquakes in the region. Given the thick Holocene sequence in the Sound ( 200 m) and correspondingly high sedimentation rates (>1 cm year-1), this relationship suggests that sediments within the central basin of Prince William Sound may contain an extraordinary high-resolution record of paleoseismicity in the region.

  13. A Statistical Guide to the Design of Deep Mutational Scanning Experiments

    PubMed Central

    Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia

    2016-01-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710

  14. Petrology and geochemistry of mafic magmatic rocks from the Sarve-Abad ophiolites (Kurdistan region, Iran): Evidence for interaction between MORB-type asthenosphere and OIB-type components in the southern Neo-Tethys Ocean

    NASA Astrophysics Data System (ADS)

    Saccani, Emilio; Allahyari, Khalil; Rahimzadeh, Bahman

    2014-05-01

    The Sarve-Abad (Sawlava) ophiolites crop out in the Main Zagros Thrust Zone and represent remnants of the Mesozoic southern Neo-Tethys Ocean that was located between the Arabian shield and Sanandaj-Sirjan continental block. They consist of several incomplete ophiolitic sequences including gabbroic bodies, a dyke complex, and pillow lava sequences. These rocks generally range from sub-alkaline to transitional character. Mineral chemistry and whole-rock geochemistry indicate that they have compositions akin to enriched-type mid-ocean ridge basalts (E-MORB) and plume-type MORB (P-MORB). Nonetheless, the different depletion degrees in heavy rare earth elements (HREE), which can be observed in both E-MORB like and P-MORB like rocks enable two main basic chemical types of rocks to be distinguished as Type-I and Type-II. Type-I rocks are strongly depleted in HREE (YbN < ~ 6), whereas Type-II rocks are moderately depleted in HREE (YbN > 9.0). Petrogenetic modeling shows that Type-I rocks originated from 7 to 16% polybaric partial melting of a MORB-type mantle source, which was significantly enriched by plume-type components. These rocks resulted from the mixing of variable fractions of melts generated in garnet-facies and the spinel-facies mantle. In contrast, Type-II rocks originated from 5 to 8% partial melting in the spinel-facies of a MORB-type source, which was moderately enriched by plume-type components. A possible tectono-magmatic model for the generation of the southern Neo-Tethys oceanic crust implies that the continental rift and subsequent oceanic spreading were associated with uprising of MORB-type asthenospheric mantle featuring plume-type component influences decreasing from deep to shallow mantle levels. These deep plume-type components were most likely inherited from Carboniferous mantle plume activity that was associated with the opening of Paleo-Tethys in the same area.

  15. Dissecting genetic and environmental mutation signatures with model organisms.

    PubMed

    Segovia, Romulo; Tam, Annie S; Stirling, Peter C

    2015-08-01

    Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. A ruby-colored Pseudobaeospora species is described as new from material collected on the island of Hawaii.

    PubMed

    Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A

    2014-01-01

    Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.

  17. Sensitive Next-Generation Sequencing Method Reveals Deep Genetic Diversity of HIV-1 in the Democratic Republic of the Congo.

    PubMed

    Rodgers, Mary A; Wilkinson, Eduan; Vallari, Ana; McArthur, Carole; Sthreshley, Larry; Brennan, Catherine A; Cloherty, Gavin; de Oliveira, Tulio

    2017-03-15

    As the epidemiological epicenter of the human immunodeficiency virus (HIV) pandemic, the Democratic Republic of the Congo (DRC) is a reservoir of circulating HIV strains exhibiting high levels of diversity and recombination. In this study, we characterized HIV specimens collected in two rural areas of the DRC between 2001 and 2003 to identify rare strains of HIV. The env gp41 region was sequenced and characterized for 172 HIV-positive specimens. The env sequences were predominantly subtype A (43.02%), but 7 other subtypes (33.14%), 20 circulating recombinant forms (CRFs; 11.63%), and 20 unclassified (11.63%) sequences were also found. Of the rare and unclassified subtypes, 18 specimens were selected for next-generation sequencing (NGS) by a modified HIV-switching mechanism at the 5' end of the RNA template (SMART) method to obtain full-genome sequences. NGS produced 14 new complete genomes, which included pure subtype C ( n = 2), D ( n = 1), F1 ( n = 1), H ( n = 3), and J ( n = 1) genomes. The two subtype C genomes and one of the subtype H genomes branched basal to their respective subtype branches but had no evidence of recombination. The remaining 6 genomes were complex recombinants of 2 or more subtypes, including subtypes A1, F, G, H, J, and K and unclassified fragments, including one subtype CRF25 isolate, which branched basal to all CRF25 references. Notably, all recombinant subtype H fragments branched basal to the H clade. Spatial-geographical analysis indicated that the diverse sequences identified here did not expand globally. The full-genome and subgenomic sequences identified in our study population significantly increase the documented diversity of the strains involved in the continually evolving HIV-1 pandemic. IMPORTANCE Very little is known about the ancestral HIV-1 strains that founded the global pandemic, and very few complete genome sequences are available from patients in the Congo Basin, where HIV-1 expanded early in the global pandemic. By sequencing a subgenomic fragment of the HIV-1 envelope from study participants in the DRC, we identified rare variants for complete genome sequencing. The basal branching of some of the complete genome sequences that we recovered suggests that these strains are more closely related to ancestral HIV-1 strains than to previously reported strains and is evidence that the local diversification of HIV in the DRC continues to outpace the diversity of global strains decades after the emergence of the pandemic. Copyright © 2017 Rodgers et al.

  18. A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?

    PubMed Central

    Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.

    2018-01-01

    Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157

  19. Improving resolution of MR images with an adversarial network incorporating images with different contrast.

    PubMed

    Kim, Ki Hwan; Do, Won-Joon; Park, Sung-Hong

    2018-05-04

    The routine MRI scan protocol consists of multiple pulse sequences that acquire images of varying contrast. Since high frequency contents such as edges are not significantly affected by image contrast, down-sampled images in one contrast may be improved by high resolution (HR) images acquired in another contrast, reducing the total scan time. In this study, we propose a new deep learning framework that uses HR MR images in one contrast to generate HR MR images from highly down-sampled MR images in another contrast. The proposed convolutional neural network (CNN) framework consists of two CNNs: (a) a reconstruction CNN for generating HR images from the down-sampled images using HR images acquired with a different MRI sequence and (b) a discriminator CNN for improving the perceptual quality of the generated HR images. The proposed method was evaluated using a public brain tumor database and in vivo datasets. The performance of the proposed method was assessed in tumor and no-tumor cases separately, with perceptual image quality being judged by a radiologist. To overcome the challenge of training the network with a small number of available in vivo datasets, the network was pretrained using the public database and then fine-tuned using the small number of in vivo datasets. The performance of the proposed method was also compared to that of several compressed sensing (CS) algorithms. Incorporating HR images of another contrast improved the quantitative assessments of the generated HR image in reference to ground truth. Also, incorporating a discriminator CNN yielded perceptually higher image quality. These results were verified in regions of normal tissue as well as tumors for various MRI sequences from pseudo k-space data generated from the public database. The combination of pretraining with the public database and fine-tuning with the small number of real k-space datasets enhanced the performance of CNNs in in vivo application compared to training CNNs from scratch. The proposed method outperformed the compressed sensing methods. The proposed method can be a good strategy for accelerating routine MRI scanning. © 2018 American Association of Physicists in Medicine.

  20. DeepSig: deep learning improves signal peptide detection in proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2018-05-15

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both standalone program and web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.

  1. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  2. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers.

    PubMed

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  3. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    PubMed Central

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  4. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.

    PubMed

    Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H

    2017-01-01

    The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNAs prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we will show that our approach has a lower training time and allows for a better graphical navegability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.

  5. Molecular diversity and distribution pattern of ciliates in sediments from deep-sea hydrothermal vents in the Okinawa Trough and adjacent sea areas

    NASA Astrophysics Data System (ADS)

    Zhao, Feng; Xu, Kuidong

    2016-10-01

    In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. Statistical analyses showed that: (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and; (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitat and thus are potentially more sensitive to varying environmental conditions.

  6. Ultra Deep Sequencing of Listeria monocytogenes sRNA Transcriptome Revealed New Antisense RNAs

    PubMed Central

    Behrens, Sebastian; Widder, Stefanie; Mannala, Gopala Krishna; Qing, Xiaoxing; Madhugiri, Ramakanth; Kefer, Nathalie; Mraheil, Mobarak Abu; Rattei, Thomas; Hain, Torsten

    2014-01-01

    Listeria monocytogenes, a gram-positive pathogen, and causative agent of listeriosis, has become a widely used model organism for intracellular infections. Recent studies have identified small non-coding RNAs (sRNAs) as important factors for regulating gene expression and pathogenicity of L. monocytogenes. Increased speed and reduced costs of high throughput sequencing (HTS) techniques have made RNA sequencing (RNA-Seq) the state-of-the-art method to study bacterial transcriptomes. We created a large transcriptome dataset of L. monocytogenes containing a total of 21 million reads, using the SOLiD sequencing technology. The dataset contained cDNA sequences generated from L. monocytogenes RNA collected under intracellular and extracellular condition and additionally was size fractioned into three different size ranges from <40 nt, 40–150 nt and >150 nt. We report here, the identification of nine new sRNAs candidates of L. monocytogenes and a reevaluation of known sRNAs of L. monocytogenes EGD-e. Automatic comparison to known sRNAs revealed a high recovery rate of 55%, which was increased to 90% by manual revision of the data. Moreover, thorough classification of known sRNAs shed further light on their possible biological functions. Interestingly among the newly identified sRNA candidates are antisense RNAs (asRNAs) associated to the housekeeping genes purA, fumC and pgi and potentially their regulation, emphasizing the significance of sRNAs for metabolic adaptation in L. monocytogenes. PMID:24498259

  7. Molecular Diet Analysis of Two African Free-Tailed Bats (Molossidae) Using High Throughput Sequencing

    PubMed Central

    Bohmann, Kristine; Monadjem, Ara; Lehmkuhl Noer, Christina; Rasmussen, Morten; Zeale, Matt R. K.; Clare, Elizabeth; Jones, Gareth; Willerslev, Eske; Gilbert, M. Thomas P.

    2011-01-01

    Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI sequences in the GenBank and BOLD databases, similarity to existing collections allowed the preliminary identification of 25 prey families from six orders of insects within the diet of C. pumilus, and 24 families from seven orders within the diet of M. condylurus. Insects identified to families within the orders Lepidoptera and Diptera were widely present among the faecal samples analysed. The two families that were observed most frequently were Noctuidae and Nymphalidae (Lepidoptera). Species-level analysis of the data was accomplished using novel bioinformatics techniques for the identification of molecular operational taxonomic units (MOTU). Based on these analyses, our data provide little evidence of resource partitioning between sympatric M. condylurus and C. pumilus in the Simunye region of Swaziland at the time of year when the samples were collected, although as more complete databases against which to compare the sequences are generated this may have to be re-evaluated. PMID:21731749

  8. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced . Mapping of DNA variants in our sequenced genomes to mitochondrial

  9. Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

    PubMed Central

    Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

    2017-01-01

    ABSTRACT Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908

  10. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS?s Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  11. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  12. Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

    PubMed

    Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

    2016-01-01

    Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences revealed microRNA-mediated gene regulation in black pepper ( Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.

  13. DeepSAGE Based Differential Gene Expression Analysis under Cold and Freeze Stress in Seabuckthorn (Hippophae rhamnoides L.)

    PubMed Central

    Chaudhary, Saurabh; Sharma, Prakash C.

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total 36.2 million raw tags including 13.9 million distinct tags were generated using Illumina sequencing platform for three leaf tissue libraries including control (CON), cold stress (CS) and freeze stress (FS). After discarding low quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up regulated and 5383 down regulated genes were identified in three comparative setups i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analysis were performed to assign gene ontology term to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising of 88,297 putative unigenes leading to the identification of 428 cold and freeze stress responsive genes. Expression of randomly selected 22 DEGs was validated using qRT-PCR that further supported our DeepSAGE results. The present study provided a comprehensive view of global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aiming selection of candidate genes for development of abiotic stress tolerant transgenic plants. PMID:25803684

  14. DeepSAGE based differential gene expression analysis under cold and freeze stress in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Chaudhary, Saurabh; Sharma, Prakash C

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total 36.2 million raw tags including 13.9 million distinct tags were generated using Illumina sequencing platform for three leaf tissue libraries including control (CON), cold stress (CS) and freeze stress (FS). After discarding low quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up regulated and 5383 down regulated genes were identified in three comparative setups i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analysis were performed to assign gene ontology term to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising of 88,297 putative unigenes leading to the identification of 428 cold and freeze stress responsive genes. Expression of randomly selected 22 DEGs was validated using qRT-PCR that further supported our DeepSAGE results. The present study provided a comprehensive view of global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aiming selection of candidate genes for development of abiotic stress tolerant transgenic plants.

  15. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries

    PubMed Central

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-01-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs appear to be present and contribute to biomass production. PMID:20927138

  16. Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan

    2013-02-01

    Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediments samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in cDNA dataset indicate the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompt its use in future high-throughput sequencing-based environmental surveys.

  17. Carbonate sedimentation in an extensional active margin: Cretaceous history of the Haymana region, Pontides

    NASA Astrophysics Data System (ADS)

    Okay, Aral I.; Altiner, Demir

    2016-10-01

    The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of the Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized: Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. The carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which results in submarine erosion, carbonate breccias and in the lateral discontinuity of the sequences, and differs from blanket like carbonate deposition in the passive margins.

  18. Subsurface microbial diversity in deep-granitic-fracture water in Colorado

    USGS Publications Warehouse

    Sahl, J.W.; Schmidt, R.; Swanner, E.D.; Mandernack, K.W.; Templeton, A.S.; Kieft, Thomas L.; Smith, R.L.; Sanford, W.E.; Callaghan, R.L.; Mitton, J.B.; Spear, J.R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This "Henderson candidate division" dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. Copyright ?? 2008, American Society for Microbiology. All Rights Reserved.

  19. Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado▿

    PubMed Central

    Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950

  20. Deep sequencing of hepatitis C virus hypervariable region 1 reveals no correlation between genetic heterogeneity and antiviral treatment outcome

    PubMed Central

    2014-01-01

    Background Hypervariable region 1 (HVR1) contained within envelope protein 2 (E2) gene is the most variable part of HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders, when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at frequency of 0.5%. Conclusions While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390

  1. Estimating Exceptionally Rare Germline and Somatic Mutation Frequencies via Next Generation Sequencing

    PubMed Central

    Yoon, Song-Ro; Arnheim, Norman; Calabrese, Peter

    2016-01-01

    We used targeted next generation deep-sequencing (Safe Sequencing System) to measure ultra-rare de novo mutation frequencies in the human male germline by attaching a unique identifier code to each target DNA molecule. Segments from three different human genes (FGFR3, MECP2 and PTPN11) were studied. Regardless of the gene segment, the particular testis donor or the 73 different testis pieces used, the frequencies for any one of the six different mutation types were consistent. Averaging over the C>T/G>A and G>T/C>A mutation types the background mutation frequency was 2.6x10-5 per base pair, while for the four other mutation types the average background frequency was lower at 1.5x10-6 per base pair. These rates far exceed the well documented human genome average frequency per base pair (~10−8) suggesting a non-biological explanation for our data. By computational modeling and a new experimental procedure to distinguish between pre-mutagenic lesion base mismatches and a fully mutated base pair in the original DNA molecule, we argue that most of the base-dependent variation in background frequency is due to a mixture of deamination and oxidation during the first two PCR cycles. Finally, we looked at a previously studied disease mutation in the PTPN11 gene and could easily distinguish true mutations from the SSS background. We also discuss the limits and possibilities of this and other methods to measure exceptionally rare mutation frequencies, and we present calculations for other scientists seeking to design their own such experiments. PMID:27341568

  2. Deep sequencing reveals distinct patterns of DNA methylation in prostate cancer.

    PubMed

    Kim, Jung H; Dhanasekaran, Saravana M; Prensner, John R; Cao, Xuhong; Robinson, Daniel; Kalyana-Sundaram, Shanker; Huang, Christina; Shankar, Sunita; Jing, Xiaojun; Iyer, Matthew; Hu, Ming; Sam, Lee; Grasso, Catherine; Maher, Christopher A; Palanisamy, Nallasivam; Mehra, Rohit; Kominsky, Hal D; Siddiqui, Javed; Yu, Jindan; Qin, Zhaohui S; Chinnaiyan, Arul M

    2011-07-01

    Beginning with precursor lesions, aberrant DNA methylation marks the entire spectrum of prostate cancer progression. We mapped the global DNA methylation patterns in select prostate tissues and cell lines using MethylPlex-next-generation sequencing (M-NGS). Hidden Markov model-based next-generation sequence analysis identified ∼68,000 methylated regions per sample. While global CpG island (CGI) methylation was not differential between benign adjacent and cancer samples, overall promoter CGI methylation significantly increased from ~12.6% in benign samples to 19.3% and 21.8% in localized and metastatic cancer tissues, respectively (P-value < 2 × 10(-16)). We found distinct patterns of promoter methylation around transcription start sites, where methylation occurred not only on the CGIs, but also on flanking regions and CGI sparse promoters. Among the 6691 methylated promoters in prostate tissues, 2481 differentially methylated regions (DMRs) are cancer-specific, including numerous novel DMRs. A novel cancer-specific DMR in the WFDC2 promoter showed frequent methylation in cancer (17/22 tissues, 6/6 cell lines), but not in the benign tissues (0/10) and normal PrEC cells. Integration of LNCaP DNA methylation and H3K4me3 data suggested an epigenetic mechanism for alternate transcription start site utilization, and these modifications segregated into distinct regions when present on the same promoter. Finally, we observed differences in repeat element methylation, particularly LINE-1, between ERG gene fusion-positive and -negative cancers, and we confirmed this observation using pyrosequencing on a tissue panel. This comprehensive methylome map will further our understanding of epigenetic regulation in prostate cancer progression.

  3. Deep Sequencing-guided Design of a High Affinity Dual Specificity Antibody to Target Two Angiogenic Factors in Neovascular Age-related Macular Degeneration* ♦

    PubMed Central

    Koenig, Patrick; Lee, Chingwei V.; Sanowar, Sarah; Wu, Ping; Stinson, Jeremy; Harris, Seth F.; Fuh, Germaine

    2015-01-01

    The development of dual targeting antibodies promises therapies with improved efficacy over mono-specific antibodies. Here, we engineered a Two-in-One VEGF/angiopoietin 2 antibody with dual action Fab (DAF) as a potential therapeutic for neovascular age-related macular degeneration. Crystal structures of the VEGF/angiopoietin 2 DAF in complex with its two antigens showed highly overlapping binding sites. To achieve sufficient affinity of the DAF to block both angiogenic factors, we turned to deep mutational scanning in the complementarity determining regions (CDRs). By mutating all three CDRs of each antibody chain simultaneously, we were able not only to identify affinity improving single mutations but also mutation pairs from different CDRs that synergistically improve both binding functions. Furthermore, insights into the cooperativity between mutations allowed us to identify fold-stabilizing mutations in the CDRs. The data obtained from deep mutational scanning reveal that the majority of the 52 CDR residues are utilized differently for the two antigen binding function and permit, for the first time, the engineering of several DAF variants with sub-nanomolar affinity against two structurally unrelated antigens. The improved variants show similar blocking activity of receptor binding as the high affinity mono-specific antibodies against these two proteins, demonstrating the feasibility of generating a dual specificity binding surface with comparable properties to individual high affinity mono-specific antibodies. PMID:26088137

  4. Hawaii Energy and Environmental Technologies Initiative

    DTIC Science & Technology

    2005-06-01

    include a hydrate synthesis system, benthic pressure chambers to simulate deep seafloor sediment, and specialized instrumentation for high pressure...the high probability that a sulfide/oxygen microbial fuel cell can generate electricity in deep ocean sediments, and that prolonged power generation may...hydrogen generation (using an electrolyser) and storage, and on-line high -resolution gas analysis. In addition to installation and commissioning of

  5. Long-term seismicity of the Reykjanes Ridge (North Atlantic) recorded by a regional hydrophone array

    NASA Astrophysics Data System (ADS)

    Goslin, Jean; Lourenço, Nuno; Dziak, Robert P.; Bohnenstiehl, DelWayne R.; Haxel, Joe; Luis, Joaquim

    2005-08-01

    The seismicity of the northern Mid-Atlantic Ridge was recorded by two hydrophone networks moored in the sound fixing and ranging (SOFAR) channel, on the flanks of the Mid-Atlantic Ridge, north and south of the Azores. During its period of operation (05/2002-09/2003), the northern `SIRENA' network, deployed between latitudes 40° 20'N and 50° 30'N, recorded acoustic signals generated by 809 earthquakes on the hotspot-influenced Reykjanes Ridge. This activity was distributed between five spatio-temporal event clusters, each initiated by a moderate-to-large magnitude (4.0-5.6 M) earthquake. The rate of earthquake occurrence within the initial portion of the largest sequence (which began on 2002 October 6) is described adequately by a modified Omori law aftershock model. Although this is consistent with triggering by tectonic processes, none of the Reykjanes Ridge sequences are dominated by a single large-magnitude earthquake, and they appear to be of relatively short duration (0.35-4.5 d) when compared to previously described mid-ocean ridge aftershock sequences. The occurrence of several near-equal magnitude events distributed throughout each sequence is inconsistent with the simple relaxation of mainshock-induced stresses and may reflect the involvement of magmatic or fluid processes along this deep (>2000 m) section of the Reykjanes Ridge.

  6. ViDiT-CACTUS: an inexpensive and versatile library preparation and sequence analysis method for virus discovery and other microbiology applications.

    PubMed

    Verhoeven, Joost Theo Petra; Canuti, Marta; Munro, Hannah J; Dufour, Suzanne C; Lang, Andrew S

    2018-04-19

    High-throughput sequencing (HTS) technologies are becoming increasingly important within microbiology research, but aspects of library preparation, such as high cost per sample or strict input requirements, make HTS difficult to implement in some niche applications and for research groups on a budget. To answer these necessities, we developed ViDiT, a customizable, PCR-based, extremely low-cost (<5 US dollars per sample) and versatile library preparation method, and CACTUS, an analysis pipeline designed to rely on cloud computing power to generate high-quality data from ViDiT-based experiments without the need of expensive servers. We demonstrate here the versatility and utility of these methods within three fields of microbiology: virus discovery, amplicon-based viral genome sequencing and microbiome profiling. ViDiT-CACTUS allowed the identification of viral fragments from 25 different viral families from 36 oropharyngeal-cloacal swabs collected from wild birds, the sequencing of three almost complete genomes of avian influenza A viruses (>90% coverage), and the characterization and functional profiling of the complete microbial diversity (bacteria, archaea, viruses) within a deep-sea carnivorous sponge. ViDiT-CACTUS demonstrated its validity in a wide range of microbiology applications and its simplicity and modularity make it easily implementable in any molecular biology laboratory, towards various research goals.

  7. PIK3CA-associated developmental disorders exhibit distinct classes of mutations with variable expression and tissue distribution

    PubMed Central

    Timms, Andrew E.; Conti, Valerio; Girisha, Katta M.; Martin, Beth; Olds, Carissa; Collins, Sarah; Park, Kaylee; Carter, Melissa; Krägeloh-Mann, Inge; Chitayat, David; Parikh, Aditi Shah; Bradshaw, Rachael; Torti, Erin; Braddock, Stephen; Burke, Leah; Ghedia, Sondhya; Stephan, Mark; Stewart, Fiona; Prasad, Chitra; Napier, Melanie; Saitta, Sulagna; Straussberg, Rachel; Gabbett, Michael; O’Connor, Bridget C.; Yin, Lim Jiin; Lai, Angeline Hwei Meeng; Martin, Nicole; McKinnon, Margaret; Addor, Marie-Claude; Schwartz, Charles E.; Lanoel, Agustina; Conway, Robert L.; Devriendt, Koenraad; Tatton-Brown, Katrina; Pierpont, Mary Ella; Painter, Michael; Worgan, Lisa; Reggin, James; Hennekam, Raoul; Pritchard, Colin C.; Aracena, Mariana; Gripp, Karen W.; Cordisco, Maria; Van Esch, Hilde; Garavelli, Livia; Curry, Cynthia; Goriely, Anne; Kayserilli, Hulya; Shendure, Jay; Graham, John; Guerrini, Renzo; Dobyns, William B.

    2016-01-01

    Mosaicism is increasingly recognized as a cause of developmental disorders with the advent of next-generation sequencing (NGS). Mosaic mutations of PIK3CA have been associated with the widest spectrum of phenotypes associated with overgrowth and vascular malformations. We performed targeted NGS using 2 independent deep-coverage methods that utilize molecular inversion probes and amplicon sequencing in a cohort of 241 samples from 181 individuals with brain and/or body overgrowth. We identified PIK3CA mutations in 60 individuals. Several other individuals (n = 12) were identified separately to have mutations in PIK3CA by clinical targeted-panel testing (n = 6), whole-exome sequencing (n = 5), or Sanger sequencing (n = 1). Based on the clinical and molecular features, this cohort segregated into three distinct groups: (a) severe focal overgrowth due to low-level but highly activating (hotspot) mutations, (b) predominantly brain overgrowth and less severe somatic overgrowth due to less-activating mutations, and (c) intermediate phenotypes (capillary malformations with overgrowth) with intermediately activating mutations. Sixteen of 29 PIK3CA mutations were novel. We also identified constitutional PIK3CA mutations in 10 patients. Our molecular data, combined with review of the literature, show that PIK3CA-related overgrowth disorders comprise a discontinuous spectrum of disorders that correlate with the severity and distribution of mutations. PMID:27631024

  8. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.

    PubMed

    Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie

    2017-07-01

    Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen . shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  9. Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.

    PubMed

    Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan

    2017-05-01

    Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from less than 50-m deep, being most abundant near the sea floor where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database which indicated the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analysis of rhizarians were performed and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.

  10. Deep RNA-Seq to unlock the gene bank of floral development in Sinapis arvensis.

    PubMed

    Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong

    2014-01-01

    Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops.

  11. Deep RNA-Seq to Unlock the Gene Bank of Floral Development in Sinapis arvensis

    PubMed Central

    Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong

    2014-01-01

    Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops. PMID:25192023

  12. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  13. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    PubMed Central

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595

  14. Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.

    PubMed

    Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie

    2017-01-01

    In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method is targeted against gliomas (both low and high grade) obtained in multi-sequence magnetic resonance images (MRIs). Firstly, a novel deep architecture is proposed to combine the multi-sequences intensities feature extraction with classification to get the classification probabilities of each voxel. Then, graph cut based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. At last, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated in the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The performance of segmentation results demonstrates our proposal providing a competitive solution with stateof- the-art methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  15. The seismic stratigraphy of Okanagan Lake, British Columbia; a record of rapid deglaciation in a deep 'fiord-lake' basin

    NASA Astrophysics Data System (ADS)

    Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.

    1991-09-01

    This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ˜ 3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lens and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where ice bergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.

  16. Modeling language and cognition with deep unsupervised learning: a tutorial overview

    PubMed Central

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P.

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition. PMID:23970869

  17. Modeling language and cognition with deep unsupervised learning: a tutorial overview.

    PubMed

    Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P

    2013-01-01

    Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.

  18. Deep Space Telecommunications

    NASA Technical Reports Server (NTRS)

    Kuiper, T. B. H.; Resch, G. M.

    2000-01-01

    The increasing load on NASA's deep Space Network, the new capabilities for deep space missions inherent in a next-generation radio telescope, and the potential of new telescope technology for reducing construction and operation costs suggest a natural marriage between radio astronomy and deep space telecommunications in developing advanced radio telescope concepts.

  19. Are landslides in the New Madrid Seismic Zone the result of the 1811-1812 earthquake sequence or multiple prehistoric earthquakes?

    NASA Astrophysics Data System (ADS)

    Gold, Ryan; Williams, Robert; Jibson, Randall

    2014-05-01

    Previous research indicates that deep translational and rotational landslides along the bluffs east of the Mississippi River in western Tennessee were triggered by the M7-8 1811-1812 New Madrid earthquake sequence. Analysis of recently acquired airborne LiDAR data suggests the possibility of multiple generations of landslides, possibly triggered by older, similar magnitude earthquake sequences, which paleoliquifaction studies show occurred circa 1450 and about 900 A.D. Using these LiDAR data, we have remapped recent landslides along two sections of the bluffs: a northern section near Reelfoot Lake and a southern section near Meeman-Shelby State Park (20 km north of Memphis, Tennessee). The bare-earth, digital-elevation models derived from these LiDAR data have a resolution of 0.5 m and reveal valuable details of topography given the region's dense forest canopy. Our mapping confirms much of the previous landslide mapping, refutes a few previously mapped landslides, and reveals new, undetected landslides. Importantly, we observe that the landslide deposits in the Reelfoot region are characterized by rotated blocks with sharp uphill-facing scarps and steep headwall scarps, indicating youthful, relatively recent movement. In comparison, landslide deposits near Meeman-Shelby are muted in appearance, with headwall scarps and rotated blocks that are extensively dissected by gullies, indicating they might be an older generation of landslides. Because of these differences in morphology, we hypothesize that the landslides near Reelfoot Lake were triggered by the 1811-1812 earthquake sequence and that landslides near Meeman-Shelby resulted from shaking associated with earlier earthquake sequences. To test this hypothesis, we will evaluate differences in bluff height, local geology, vegetation, and proximity to known seismic sources. Furthermore, planned fieldwork will help evaluate whether the observed landslide displacements occurred in single earthquakes or if they might result from episodic movements associated with a sequence of multiple prehistoric earthquake. This study highlights the value of high-resolution, bare-earth topographic data to investigate the secondary effects of groundshaking in stable continental regions, where primary tectonic deformation associated with large earthquakes is commonly obscure or subtle.

  20. Joint deep shape and appearance learning: application to optic pathway glioma segmentation

    NASA Astrophysics Data System (ADS)

    Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George

    2017-03-01

    Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2:39% and 0:48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0:87+/-0:13 (0:93+/-0:06 for healthy tissue, 0:78+/-0:18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.

Top