Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing
Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.
2012-01-01
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
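The stepwise filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the quality threshold, minimum length, k-mer size, and the function names `kmers` and `filter_reads` are assumptions chosen for illustration, and exact k-mer matching stands in for a real alignment against the human genome.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads, host_kmers, k=8, min_mean_q=20, min_len=20):
    """Keep reads that pass a quality step and then a host-removal step.

    reads: iterable of (name, sequence, list_of_phred_scores).
    host_kmers: set of k-mers from the host reference (hypothetical input).
    All thresholds are illustrative assumptions.
    """
    kept = []
    for name, seq, quals in reads:
        # Step 1: drop short or low-quality reads first (cheap test).
        if len(seq) < min_len or sum(quals) / len(quals) < min_mean_q:
            continue
        # Step 2: drop reads that share any k-mer with the host reference.
        if any(seq[i:i + k] in host_kmers for i in range(len(seq) - k + 1)):
            continue
        kept.append((name, seq))
    return kept
```

Running the cheap quality test before the host comparison mirrors the idea of shrinking the read set before the expensive database search.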
Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander
2017-10-01
While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa
2014-03-01
Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.
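The incidence figures in this abstract follow directly from the reported counts (six mosaic mutations among 20 bilateral cases, four among 70 unilateral cases). A short sketch of that arithmetic, with `incidence_percent` as a hypothetical helper name:

```python
def incidence_percent(found, tested):
    """Percentage of tested cases in which a mosaic mutation was found."""
    return 100.0 * found / tested


bilateral = incidence_percent(6, 20)   # counts taken from the abstract
unilateral = incidence_percent(4, 70)  # about 5.7, reported rounded as 6%
```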
Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions
2014-01-01
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
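One common way to frame low-frequency variant detection against sequencing error is a binomial test: given a read depth and an assumed per-base error rate, ask how likely it is that errors alone would produce at least the observed number of variant reads. The sketch below is a generic illustration, not a method taken from the review; the error rate, the significance cutoff, and the function names are assumptions.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return total


def is_variant(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Call a variant if errors alone are unlikely to explain alt_reads.

    error_rate and alpha are illustrative assumptions.
    """
    return binom_sf(alt_reads, depth, error_rate) < alpha
```

With these assumed settings, 30 variant reads at depth 1000 would be called, whereas 12 would not, because about 10 erroneous reads are expected at a 1% error rate.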
VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs
USDA-ARS's Scientific Manuscript database
Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data.
Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
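The abstract mentions that different encoding schemes were tried but does not name them. As an assumed example of one such scheme, the sketch below shows one-hot encoding, a common way to turn a DNA sequence into a numeric vector that an auto-encoder could take as input. The name `one_hot` and the handling of ambiguous bases are illustrative choices.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat list with four values per base.

    Bases outside A/C/G/T (for example N) become four zeros; this is
    an assumption of the sketch, not something stated in the abstract.
    """
    vector = []
    for base in seq.upper():
        vector.extend(1.0 if base == ref else 0.0 for ref in ALPHABET)
    return vector
```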
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.
Yang, Jian-Hua; Qu, Liang-Hu
2012-01-01
Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard
2010-08-01
Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.
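The deep-sequencing comparison in this abstract can be read as a threshold rule followed by a sensitivity/specificity calculation. The sketch below uses only the counts reported above (8/8 non-R5 and 4/4 R5 samples, with Trofile as the reference); the function names are hypothetical, and treating a fraction of exactly 2% as non-R5 is an assumption, since the abstract reports only values above or below the cutoff.

```python
def call_tropism(x4_fraction, cutoff=0.02):
    """Label a sample non-R5 when the inferred CXCR4-using read
    fraction reaches the cutoff (2%, as in the abstract)."""
    return "non-R5" if x4_fraction >= cutoff else "R5"


def sensitivity(true_pos, false_neg):
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    return true_neg / (true_neg + false_pos)


# Counts from the abstract: all 8 Trofile non-R5 samples exceeded 2%,
# and all 4 Trofile R5 samples stayed below it.
deep_seq_sensitivity = sensitivity(true_pos=8, false_neg=0)
deep_seq_specificity = specificity(true_neg=4, false_pos=0)
```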
Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.
2016-01-01
Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob
2016-01-01
Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth affected individual callers very differently; for some callers, increased sequencing depth substantially improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.
Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A
2017-07-01
Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.
Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.
Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E
2013-02-12
High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within and between hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP), generated by combining short fragment sequencing libraries and longer sequencing reads, significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
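The intuition behind overlapping read pairs is that each base is read twice, once from each end of the fragment, so an error survives only if both reads are wrong at the same position in the same way. The sketch below illustrates that idea and is not the authors' error model: it assumes the two reads overlap completely and have equal length, whereas a real pipeline would first align the mates.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orp_consensus(forward, reverse):
    """Keep a base only where both reads of a fully overlapping pair agree.

    The reverse read is reverse-complemented onto the forward strand;
    disagreeing positions are masked with N. Complete overlap and equal
    read length are assumptions of this sketch.
    """
    mate = revcomp(reverse)
    if len(mate) != len(forward):
        raise ValueError("sketch assumes fully overlapping reads")
    return "".join(a if a == b else "N" for a, b in zip(forward, mate))
```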
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
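At its core, barcode sequencing reduces to counting how often each strain's barcode appears in each condition and comparing the normalised counts. The sketch below illustrates that counting step under stated assumptions: the barcode sits at the start of each read, only exact matches are counted, and the pseudocount and function names are illustrative choices. It is not the published Bar-seq analysis routine.

```python
from collections import Counter
from math import log2


def count_barcodes(reads, barcode_to_strain, barcode_len):
    """Count reads per strain using exact matches on the read prefix."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[:barcode_len])
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_ratio(strain, control, treated, pseudocount=1):
    """log2 of the treated/control frequency for one strain.

    Negative values suggest the strain was depleted under treatment.
    The pseudocount (an assumption) avoids division by zero.
    """
    control_freq = (control[strain] + pseudocount) / sum(control.values())
    treated_freq = (treated[strain] + pseudocount) / sum(treated.values())
    return log2(treated_freq / control_freq)
```

A revised barcode list, such as the one the abstract describes, would simply replace the `barcode_to_strain` mapping.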
Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.
Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario
2011-04-01
Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.
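Deep sequencing can show that two mutations lie in cis because individual reads span both positions: if the mutations sit on the same chromosome, reads carry both or neither. The sketch below tallies those four possibilities. It assumes the reads are already aligned to a common window and cover both positions; the positions, alleles, and the name `phase_counts` are hypothetical.

```python
def phase_counts(reads, pos1, alt1, pos2, alt2):
    """Tally reads by which of two variant alleles they carry.

    reads: aligned read strings sharing the same coordinate window
    (an assumption of this sketch). Positions are 0-based.
    """
    counts = {"both": 0, "only_first": 0, "only_second": 0, "neither": 0}
    for read in reads:
        has_first = read[pos1] == alt1
        has_second = read[pos2] == alt2
        if has_first and has_second:
            counts["both"] += 1
        elif has_first:
            counts["only_first"] += 1
        elif has_second:
            counts["only_second"] += 1
        else:
            counts["neither"] += 1
    return counts
```

A pattern dominated by "both" and "neither", with few single-mutation reads, is what an in cis arrangement would be expected to produce.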
Making sense of deep sequencing
Goldman, D.; Domschke, K.
2016-01-01
This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.
2009-01-01
Background It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels below the limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using the Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% of subjects and correlated with historical antiretroviral use in 79% of subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping.
The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031
Deep learning on temporal-spectral data for anomaly detection
NASA Astrophysics Data System (ADS)
Ma, King; Leung, Henry; Jalilian, Ehsan; Huang, Daniel
2017-05-01
Detecting anomalies is important for continuous monitoring of sensor systems. One significant challenge is to use sensor data and autonomously detect changes that cause different conditions to occur. Using deep learning methods, we are able to monitor and detect changes as a result of some disturbance in the system. We utilize deep neural networks for sequence analysis of time series. We use a multi-step method for anomaly detection. We train the network to learn spectral and temporal features from the acoustic time series. We test our method using fiber-optic acoustic data from a pipeline.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses
USDA-ARS's Scientific Manuscript database
Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
DeepSig: deep learning improves signal peptide detection in proteins.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita
2018-05-15
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.
Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie
2018-04-19
The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.
Deep Sequencing Analysis of Apple Infecting Viruses in Korea
Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun
2016-01-01
Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. The deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of the known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.
2015-01-01
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199
Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L
2015-12-07
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.
Musculoskeletal MRI findings of juvenile localized scleroderma.
Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W
2017-04-01
Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-01-01
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment, as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds. PMID:26585833
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-11-20
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment, as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds.
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
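The two input preparations described in this abstract (padding whole sequences to a common length for the global CNN, and cutting each sequence into overlapping fixed-length subsequences for the local CNN) can be illustrated with a minimal Python sketch. This is not the iDeepE code: the function names, the pad character "N", the window and overlap values, and the choice to truncate over-long sequences are all assumptions made for illustration only.

```python
from typing import List


def pad_sequence(seq: str, target_len: int, pad_char: str = "N") -> str:
    """Pad a sequence on the right to target_len (global view).

    Truncating sequences longer than target_len is an assumption of
    this sketch; the abstract only mentions padding.
    """
    return seq[:target_len].ljust(target_len, pad_char)


def split_overlapping(seq: str, window: int, overlap: int,
                      pad_char: str = "N") -> List[str]:
    """Split a sequence into overlapping fixed-length windows (local view).

    Consecutive windows share `overlap` characters; the last window is
    right-padded so every window has length `window`.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(seq[start:start + window].ljust(window, pad_char))
        if start + window >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    rna = "ACGUACGUAC"  # hypothetical 10-nt example
    print(pad_sequence(rna, 12))
    print(split_overlapping(rna, window=4, overlap=2))
```

Each returned window would then be encoded (for example one-hot) and fed to the local network as one channel; that encoding step is omitted here.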
Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar
2014-03-04
Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua
2014-01-01
The fungal diversity in deep-sea environments has recently gained increasing attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
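To make the ">1%-frequency" criterion in this abstract concrete, the following Python sketch reports non-consensus bases whose frequency at a position exceeds a threshold. It is only an illustration of frequency-based minority variant reporting, not the authors' pipeline: the function name, the per-position count dictionaries, and the example counts are hypothetical, and the de novo assembly, iterative mapping, and error-correction steps the abstract describes are not modeled.

```python
from typing import Dict, List, Tuple


def minority_variants(counts: List[Dict[str, int]],
                      min_freq: float = 0.01) -> List[Tuple[int, str, float]]:
    """Return (position, base, frequency) for non-consensus bases
    whose frequency is strictly greater than min_freq.

    `counts` holds one dict of base -> read count per genome position
    (1-based positions in the output). Positions with no coverage are
    skipped.
    """
    calls = []
    for pos, base_counts in enumerate(counts, start=1):
        depth = sum(base_counts.values())
        if depth == 0:
            continue
        # The most frequent base is taken as the consensus at this position.
        consensus = max(base_counts, key=base_counts.get)
        for base, n in sorted(base_counts.items()):
            freq = n / depth
            if base != consensus and freq > min_freq:
                calls.append((pos, base, freq))
    return calls


if __name__ == "__main__":
    # Hypothetical base counts at three positions, depth 1000 each.
    example = [
        {"A": 990, "G": 10},   # G at exactly 1%: not above the threshold
        {"C": 950, "T": 50},   # T at 5%: reported
        {"G": 1000},           # no minority base
    ]
    for pos, base, freq in minority_variants(example):
        print(pos, base, f"{freq:.1%}")
```

In practice the usable threshold depends on the residual error rate after correction, which the abstract reports as below 1% per position.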
Accurate identification of RNA editing sites from primitive sequence with deep neural networks.
Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie
2018-04-16
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis
Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao
2016-01-01
Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja
2015-01-01
The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.
Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi
2014-07-01
When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.
Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón
2014-01-01
We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³, 27 months; MRD 10⁻³ to 10⁻⁵, 48 months; and MRD <10⁻⁵, 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio
2017-07-15
With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-01
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
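To give a sense of what a minimum of 30× coverage implies, the standard expected-coverage relation (coverage = reads × read length / genome size) can be rearranged to estimate the number of reads needed. The sketch below is illustrative only: the genome size of 3.1 Gb and the read length of 100 bp are assumed values chosen for the example and are not taken from the SSMP abstract.

```python
def reads_for_coverage(target_coverage: int, genome_size_bp: int, read_length_bp: int) -> int:
    """Number of reads needed so that reads * read_length / genome_size >= target_coverage."""
    if min(target_coverage, genome_size_bp, read_length_bp) <= 0:
        raise ValueError("all arguments must be positive")
    total_bases = target_coverage * genome_size_bp
    # Integer ceiling division avoids floating-point rounding on large numbers.
    return -(-total_bases // read_length_bp)


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average per-base coverage for a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Assumed values: ~3.1 Gb genome, 100 bp reads.
    n = reads_for_coverage(30, 3_100_000_000, 100)
    print(n, expected_coverage(n, 100, 3_100_000_000))
```

This is an average; real coverage varies along the genome, which is one reason a study might specify a minimum rather than a mean depth.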
Oasis 2: improved online analysis of small RNA-seq data.
Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan
2018-02-14
Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.
Vernick, Kenneth D.
2017-01-01
Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932
Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai
2017-11-23
The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines available programs with ad hoc scripts based on an original algorithm that searches for conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.
Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.
2014-01-01
With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. 
This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
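The idea behind barcode-aware calling can be sketched as follows: reads sharing a molecular barcode are treated as copies of one original DNA molecule, so disagreements within a barcode family are likely amplification or sequencing errors. The code below is a deliberately simplified majority-vote illustration, not smCounter's Bayesian model; the function names, the input format (barcode, base) and the thresholds are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=2, min_agreement=0.6):
    """Collapse (barcode, base) observations at one site into one base per barcode.

    Families smaller than min_family_size, or without a sufficiently dominant
    base, are discarded (hypothetical thresholds).
    """
    families = defaultdict(list)
    for barcode, base in reads:
        families[barcode].append(base)

    consensus = {}
    for barcode, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus, alt_base):
    """Fraction of consensus molecules that carry alt_base."""
    if not consensus:
        return 0.0
    return sum(1 for base in consensus.values() if base == alt_base) / len(consensus)


if __name__ == "__main__":
    reads = [("AAA", "A"), ("AAA", "A"), ("AAA", "T"),
             ("CCC", "T"), ("CCC", "T"), ("GGG", "A")]
    molecules = consensus_by_barcode(reads)
    print(molecules, allele_fraction(molecules, "T"))
```

Counting molecules rather than raw reads is what lets a single erroneous read (the "T" in family AAA) be outvoted instead of inflating the apparent allele fraction.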
A deep intronic mutation in the SLC12A3 gene leads to Gitelman syndrome.
Nozu, Kandai; Iijima, Kazumoto; Nozu, Yoshimi; Ikegami, Ei; Imai, Takehide; Fu, Xue Jun; Kaito, Hiroshi; Nakanishi, Koichi; Yoshikawa, Norishige; Matsuo, Masafumi
2009-11-01
Many mutations have been detected in the SLC12A3 gene of Gitelman syndrome (GS, OMIM 263800) patients. In previous studies, only one mutant allele was detected in approximately 20 to 41% of patients with GS; however, the exact reason for the nonidentification has not been established. In this study, we used RT-PCR of mRNA to investigate, for the first time, transcript abnormalities caused by a deep intronic mutation. Direct sequencing analysis of leukocyte DNA identified a single-base insertion in exon 6 (c.818_819insG), but no mutation was detected in the other allele. We analyzed RNA extracted from leukocytes and urine sediments and detected an unknown 238-bp sequence between exons 13 and 14. The genomic DNA analysis of intron 13 revealed a single-base substitution (c.1670-191C>T) that creates a new donor splice site within the intron, resulting in the inclusion of a novel cryptic exon in mRNA. This is the first report of creation of a splice site by a deep intronic single-nucleotide change in GS and the first report to detect the onset mechanism in a patient with GS and a missing mutation in one allele. This molecular onset mechanism may partly explain the poor success rate of mutation detection in both alleles of patients with GS.
Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam
2015-07-01
The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings, due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan to deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org. Contact: nshomron@post.tau.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Detection and characterization of the first North American mastrevirus in Switchgrass
USDA-ARS's Scientific Manuscript database
Virus infections have the potential to reduce biomass yields in energy crops, including Panicum virgatum (switchgrass). As a first step towards managing virus-induced biomass reductions, deep sequencing was used to identify viruses associated with mosaic symptoms in switchgrass, which detected thre...
Owa, Chie; Poulin, Matthew; Yan, Liying; Shioda, Toshi
2018-01-01
The existence of cytosine methylation in mammalian mitochondrial DNA (mtDNA) is a controversial subject. Because detection of DNA methylation depends on resistance of 5'-modified cytosines to bisulfite-catalyzed conversion to uracil, we examined parameters that affect technical adequacy of mtDNA methylation analysis. Negative control amplicons (NCAs) devoid of cytosine methylation were amplified to cover the entire human or mouse mtDNA by long-range PCR. When the pyrosequencing template amplicons were gel-purified after bisulfite conversion, bisulfite pyrosequencing of NCAs did not detect significant levels of bisulfite-resistant cytosines (brCs) at ND1 (7 CpG sites) or CYTB (8 CpG sites) genes (CI95 = 0%-0.94%); without gel-purification, significant false-positive brCs were detected from NCAs (CI95 = 4.2%-6.8%). Bisulfite pyrosequencing of highly purified, linearized mtDNA isolated from human iPS cells or mouse liver detected significant brCs (~30%) in human ND1 gene when the sequencing primer was not selective in bisulfite-converted and unconverted templates. However, repeated experiments using a sequencing primer selective in bisulfite-converted templates almost completely (< 0.8%) suppressed brC detection, supporting the false-positive nature of brCs detected using the non-selective primer. Bisulfite-seq deep sequencing of linearized, gel-purified human mtDNA detected 9.4%-14.8% brCs for 9 CpG sites in ND1 gene. However, because all these brCs were associated with adjacent non-CpG brCs showing the same degrees of bisulfite resistance, DNA methylation in this mtDNA-encoded gene was not confirmed. Without linearization, data generated by bisulfite pyrosequencing or deep sequencing of purified mtDNA templates did not pass the quality control criteria. Shotgun bisulfite sequencing of human mtDNA detected extremely low levels of CpG methylation (<0.65%) over non-CpG methylation (<0.55%). 
Taken together, our study demonstrates that adequacy of mtDNA methylation analysis using methods dependent on bisulfite conversion needs to be established for each experiment, taking effects of incomplete bisulfite conversion and template impurity or topology into consideration.
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
2009-07-01
Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be high demand for bioinformatics tools that can cope with the several gigabytes of sequence data generated in a single deep-sequencing experiment. With this in mind, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is especially important because many species have very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
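The input described here, a list of unique reads with their copy numbers, can be produced from raw reads by a simple collapsing step. The following is a minimal sketch under stated assumptions: the abstract does not specify the exact file layout miRanalyzer expects, so the tab-separated "sequence, count" format and the function names below are hypothetical.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw read sequences into (sequence, copy_number) pairs.

    Sequences are upper-cased and blank entries are skipped; the result is
    sorted by decreasing copy number, then alphabetically.
    """
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_read_counts(collapsed):
    """Render collapsed reads as tab-separated lines (assumed layout)."""
    return "\n".join(f"{seq}\t{count}" for seq, count in collapsed)


if __name__ == "__main__":
    raw = ["TGAGGTAGTAGGTTGTATAGTT", "tgaggtagtaggttgtatagtt", "ACGTACGT", ""]
    print(format_read_counts(collapse_reads(raw)))
```

Collapsing identical reads before upload reduces file size considerably for small RNA libraries, where a few abundant species often account for most reads.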
A visual tracking method based on deep learning without online model updating
NASA Astrophysics Data System (ADS)
Tang, Cong; Wang, Yicheng; Feng, Yunsong; Zheng, Chao; Jin, Wei
2018-02-01
The paper proposes a visual tracking method based on deep learning without online model updating. To exploit the advantages of deep learning in feature representation, the deep model SSD (Single Shot Multibox Detector) is used as the object extractor in the tracking model. The color histogram feature and the HOG (Histogram of Oriented Gradients) feature are then combined to select the tracking object. During tracking, a multi-scale object searching map is built to improve the detection performance of the deep detection model and the tracking efficiency. In experiments on eight tracking video sequences from the baseline dataset, the proposed method is more robust than six state-of-the-art methods to challenging tracking factors such as deformation, scale variation, rotation variation, illumination variation, and background clutter. Its overall performance is also better than that of the other six tracking methods.
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer
Hong, Matthew K. H.; Macintyre, Geoff; Wedge, David C.; ...
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. As a result, analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer.
Hong, Matthew K H; Macintyre, Geoff; Wedge, David C; Van Loo, Peter; Patel, Keval; Lunke, Sebastian; Alexandrov, Ludmil B; Sloggett, Clare; Cmero, Marek; Marass, Francesco; Tsui, Dana; Mangiola, Stefano; Lonie, Andrew; Naeem, Haroon; Sapre, Nikhil; Phal, Pramit M; Kurganovs, Natalie; Chin, Xiaowen; Kerger, Michael; Warren, Anne Y; Neal, David; Gnanapragasam, Vincent; Rosenfeld, Nitzan; Pedersen, John S; Ryan, Andrew; Haviv, Izhak; Costello, Anthony J; Corcoran, Niall M; Hovens, Christopher M
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.
Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel
2013-09-01
RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
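The clustering step mentioned here, grouping mapped reads into contiguous transcribed fragments, can be illustrated with a simple interval-merging routine. This is a generic sketch and not the DETR'PROK implementation: the function name, the half-open (start, end) coordinate convention and the `max_gap` parameter are assumptions, and strand handling and comparison with annotation are left out.

```python
def cluster_reads(intervals, max_gap=0):
    """Merge mapped-read intervals into clusters.

    intervals: iterable of (start, end) on one strand of one replicon,
               with end exclusive (assumed convention).
    max_gap:   reads separated by at most this many bases join one cluster.
    Returns a list of (cluster_start, cluster_end, n_reads), sorted by start.
    """
    clusters = []
    for start, end in sorted(intervals):
        if clusters and start <= clusters[-1][1] + max_gap:
            prev_start, prev_end, n_reads = clusters[-1]
            clusters[-1] = (prev_start, max(prev_end, end), n_reads + 1)
        else:
            clusters.append((start, end, 1))
    return clusters


if __name__ == "__main__":
    mapped = [(15, 30), (10, 20), (50, 60)]
    print(cluster_reads(mapped))              # two separate clusters
    print(cluster_reads(mapped, max_gap=25))  # one cluster spanning all reads
```

In a full pipeline, each resulting cluster would then be compared with annotated genes on both strands to decide whether it looks like a 5' UTR, an independent sRNA or an antisense transcript.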
Some New Windows into Terrestrial Deep Subsurface Microbial Ecosystems
NASA Astrophysics Data System (ADS)
Moser, D. P.
2011-12-01
Over the past several years, our group has surveyed the microbial ecology and biogeochemistry of a range of fractured rock subsurface ecosystems via deep mine boreholes in South Africa, the United States, and Canada; and boreholes from surface or deeply-sourced natural springs of the U.S. Great Basin. Collectively, these mostly unexplored habitats represent a wide range of geologic provinces, host rock types, aquatic chemistries, and the vast potential for biogeographic isolation. Thus, patterns of microbial diversity are of interest from the perspective of filling a fundamental knowledge gap; and while not necessarily expected, the detection of closely related microorganisms from geographically isolated settings would be noteworthy. Across these sample sets, microbial communities were invariably very low in biomass (e.g. 10³ to 10⁴ cells per mL) and dominated by deeply-branching bacterial lineages, particularly from the phyla Firmicutes and Nitrospira. In several cases, the Firmicutes have shown very close phylogenetic affiliations to lineages detected at divergent locations. For example, one abundant lineage from a new artesian well drilled into the Furnace Creek Fault of Death Valley, CA bears a very close phylogenetic relatedness to environmental DNA sequences (SSU rRNA gene) detected in one of the world's deepest mines (Tau Tona of South Africa) and what was North America's deepest gold mine (Homestake of South Dakota). Several radioactive wells from the Nevada National Security Site have produced rRNA gene sequences very close (e.g. greater than 99% identity) to that of Desulforudis audaxviator, a rarely detected microorganism thought to subsist as a single species ecosystem on the products of radiochemical reactions in deep crustal rocks from the South African Witwatersrand Basin. 
These sequences, along with more distantly related sequences from the marine subsurface (ridge flank basalt and mud volcanoes) and groundwater in Europe, hint at a role in certain hydrogen-rich subsurface settings for this group. Likewise, patterns of archaeal diversity across many of our Great Basin sites suggest shared deep lineages, particularly within the phylum Thaumarchaeota. Here we will explore the possible significance of these patterns of diversity and discuss future research plans involving high throughput molecular techniques.
Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy
2014-01-01
The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
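Detecting a minority variant comes down to estimating the frequency of each non-dominant base at a position and comparing it with a detection threshold. The sketch below shows only that arithmetic; it is not snp-assess or V-Phaser and applies no error model. The function name and the default threshold of 0.5% are assumptions chosen for illustration (the abstract reports that all control variants at ≥0.5% were detected by Illumina).

```python
from collections import Counter


def minority_variants(base_calls, min_frequency=0.005):
    """Return {base: frequency} for non-majority bases at or above min_frequency.

    base_calls: iterable of single-character base calls covering one position.
    """
    counts = Counter(base_calls)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    majority_base, _ = counts.most_common(1)[0]
    return {
        base: n / depth
        for base, n in counts.items()
        if base != majority_base and n / depth >= min_frequency
    }


if __name__ == "__main__":
    # 1,000 reads covering one position: 990 A, 8 G, 2 T.
    pileup = "A" * 990 + "G" * 8 + "T" * 2
    print(minority_variants(pileup))
```

A raw frequency cut-off like this cannot separate true low-frequency variants from sequencing errors, which is why the depth of coverage and the platform's error profile determine the practical limit of detection.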
St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin
2016-01-01
Background Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). 10 preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ±0.5, 4.8 ±0.4, 4.9 ±0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilutions series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. 
Conclusions This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901
Liang, Bin; Tan, Yaoju; Li, Zi; Tian, Xueshan; Du, Chen; Li, Hui; Li, Guoli; Yao, Xiangyang; Wang, Zhongan; Xu, Ye; Li, Qingge
2018-02-01
Detection of heteroresistance of Mycobacterium tuberculosis remains challenging using current genotypic drug susceptibility testing methods. Here, we described a melting curve analysis-based approach, termed DeepMelt, that can detect less-abundant mutants through selective clamping of the wild type in mixed populations. The singleplex DeepMelt assay detected 0.01% katG S315T in 10^5 M. tuberculosis genomes/μl. The multiplex DeepMelt TB/INH detected 1% of mutant species in the four loci associated with isoniazid resistance in 10^4 M. tuberculosis genomes/μl. The DeepMelt TB/INH assay was tested on a panel of DNA extracted from 602 precharacterized clinical isolates. Using the 1% proportion method as the gold standard, the sensitivity was found to be increased from 93.6% (176/188, 95% confidence interval [CI] = 89.2 to 96.3%) to 95.7% (180/188, 95% CI = 91.8 to 97.8%) compared to the MeltPro TB/INH assay. Further evaluation of 109 smear-positive sputum specimens increased the sensitivity from 83.3% (20/24, 95% CI = 64.2 to 93.3%) to 91.7% (22/24, 95% CI = 74.2 to 97.7%). In both cases, the specificity remained nearly unchanged. All heteroresistant samples newly identified by the DeepMelt TB/INH assay were confirmed by DNA sequencing and even partially by digital PCR. The DeepMelt assay may fill the gap between current genotypic and phenotypic drug susceptibility testing for detecting drug-resistant tuberculosis patients. Copyright © 2018 American Society for Microbiology.
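The abstract above reports sensitivities with 95% confidence intervals but does not say how the intervals were computed. As a minimal sketch, assuming a Wilson score interval (an assumption, not a method named in the source), the isolate-panel figures can be recomputed from the raw counts. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion.

    Assumption: the source does not name its interval method; Wilson is
    used here only as a plausible choice for small-sample proportions.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Sensitivity counts quoted in the abstract for the 188 resistant isolates.
for label, detected in (("MeltPro TB/INH", 176), ("DeepMelt TB/INH", 180)):
    low, high = wilson_interval(detected, 188)
    print(f"{label}: {detected / 188:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Under this assumption the two isolate-panel intervals agree with the quoted values to one decimal place. The much smaller sputum subset (n = 24) is more sensitive to the choice of method and rounding, so it is left out of the check.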
Video Salient Object Detection via Fully Convolutional Networks.
Wang, Wenguan; Shen, Jianbing; Shao, Ling
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training with the absence of sufficiently large and pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimate. We advance the state-of-the-art on the densely annotated video segmentation data set (MAE of .06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of .07), and do so with much improved speed (2 fps with all steps).
Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach
NASA Astrophysics Data System (ADS)
Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan
2013-02-01
Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediment samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of the foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of the foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in the cDNA dataset indicates the limits of our database and the lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompts its use in future high-throughput sequencing-based environmental surveys.
Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L
2014-01-05
Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.
Joint deep shape and appearance learning: application to optic pathway glioma segmentation
NASA Astrophysics Data System (ADS)
Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George
2017-03-01
Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2.39% and 0.48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0.87 ± 0.13 (0.93 ± 0.06 for healthy tissue, 0.78 ± 0.18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.
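The Dice similarity coefficient quoted above measures overlap between a predicted and a reference segmentation: twice the size of the intersection divided by the sum of the two mask sizes. A minimal sketch follows, assuming flattened binary masks given as plain Python sequences. The function name and the toy masks are illustrative and are not taken from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    pred_count = sum(1 for v in pred if v)
    truth_count = sum(1 for v in truth if v)
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if pred_count + truth_count == 0:
        # Convention chosen here: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


# Toy example: 4 voxels, one voxel shared between prediction and reference.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A value of 1.0 means the masks coincide and 0.0 means they do not overlap. The handling of two empty masks is a design choice that varies between implementations.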
Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.
Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A
2017-11-01
To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.
Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu
2018-01-01
Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
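To see why a roughly 1,000-fold selective amplification makes a 0.1% variant easy to detect, a small worked calculation helps. The sketch below assumes an idealized model in which every variant molecule is amplified exactly `fold` times more than every wild-type molecule. That model, and the function name `enriched_vaf`, are simplifying assumptions made for illustration and are not the authors' quantitation procedure.

```python
def enriched_vaf(vaf: float, fold: float) -> float:
    """Variant allele fraction after idealized `fold` enrichment of variant over wild type."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    if fold <= 0:
        raise ValueError("fold must be positive")
    variant_signal = vaf * fold      # variant molecules, relatively amplified
    wild_type_signal = 1.0 - vaf     # wild-type molecules, reference amplification
    return variant_signal / (variant_signal + wild_type_signal)


# A variant starting at 0.1% allele frequency, enriched 1,000-fold.
print(f"{enriched_vaf(0.001, 1000):.3f}")
```

Under this idealized model a 0.1% variant ends up at about half of the amplified product, a level at which conventional readouts can call it.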
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan
2012-06-19
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
Ojamies, P N; Kontro, M; Edgren, H; Ellonen, P; Lagström, S; Almusa, H; Miettinen, T; Eldfors, S; Tamborero, D; Wennerberg, K; Heckman, C; Porkka, K; Wolf, M; Kallioniemi, O
2017-05-01
In our individualized systems medicine program, personalized treatment options are identified and administered to chemorefractory acute myeloid leukemia (AML) patients based on exome sequencing and ex vivo drug sensitivity and resistance testing data. Here, we analyzed how clonal heterogeneity affects the responses of 13 AML patients to chemotherapy or targeted treatments using ultra-deep (average 68 000 × coverage) amplicon resequencing. Using amplicon resequencing, we identified 16 variants from 4 patients (frequency 0.54-2%) that were not detected previously by exome sequencing. A correlation-based method was developed to detect mutation-specific responses in serial samples across multiple time points. Significant subclone-specific responses were observed for both chemotherapy and targeted therapy. We detected subclonal responses in patients where clinical European LeukemiaNet (ELN) criteria showed no response. Subclonal responses also helped to identify putative mechanisms underlying drug sensitivities, such as sensitivity to azacitidine in DNMT3A mutated cell clones and resistance to cytarabine in a subclone with loss of NF1 gene. In summary, ultra-deep amplicon resequencing method enables sensitive quantification of subclonal variants and their responses to therapies. This approach provides new opportunities for designing combinatorial therapies blocking multiple subclones as well as for real-time assessment of such treatments.
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter
2015-01-01
Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
GenomeGems: evaluation of genetic variability from deep sequencing data
2012-01-01
Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the large number of sequences generated so that mutations that might be biologically relevant are easily identified is a difficult task. Yet only a limited number of accessible automatic tools exist for this task. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151
Bayesian mixture analysis for metagenomic community profiling.
Morfopoulou, Sofia; Plagnol, Vincent
2015-09-15
Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix. Contact: sofia.morfopoulou.10@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research
Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.
2016-01-01
Abstract Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149
Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute
2017-09-01
Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
DSAP: deep-sequencing small RNA analysis pipeline.
Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus
2010-07-01
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
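The first two DSAP steps described above, cleanup and clustering, can be sketched on the tab-delimited tag/count input the abstract mentions. This is a hedged illustration and not DSAP's actual code: the adaptor sequence, the minimum length, the rule of discarding tags made of a single repeated nucleotide, and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable

# Hypothetical 3' adaptor used only for this illustration.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"


def trim_adapter(read: str, adapter: str = ADAPTER) -> str:
    """Cut the read at the first exact occurrence of the adaptor, if any."""
    idx = read.find(adapter)
    return read[:idx] if idx != -1 else read


def clean_and_cluster(lines: Iterable[str], adapter: str = ADAPTER,
                      min_len: int = 15) -> Counter:
    """Clean tags and merge identical cleaned sequences, summing their copy numbers."""
    clusters: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq, count = line.split("\t")
        seq = trim_adapter(seq.upper(), adapter)
        # Assumed filters: drop very short tags and pure homopolymers (poly-A/T/C/G/N).
        if len(seq) < min_len or len(set(seq)) == 1:
            continue
        clusters[seq] += int(count)
    return clusters


example = [
    "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "\t5",  # tag with adaptor still attached
    "TGAGGTAGTAGGTTGTATAGTT\t3",                 # same tag, already clean
    "AAAAAAAAAAAAAAAAAAAA\t7",                   # poly-A tag, discarded
]
print(clean_and_cluster(example))
```

The resulting unique sequences and summed counts are what a later step would match against Rfam and miRBase. That matching is not sketched here because it depends on the alignment tool and database release used.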
Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko
2014-01-01
Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750
Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.
Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu
2014-08-01
Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions of aa80, aa155, aa156, and aa168 positions in HCV NS3 region) over time. At Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b received triple therapy of simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). The sustained virological response rate was 67%, and it was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients, who did not achieve sustained virological response, were tested for resistant variants by ultra-deep sequencing, at the baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants, detected at re-elevation of viral load, were de novo resistant variants. Ten of 12 de novo resistant variants became undetectable over time, whereas five of seven resistant variants detected at baseline persisted over time. In one patient, variants of Q80R at baseline (0.3%) increased at 96 weeks after the cessation of treatment (10.2%), and de novo resistant variants of D168E (0.3%) also increased at 96 weeks after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility in detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.
Genetic diversity among pandemic 2009 influenza viruses isolated from a transmission chain
2013-01-01
Background: Influenza viruses such as swine-origin influenza A(H1N1) virus (A(H1N1)pdm09) generate genetic diversity due to the high error rate of their RNA polymerase, often resulting in mixed genotype populations (intra-host variants) within a single infection. This variation helps influenza to rapidly respond to selection pressures, such as those imposed by the immunological host response and antiviral therapy. We have applied deep sequencing to characterize influenza intra-host variation in a transmission chain consisting of three cases due to oseltamivir-sensitive viruses, and one derived oseltamivir-resistant case. Methods: Following detection of the A(H1N1)pdm09 infections, we deep-sequenced the complete NA gene from two of the oseltamivir-sensitive virus-infected cases, and all eight gene segments of the viruses causing the remaining two cases. Results: No evidence for the resistance-causing mutation (resulting in the NA H275Y substitution) was observed in the oseltamivir-sensitive cases. Furthermore, deep sequencing revealed a subpopulation of oseltamivir-sensitive viruses in the case carrying resistant viruses. We detected higher levels of intra-host variation in the case carrying oseltamivir-resistant viruses than in those infected with oseltamivir-sensitive viruses. Conclusions: Oseltamivir resistance was only detected after prophylaxis with oseltamivir, suggesting that the mutation was selected for as a result of antiviral intervention. The persisting oseltamivir-sensitive virus population in the case carrying resistant viruses suggests either that a small proportion survive the treatment, or that the oseltamivir-sensitive virus rapidly re-establishes itself in the virus population after the bottleneck. Moreover, the increased intra-host variation in the oseltamivir-resistant case is consistent with the hypothesis that the population diversity of an RNA virus can increase rapidly following a population bottleneck. PMID:23587185
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with its own unique characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. 
We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
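The depth question raised in this abstract can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' method or their software; it assumes, for illustration only, that every read independently originates from a given transcript with probability p, so the chance of seeing at least one read after N reads is 1 - (1 - p)^N. The function names are hypothetical.

```python
import math


def detection_probability(p: float, n_reads: int) -> float:
    """Probability of at least one read from a transcript whose read fraction is p.

    Assumes independent reads (a simplification; real libraries are biased).
    """
    return 1.0 - (1.0 - p) ** n_reads


def reads_needed(p: float, target: float = 0.95) -> int:
    """Smallest N such that 1 - (1 - p)**N >= target under the same assumption."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve (1 - p)**N <= 1 - target for N; log1p keeps precision for tiny p.
    return math.ceil(math.log(1.0 - target) / math.log1p(-p))


if __name__ == "__main__":
    # A transcript contributing one read in a hundred (illustrative value only).
    n = reads_needed(0.01)
    print(n, detection_probability(0.01, n))
```

Under this simplified model the required depth grows roughly as 3/p for a 95% target, which is why rare transcripts dominate the depth budget.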
Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis.
Morfopoulou, Sofia; Mee, Edward T; Connaughton, Sarah M; Brown, Julianne R; Gilmour, Kimberly; Chong, W K 'Kling'; Duprex, W Paul; Ferguson, Deborah; Hubank, Mike; Hutchinson, Ciaran; Kaliakatsos, Marios; McQuaid, Stephen; Paine, Simon; Plagnol, Vincent; Ruis, Christopher; Virasami, Alex; Zhan, Hong; Jacques, Thomas S; Schepelmann, Silke; Qasim, Waseem; Breuer, Judith
2017-01-01
Routine childhood vaccination against measles, mumps and rubella has virtually abolished virus-related morbidity and mortality. Notwithstanding this, we describe here devastating neurological complications associated with the detection of live-attenuated mumps virus Jeryl Lynn (MuV JL5) in the brain of a child who had undergone successful allogeneic transplantation for severe combined immunodeficiency (SCID). This is the first confirmed report of MuV JL5 associated with chronic encephalitis and highlights the need to exclude immunodeficient individuals from immunisation with live-attenuated vaccines. The diagnosis was only possible by deep sequencing of the brain biopsy. Sequence comparison of the vaccine batch to the MuV JL5 isolated from brain identified biased hypermutation, particularly in the matrix gene, similar to that found in measles virus from cases of SSPE. The findings provide unique insights into the pathogenesis of paramyxovirus brain infections.
High-speed railway real-time localization auxiliary method based on deep neural network
NASA Astrophysics Data System (ADS)
Chen, Dongjie; Zhang, Wensheng; Yang, Yang
2017-11-01
The high-speed railway intelligent monitoring and management system combines schedule integration, geographic information, location services, and data mining technology to integrate temporal and spatial data. Assisted localization is a significant submodule of the intelligent monitoring system. In practical applications, the general approach is to capture image sequences of the components with a high-definition camera and to apply digital image processing, target detection, tracking, and even behavior analysis methods. In this paper, we present an end-to-end character recognition method based on a deep CNN called YOLO-toc for high-speed railway pillar plate numbers. Unlike other deep CNNs, YOLO-toc is an end-to-end multi-target detection framework; furthermore, it exhibits state-of-the-art performance in real-time detection, achieving nearly 50 fps on a GPU (GTX960). Finally, we realize a real-time yet high-accuracy pillar plate number recognition system and integrate natural scene OCR into a dedicated classification YOLO-toc model.
Shirts, Brian H; Salipante, Stephen J; Casadei, Silvia; Ryan, Shawnia; Martin, Judith; Jacobson, Angela; Vlaskin, Tatyana; Koehler, Karen; Livingston, Robert J; King, Mary-Claire; Walsh, Tom; Pritchard, Colin C
2014-10-01
Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count and who had a complex inversion spanning the entire exon 10 in APC (the gene encoding adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. In this study, we employed complete APC gene sequencing using high-coverage next-generation sequencing by ColoSeq, analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in translational skipping of exon 10 in the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with high enough depth of coverage, as well as the use of informatics tools, to enable detection of small pathogenic structural rearrangements.
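To make the read-pair reasoning behind inversion detection concrete, here is a minimal sketch of how paired-end orientation can flag candidate rearrangements. It is not BreakDancer's or SLOPE's implementation; it assumes a standard forward/reverse (FR) paired-end library, both mates mapped to the same chromosome, and a hypothetical `max_insert` threshold. The function name and labels are illustrative.

```python
def classify_pair(strand1: str, strand2: str, pos1: int, pos2: int,
                  max_insert: int = 500) -> str:
    """Classify one read pair from an FR paired-end library.

    strand1/strand2 are "+" or "-"; pos1/pos2 are mapping positions on the
    same chromosome. max_insert is an assumed upper bound on the normal
    insert size, not a value taken from any particular tool.
    """
    if strand1 not in "+-" or strand2 not in "+-" or not strand1 or not strand2:
        raise ValueError("strands must be '+' or '-'")
    # Mates mapping to the same strand are the classic signature of an inversion.
    if strand1 == strand2:
        return "inversion_candidate"
    # Order the mates by position to check which strand lies on the left.
    left, right = (strand1, strand2) if pos1 <= pos2 else (strand2, strand1)
    if left == "-" and right == "+":
        # Mates pointing away from each other, e.g. around a tandem duplication.
        return "everted"
    if abs(pos2 - pos1) > max_insert:
        # Correct orientation but mates too far apart.
        return "deletion_candidate"
    return "concordant"
```

A real caller would cluster many such discordant pairs before reporting a breakpoint; a single pair is never sufficient evidence.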
Fingerprints of Modified RNA Bases from Deep Sequencing Profiles.
Kietrys, Anna M; Velema, Willem A; Kool, Eric T
2017-11-29
Posttranscriptional modifications of RNA bases are not only found in many noncoding RNAs but have also recently been identified in coding (messenger) RNAs as well. They require complex and laborious methods to locate, and many still lack methods for localized detection. Here we test the ability of next-generation sequencing (NGS) to detect and distinguish between ten modified bases in synthetic RNAs. We compare ultradeep sequencing patterns of modified bases, including miscoding, insertions and deletions (indels), and truncations, to unmodified bases in the same contexts. The data show widely varied responses to modification, ranging from no response, to high levels of mutations, insertions, deletions, and truncations. The patterns are distinct for several of the modifications, and suggest the future use of ultradeep sequencing as a fingerprinting strategy for locating and identifying modifications in cellular RNAs.
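The fingerprinting idea described above can be sketched as a per-site vector of error rates compared against an unmodified control. This is a hypothetical illustration, not the authors' analysis code; the class, field, and function names are assumptions, and the definition of depth (all reads reaching the site, including those that stop there) is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCounts:
    """Read-level observations at one RNA position (hypothetical layout)."""
    depth: int        # reads reaching the position
    mismatches: int   # reads with a miscoded base here
    insertions: int   # reads with an insertion here
    deletions: int    # reads with a deletion here
    truncations: int  # reads that stop at this position


def fingerprint(counts: SiteCounts) -> dict:
    """Convert raw counts into per-read rates for each signal type."""
    if counts.depth <= 0:
        raise ValueError("depth must be positive")
    return {
        "mismatch": counts.mismatches / counts.depth,
        "insertion": counts.insertions / counts.depth,
        "deletion": counts.deletions / counts.depth,
        "truncation": counts.truncations / counts.depth,
    }


def excess_over_control(modified: SiteCounts, control: SiteCounts) -> dict:
    """Rate difference between a modified site and the same unmodified context."""
    mod, ctl = fingerprint(modified), fingerprint(control)
    return {signal: mod[signal] - ctl[signal] for signal in mod}
```

Comparing against the same sequence context matters because background error rates of the sequencing workflow vary with the neighbouring bases.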
Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy Te; Basso, Giuseppe
2016-10-04
To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next generation amplicon deep sequencing for in-depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations, including those in small sub-clones, were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse almost all originated from clones that were already detectable at diagnosis and survived the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS-related signature at both transcriptional and protein levels and that targeting the RAS pathway could be beneficial for treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations.
HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.
Seelow, Dominik; Schuelke, Markus
2012-07-01
Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
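The core idea of finding stretches of homozygosity shared by affected individuals can be sketched as follows. This is a deliberately simplified illustration and not HomozygosityMapper's actual algorithm: it assumes genotypes are already coded as "AA", "AB", or "BB" in a common marker order, treats any heterozygous or discordant call as ending a run, and uses a hypothetical `min_len` threshold.

```python
def shared_homozygous_runs(genotypes, min_len=3):
    """Return (start, end) marker index pairs (inclusive) of shared homozygous runs.

    genotypes: one list of genotype strings per affected individual, all in the
    same marker order. A marker counts as shared only if every individual
    carries the same homozygous genotype ("AA" or "BB").
    """
    if not genotypes:
        return []
    n_markers = len(genotypes[0])
    if any(len(ind) != n_markers for ind in genotypes):
        raise ValueError("all individuals must have the same number of markers")

    runs = []
    start = None  # index where the current run began, or None outside a run
    for i in range(n_markers):
        calls = {ind[i] for ind in genotypes}
        shared = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if shared:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_len:
        runs.append((start, n_markers - 1))
    return runs
```

A production tool would also tolerate occasional genotyping errors and weight runs by physical or genetic length rather than by marker count alone.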
Chen, Ping; Zhang, Limin; Guo, Xiaoxuan; Dai, Xin; Liu, Li; Xi, Lijun; Wang, Jian; Song, Lei; Wang, Yuezhu; Zhu, Yaxin; Huang, Li; Huang, Ying
2016-01-01
The phylum Actinobacteria has been reported to be common or even abundant in deep marine sediments, however, knowledge about the diversity, distribution, and function of actinobacteria is limited. In this study, actinobacterial diversity in the deep sea along the Southwest Indian Ridge (SWIR) was investigated using both 16S rRNA gene pyrosequencing and culture-based methods. The samples were collected at depths of 1662–4000 m below water surface. Actinobacterial sequences represented 1.2–9.1% of all microbial 16S rRNA gene amplicon sequences in each sample. A total of 5 actinobacterial classes, 17 orders, 28 families, and 52 genera were detected by pyrosequencing, dominated by the classes Acidimicrobiia and Actinobacteria. Differences in actinobacterial community compositions were found among the samples. The community structure showed significant correlations to geochemical factors, notably pH, calcium, total organic carbon, total phosphorus, and total nitrogen, rather than to spatial distance at the scale of the investigation. In addition, 176 strains of the Actinobacteria class, belonging to 9 known orders, 18 families, and 29 genera, were isolated. Among these cultivated taxa, 8 orders, 13 families, and 15 genera were also recovered by pyrosequencing. At a 97% 16S rRNA gene sequence similarity, the pyrosequencing data encompassed 77.3% of the isolates but the isolates represented only 10.3% of the actinobacterial reads. Phylogenetic analysis of all the representative actinobacterial sequences and isolates indicated that at least four new orders within the phylum Actinobacteria were detected by pyrosequencing. More than half of the isolates spanning 23 genera and all samples demonstrated activity in the degradation of refractory organics, including polycyclic aromatic hydrocarbons and polysaccharides, suggesting their potential ecological functions and biotechnological applications for carbon recycling. PMID:27621725
Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik
2009-01-01
Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787
Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao
2018-06-15
Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. 
Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.
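The two analysis steps described in this abstract, calling positions that differ from the reference at more than 0.5% frequency and checking whether a substituted C sits in a TpCpN context, can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the per-position base counts are assumed to be already available, only the forward strand is examined (the complementary G-to-A signal on the opposite strand is ignored), and the function names are hypothetical.

```python
def minor_variants(reference, base_counts, min_freq=0.005):
    """List (position, ref_base, alt_base, frequency) for alt bases above min_freq.

    reference: reference sequence string.
    base_counts: one dict per position mapping base -> read count.
    """
    variants = []
    for pos, (ref_base, counts) in enumerate(zip(reference, base_counts)):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        for base, n in counts.items():
            freq = n / depth
            if base != ref_base and freq > min_freq:
                variants.append((pos, ref_base, base, freq))
    return variants


def is_tpc_context(reference, pos):
    """True if the reference C at pos is preceded by T and followed by any base."""
    return (
        0 < pos < len(reference) - 1
        and reference[pos] == "C"
        and reference[pos - 1] == "T"
    )


def apobec_like(reference, variants):
    """Keep C-to-T variants whose trinucleotide context is TpCpN."""
    return [
        v for v in variants
        if v[1] == "C" and v[2] == "T" and is_tpc_context(reference, v[0])
    ]
```

Because the threshold sits close to typical sequencing error rates, a real analysis would also need an error model or replicate libraries before interpreting variants near 0.5%.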
Chaillon, Antoine; Nakazawa, Masato; Wertheim, Joel O; Little, Susan J; Smith, Davey M; Mehta, Sanjay R; Gianella, Sara
2017-11-01
During primary HIV infection, the presence of minority drug resistance mutations (DRM) may be a consequence of sexual transmission, de novo mutations, or technical errors in identification. Baseline blood samples were collected from 24 HIV-infected antiretroviral-naive, genetically and epidemiologically linked source and recipient partners shortly after the recipient's estimated date of infection. An additional 32 longitudinal samples were available from 11 recipients. Deep sequencing of HIV reverse transcriptase (RT) was performed (Roche/454), and the sequences were screened for nucleoside and nonnucleoside RT inhibitor DRM. The likelihood of sexual transmission and persistence of DRM was assessed using Bayesian-based statistical modeling. While the majority of DRM (>20%) were consistently transmitted from source to recipient, the probability of detecting a minority DRM in the recipient was not increased when the same minority DRM was detected in the source (Bayes factor [BF] = 6.37). Longitudinal analyses revealed an exponential decay of DRM (BF = 0.05) while genetic diversity increased. Our analysis revealed no substantial evidence for sexual transmission of minority DRM (BF = 0.02). The presence of minority DRM during early infection, followed by a rapid decay, is consistent with the "mutation-selection balance" hypothesis, in which deleterious mutations are more efficiently purged later during HIV infection when the larger effective population size allows more efficient selection. Future studies using more recent sequencing technologies that are less prone to single-base errors should confirm these results by applying a similar Bayesian framework in other clinical settings. IMPORTANCE The advent of sensitive sequencing platforms has led to an increased identification of minority drug resistance mutations (DRM), including among antiretroviral therapy-naive HIV-infected individuals. 
While transmission of DRM may impact future therapy options for newly infected individuals, the clinical significance of the detection of minority DRM remains controversial. In the present study, we applied deep-sequencing techniques within a Bayesian hierarchical framework to a cohort of 24 transmission pairs to investigate whether minority DRM detected shortly after transmission were the consequence of (i) sexual transmission from the source, (ii) de novo emergence shortly after infection followed by viral selection and evolution, or (iii) technical errors/limitations of deep-sequencing methods. We found no clear evidence to support the sexual transmission of minority resistant variants, and our results suggested that minor resistant variants may emerge de novo shortly after transmission, when the small effective population size limits efficient purge by natural selection. Copyright © 2017 American Society for Microbiology.
On-Line Detection and Segmentation of Sports Motions Using a Wearable Sensor.
Kim, Woosuk; Kim, Myunggyu
2018-03-19
In sports motion analysis, observation is a prerequisite for understanding the quality of motions. This paper introduces a novel approach to detect and segment sports motions using a wearable sensor for supporting systematic observation. The main goal is, for convenient analysis, to automatically provide motion data, which are temporally classified according to the phase definition. For explicit segmentation, a motion model is defined as a sequence of sub-motions with boundary states. A sequence classifier based on deep neural networks is designed to detect sports motions from continuous sensor inputs. The evaluation on two types of motions (soccer kicking and two-handed ball throwing) verifies that the proposed method is successful for the accurate detection and segmentation of sports motions. By developing a sports motion analysis system using the motion model and the sequence classifier, we show that the proposed method is useful for observation of sports motions by automatically providing relevant motion data for analysis.
Soverini, Simona; De Benedittis, Caterina; Castagnetti, Fausto; Gugliotta, Gabriele; Mancini, Manuela; Bavaro, Luana; Machova Polakova, Katerina; Linhartova, Jana; Iurlo, Alessandra; Russo, Domenico; Pane, Fabrizio; Saglio, Giuseppe; Rosti, Gianantonio; Cavo, Michele; Baccarani, Michele; Martinelli, Giovanni
2016-08-02
Imatinib-resistant chronic myeloid leukemia (CML) patients receiving second-line tyrosine kinase inhibitor (TKI) therapy with dasatinib or nilotinib have a higher risk of disease relapse and progression, and BCR-ABL1 kinase domain (KD) mutations are not infrequently implicated in therapeutic failure. In this setting, earlier detection of emerging BCR-ABL1 KD mutations would offer greater chances of efficacy for subsequent salvage therapy and limit the biological consequences of full BCR-ABL1 kinase reactivation. Taking advantage of an already established and validated next-generation deep amplicon sequencing (DS) assay, we aimed to assess whether DS may allow a larger window of detection of emerging BCR-ABL1 KD mutants predicting an impending relapse. A total of 125 longitudinal samples from 51 CML patients who had acquired dasatinib- or nilotinib-resistant mutations during second-line therapy were analyzed by DS, working backwards from the time of failure and mutation detection by conventional sequencing. BCR-ABL1/ABL1%(IS) transcript levels were used to define whether the patient had 'optimal response', 'warning' or 'failure' at the time of first mutation detection by DS. DS was able to backtrack dasatinib- or nilotinib-resistant mutations to the previous sample(s) in 23/51 (45%) patients. Median mutation burden at the time of first detection by DS was 5.5% (range, 1.5-17.5%); median interval between detection by DS and detection by conventional sequencing was 3 months (range, 1-9 months). In 5 cases, the mutations were detectable at baseline. In the remaining cases, response level at the time mutations were first detected by DS could be defined as 'Warning' (according to the 2013 ELN definitions of response to 2nd-line therapy) in 13 cases, as 'Optimal response' in one case, and as 'Failure' in 4 cases. 
No dasatinib- or nilotinib-resistant mutations were detected by DS in 15 randomly selected patients with 'warning' at various timepoints, that later turned into optimal responders with no treatment changes. DS enables a larger window of detection of emerging BCR-ABL1 KD mutations predicting for an impending relapse. A 'Warning' response may represent a rational trigger, besides 'Failure', for DS-based mutation screening in CML patients undergoing second-line TKI therapy.
Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus
Kinoti, Wycliff M.; Constable, Fiona E.; Nancarrow, Narelle; Plummer, Kim M.; Rodoni, Brendan
2017-01-01
The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus, occurring in 48 of the 61 Ilarvirus-positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus-like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus-like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus-like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. 
This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the need for a standardized approach to accurately determine what constitutes an active, viable virus infection after detection by molecular based methods. PMID:28713347
Small RNA Analysis in Sindbis Virus Infected Human HEK293 Cells
Dalmay, Tamas; Powell, Penny P.
2013-01-01
Introduction: In contrast to the defence mechanism of RNA interference (RNAi) in plants and invertebrates, its role in the innate response to virus infection of mammals is a matter of debate. Since RNAi has a well-established role in controlling infection of the alphavirus Sindbis virus (SINV) in insects, we have used this virus to investigate the role of RNAi in SINV infection of human cells. Results: SINV AR339 and TR339-GFP were adapted to grow in HEK293 cells. Deep sequencing of small RNAs (sRNAs) early in SINV infection (4 and 6 hpi) showed low abundance (0.8%) of viral sRNAs (vsRNAs), with no size-, sequence- or location-specific patterns characteristic of Dicer products, nor any discernible pattern that could be ascribed to a specific RNAi biogenesis pathway. This was supported by multiple variants for each sequence, and a lack of hot spots along the viral genome sequence. The abundance of the best defined vsRNAs was below the limit of Northern blot detection. The adaptation of the virus to HEK293 cells showed few sequence changes compared to the reference; however, a SNP in the E1 gene with a preference from G to C was found. Deep sequencing results showed little variation in expression of cellular microRNAs (miRNAs) at 4 and 6 hpi compared to uninfected cells. Twelve miRNAs exhibiting some minor differential expression by sequencing showed no difference in expression by Northern blot analysis. Conclusions: We show that, unlike SINV infection of invertebrates, generation of Dicer-dependent vsRNAs and change in expression of cellular miRNAs were not detected as part of the human response to SINV. PMID:24391886
Is There Still Room for Novel Viral Pathogens in Pediatric Respiratory Tract Infections?
Taboada, Blanca; Espinoza, Marco A.; Isa, Pavel; Aponte, Fernando E.; Arias-Ortiz, María A.; Monge-Martínez, Jesús; Rodríguez-Vázquez, Rubén; Díaz-Hernández, Fidel; Zárate-Vidal, Fernando; Wong-Chew, Rosa María; Firo-Reyes, Verónica; del Río-Almendárez, Carlos N.; Gaitán-Meza, Jesús; Villaseñor-Sierra, Alberto; Martínez-Aguilar, Gerardo; Salas-Mier, Ma. del Carmen; Noyola, Daniel E.; Pérez-Gónzalez, Luis F.; López, Susana; Santos-Preciado, José I.; Arias, Carlos F.
2014-01-01
Viruses are the most frequent cause of respiratory disease in children. However, despite the advanced diagnostic methods currently in use, in 20 to 50% of respiratory samples a specific pathogen cannot be detected. In this work, we used a metagenomic approach and deep sequencing to examine respiratory samples from children with lower and upper respiratory tract infections that had been previously found negative for 6 bacteria and 15 respiratory viruses by PCR. Nasal washings from 25 children (out of 250) hospitalized with a diagnosis of pneumonia and nasopharyngeal swabs from 46 outpatient children (out of 526) were studied. DNA reads for at least one virus commonly associated with respiratory infections were found in 20 of 25 hospitalized patients, while reads for pathogenic respiratory bacteria were detected in the remaining 5 children. For outpatients, all the samples were pooled into 25 DNA libraries for sequencing. In this case, at least one respiratory virus was identified in 22 of the 25 sequenced libraries, while pathogenic bacteria were detected in all but one of the others. In both patient groups, reads for respiratory syncytial virus, coronavirus-OC43, and rhinovirus were identified. In addition, viruses less frequently associated with respiratory infections were also found. Saffold virus was detected in outpatient but not in hospitalized children. Anellovirus, rotavirus, and astrovirus, as well as several animal and plant viruses, were detected in both groups. No novel viruses were identified. Adding the deep sequencing results to the PCR data, 79.2% of 250 hospitalized and 76.6% of 526 ambulatory patients were positive for viruses, and all other children but one had pathogenic respiratory bacteria identified. These results suggest that, at least in the type of populations studied and with the sampling methods used, the odds of finding novel, clinically relevant viruses in pediatric respiratory infections are low. PMID:25412469
De novo peptide sequencing by deep learning
Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming
2017-01-01
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701
Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor
2015-01-01
Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose naming the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
Unique microbial community in drilling fluids from Chinese continental scientific drilling
Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.
2006-01-01
Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.
Cui, Zhihua; Zhang, Yi
2014-02-01
As a promising and innovative research field, bioinformatics has attracted increasing attention recently. Among the enormous number of open problems in this field, one fundamental issue is the need for accurate and efficient computational methods that can deal with tremendous amounts of data. In this paper, we survey some applications of swarm intelligence to discovering patterns in multiple sequences. To provide deeper insight, ant colony optimization, particle swarm optimization, artificial bee colony and artificial fish swarm algorithms are selected, and their applications to multiple sequence alignment and motif detection problems are discussed.
Optical Communications Channel Combiner
NASA Technical Reports Server (NTRS)
Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy
2012-01-01
NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio (LLR) for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the LLR sequences of up to three receivers and then combine them into a single LLR sequence for information decoding. Each of its three channel inputs takes a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlations between the channels' LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is likewise a sequence of four-bit LLRs for each PPM slot in a codeword, delivered via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.
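The synchronize-then-combine step described in this abstract can be illustrated with a minimal software sketch. It is not the hardware design: the function names, the lag search range, and the choice to combine aligned channels by summing their LLRs (the usual rule for independent observations) are all assumptions made for illustration, and the four-bit quantization is ignored.

```python
def cross_correlation(a, b, lag):
    """Sum of a[i] * b[i + lag] over the indices where both sequences exist."""
    total = 0
    for i, x in enumerate(a):
        j = i + lag
        if 0 <= j < len(b):
            total += x * b[j]
    return total


def best_lag(reference, other, max_lag):
    """Lag of `other` relative to `reference` that maximizes the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(reference, other, lag))


def combine(channels, max_lag=8):
    """Align every channel to the first one, then sum the LLRs slot by slot.

    Summation is an assumed combining rule, not one stated in the abstract.
    """
    reference = channels[0]
    lags = [best_lag(reference, ch, max_lag) for ch in channels]
    combined = []
    for i in range(len(reference)):
        total = 0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < len(ch):
                total += ch[j]
        combined.append(total)
    return lags, combined
```

In this sketch a second channel that carries the same LLRs two slots late is detected at lag 2, and the combined sequence is twice the reference.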
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Kato, Mamoru; Shibata, Tatsuhiro; Yachida, Shinichi
2016-01-01
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect molecular characteristics of tumors, supporting the concept of "liquid biopsy". We determined the mutational status of KRAS in plasma cfDNA using multiplex droplet digital PCR in 259 patients with PDAC, retrospectively. Furthermore, we constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Droplet digital PCR detected KRAS mutations in plasma cfDNA in 63 of 107 (58.9%) patients with inoperable tumors. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by cfDNA sequencing. Our two-step approach with plasma cfDNA, combining droplet digital PCR and targeted deep sequencing, is a feasible clinical approach. Assessment of mutations in plasma cfDNA may provide a new diagnostic tool, assisting decisions for optimal therapeutic strategies for PDAC patients.
Clinical utility of circulating tumor DNA for molecular assessment in pancreatic cancer.
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Morizane, Chigusa; Nara, Satoshi; Hama, Natsuko; Suzuki, Masami; Furukawa, Eisaku; Kato, Mamoru; Hayashi, Hideyuki; Kohno, Takashi; Ueno, Hideki; Shimada, Kazuaki; Okusaka, Takuji; Nakagama, Hitoshi; Shibata, Tatsuhiro; Yachida, Shinichi
2015-12-16
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect and monitor molecular characteristics of tumors. In the present study, we determined the mutational status of KRAS in plasma cfDNA using multiplex picoliter-droplet digital PCR in 259 patients with PDAC. We constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA and matched germline DNA samples in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by targeted deep sequencing of cfDNA. We also analyzed somatic copy number alterations based on the targeted sequencing data using our in-house algorithm, and potentially targetable amplifications were detected. Assessment of mutations and copy number alterations in plasma cfDNA may provide a prognostic and diagnostic tool to assist decisions regarding optimal therapeutic strategies for PDAC patients.
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua
2016-10-01
The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) at 97% sequence similarity, and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies of fungal diversity in deep-sea sediments based on culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor fungal communities different from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
Detection of microRNAs in color space.
Marco, Antonio; Griffiths-Jones, Sam
2012-02-01
Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at http://www.mirbase.org/tools/seqtrimmap/ (contact: antonio.marco@manchester.ac.uk). Supplementary data are available at Bioinformatics online.
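The color-space representation mentioned in this abstract can be made concrete with a short sketch of the standard two-base encoding, in which each color is the XOR of the 2-bit codes of adjacent nucleotides (A=0, C=1, G=2, T=3). The function names are illustrative and are not taken from SeqTrimMap. The sketch also shows why the first nucleotide matters: decoding needs a known starting base, and a single miscalled color changes every base downstream of it when a read is translated naively.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Return (first base, colors); one color per pair of adjacent bases."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the nucleotide sequence from a known first base and its colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


first, colors = encode("ATGGCA")
print(first, colors)                    # A [3, 1, 0, 3, 1]
print(decode(first, colors))            # ATGGCA
print(decode("A", [3, 2, 0, 3, 1]))     # ATCCGT: one wrong color alters all later bases
```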
Precise genotyping and recombination detection of Enterovirus
2015-01-01
Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep conception, http://symbiont.iis.sinica.edu.tw/evidence), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most possible genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution. PMID:26678286
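The sliding-window idea behind the recombination scan can be sketched in a few lines. This is only an illustration under strong assumptions: the sequences are taken to be already aligned and of equal length, the distance is a plain mismatch fraction, and the phylogenetic step that EVIDENCE also applies is omitted. All names and parameters are hypothetical and are not part of the EVIDENCE tool.

```python
def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length aligned strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def window_best_hits(query, references, window, step):
    """For each window, report the reference with the smallest p-distance."""
    hits = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = min(
            references,
            key=lambda name: p_distance(segment, references[name][start:start + window]),
        )
        hits.append((start, best))
    return hits


def breakpoints(hits):
    """Window starts at which the closest reference changes."""
    return [start for (start, name), (_, prev) in zip(hits[1:], hits) if name != prev]


refs = {"typeA": "A" * 20, "typeB": "C" * 20}   # toy references
query = "A" * 10 + "C" * 10                     # toy recombinant
hits = window_best_hits(query, refs, window=5, step=5)
print(hits)
print(breakpoints(hits))
```

A change in the closest reference between adjacent windows is only a candidate signal; distinguishing recombination from noise would need the tree-based checks the abstract describes.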
Delivery and detection of dietary plant-based miRNAs in animal tissues
USDA-ARS's Scientific Manuscript database
It has been proposed that genetic material, namely microRNAs (miRNAs), consumed in plant-based diets can affect animal gene expression. Though deep sequencing reveals the low-level presence of plant miRNAs in animal tissues, many groups have been thus far unable to replicate the finding that a rice ...
Härtl, Katja; Kalinowski, Gregor; Hoffmann, Thomas; Preuss, Anja; Schwab, Wilfried
2017-05-01
RNA interference (RNAi) has been exploited as a reverse genetic tool for functional genomics in the nonmodel species strawberry (Fragaria × ananassa) since 2006. Here, we analysed for the first time different but overlapping nucleotide sections (>200 nt) of two endogenous genes, FaCHS (chalcone synthase) and FaOMT (O-methyltransferase), as inducer sequences and a transitive vector system to compare their gene silencing efficiencies. In total, ten vectors were assembled each containing the nucleotide sequence of one fragment in sense and corresponding antisense orientation separated by an intron (inverted hairpin construct, ihp). All sequence fragments along the full lengths of both target genes resulted in a significant down-regulation of the respective gene expression and related metabolite levels. Quantitative PCR data and successful application of a transitive vector system coinciding with a phenotypic change suggested propagation of the silencing signal. The spreading of the signal in strawberry fruit in the 3' direction was shown for the first time by the detection of secondary small interfering RNAs (siRNAs) outside of the primary targets by deep sequencing. Down-regulation of endogenes by the transitive method was less effective than silencing by ihp constructs probably because the numbers of primary siRNAs exceeded the quantity of secondary siRNAs by three orders of magnitude. Besides, we observed consistent hotspots of primary and secondary siRNA formation along the target sequence which fall within a distance of less than 200 nt. Thus, ihp vectors seem to be superior over the transitive vector system for functional genomics in strawberry fruit. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth
2016-11-01
The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.
2014-01-01
Background The objective of this study was to perform a systematic review and a meta-analysis in order to estimate the diagnostic accuracy of diffusion weighted imaging (DWI) in the preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma. Methods Studies evaluating DWI for the detection of deep myometrial invasion in patients with endometrial carcinoma were systematically searched for in the MEDLINE, EMBASE, and Cochrane Library from January 1995 to January 2014. Methodologic quality was assessed by using the Quality Assessment of Diagnostic Accuracy Studies tool. Bivariate random-effects meta-analytic methods were used to obtain pooled estimates of sensitivity, specificity, diagnostic odds ratio (DOR) and receiver operating characteristic (ROC) curves. The study also evaluated the clinical utility of DWI in preoperative assessment of deep myometrial invasion. Results Seven studies enrolling a total of 320 individuals met the study inclusion criteria. The summary area under the ROC curve was 0.91. There was no evidence of publication bias (P = 0.90, bias coefficient analysis). Sensitivity and specificity of DWI for detection of deep myometrial invasion across all studies were 0.90 and 0.89, respectively. Positive and negative likelihood ratios with DWI were 8 and 0.11 respectively. In patients with high pre-test probabilities, DWI enabled confirmation of deep myometrial invasion; in patients with low pre-test probabilities, DWI enabled exclusion of deep myometrial invasion. The worst case scenario (pre-test probability, 50%) post-test probabilities were 89% and 10% for positive and negative DWI results, respectively. Conclusion DWI has high sensitivity and specificity for detecting deep myometrial invasion and more importantly can reliably rule out deep myometrial invasion. 
Therefore, it would be worthwhile to add a DWI sequence to the standard MRI protocols in preoperative evaluation of endometrial cancer in order to detect deep myometrial invasion, which along with other poor prognostic factors like age, tumor grade, and LVSI would be useful in stratifying high-risk groups, thereby helping to tailor the surgical approach in patients with low risk of endometrial carcinoma. PMID:25608571
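The likelihood ratios and post-test probabilities quoted in this abstract follow from the pooled sensitivity (0.90) and specificity (0.89) by the standard formulas. The short calculation below reproduces them as a check; the function names are illustrative, and the bivariate random-effects pooling itself is not reproduced.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a diagnostic test."""
    positive = sensitivity / (1 - specificity)
    negative = (1 - sensitivity) / specificity
    return positive, negative


def post_test_probability(pre_test, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.90, 0.89)
print(round(lr_pos, 1), round(lr_neg, 2))              # about 8.2 and 0.11
print(round(post_test_probability(0.5, lr_pos), 2))    # about 0.89
print(round(post_test_probability(0.5, lr_neg), 2))    # about 0.10
```

With a pre-test probability of 50%, these values agree with the reported post-test probabilities of 89% after a positive and 10% after a negative DWI result.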
Toward a real-time system for temporal enhanced ultrasound-guided prostate biopsy.
Azizi, Shekoofeh; Van Woudenberg, Nathan; Sojoudi, Samira; Li, Ming; Xu, Sheng; Abu Anas, Emran M; Yan, Pingkun; Tahmasebi, Amir; Kwak, Jin Tae; Turkbey, Baris; Choyke, Peter; Pinto, Peter; Wood, Bradford; Mousavi, Parvin; Abolmaesumi, Purang
2018-03-27
We have previously proposed temporal enhanced ultrasound (TeUS) as a new paradigm for tissue characterization. TeUS is based on analyzing a sequence of ultrasound data with deep learning and has been demonstrated to be successful for detection of cancer in ultrasound-guided prostate biopsy. Our aim is to enable the dissemination of this technology to the community for large-scale clinical validation. In this paper, we present a unified software framework demonstrating near-real-time analysis of ultrasound data stream using a deep learning solution. The system integrates ultrasound imaging hardware, visualization and a deep learning back-end to build an accessible, flexible and robust platform. A client-server approach is used in order to run computationally expensive algorithms in parallel. We demonstrate the efficacy of the framework using two applications as case studies. First, we show that prostate cancer detection using near-real-time analysis of RF and B-mode TeUS data and deep learning is feasible. Second, we present real-time segmentation of ultrasound prostate data using an integrated deep learning solution. The system is evaluated for cancer detection accuracy on ultrasound data obtained from a large clinical study with 255 biopsy cores from 157 subjects. It is further assessed with an independent dataset with 21 biopsy targets from six subjects. In the first study, we achieve area under the curve, sensitivity, specificity and accuracy of 0.94, 0.77, 0.94 and 0.92, respectively, for the detection of prostate cancer. In the second study, we achieve an AUC of 0.85. Our results suggest that TeUS-guided biopsy can be potentially effective for the detection of prostate cancer.
Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun
2018-05-02
Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. 
The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections was similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.
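The two summary statistics reported above (average MOI and the share of multiclonal infections) can be illustrated with a minimal sketch. It assumes each sample has already been reduced to a count of distinct msp1 haplotypes; the function name, sample IDs and counts are hypothetical and are not study data.

```python
from statistics import mean

def moi_summary(haplotypes_per_sample):
    """Return (mean MOI, fraction of multiclonal samples).

    haplotypes_per_sample maps a sample ID to the number of distinct
    haplotypes called in that sample (hypothetical input format).
    """
    counts = list(haplotypes_per_sample.values())
    if not counts:
        raise ValueError("at least one sample is required")
    # A sample is multiclonal when more than one haplotype is present.
    multiclonal = sum(1 for c in counts if c > 1)
    return mean(counts), multiclonal / len(counts)

# Illustrative values only.
example = {"S1": 1, "S2": 3, "S3": 2, "S4": 2}
avg_moi, frac_multi = moi_summary(example)
print(f"mean MOI = {avg_moi:.2f}, multiclonal = {frac_multi:.1%}")
```

How haplotypes are called from the amplicon reads (error filtering, minimum frequency) is not covered here and would strongly affect both numbers.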
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Retterer, Kyle; Scuffins, Julie; Schmidt, Daniel; Lewis, Rachel; Pineda-Alvarez, Daniel; Stafford, Amanda; Schmidt, Lindsay; Warren, Stephanie; Gibellini, Federica; Kondakova, Anastasia; Blair, Amanda; Bale, Sherri; Matyakhina, Ludmila; Meck, Jeanne; Aradhya, Swaroop; Haverfield, Eden
2015-08-01
Detection of copy-number variation (CNV) is important for investigating many genetic disorders. Testing a large clinical cohort by array comparative genomic hybridization provides a deep perspective on the spectrum of pathogenic CNV. In this context, we describe a bioinformatics approach to extract CNV information from whole-exome sequencing and demonstrate its utility in clinical testing. Exon-focused arrays and whole-genome chromosomal microarray analysis were used to test 14,228 and 14,000 individuals, respectively. Based on these results, we developed an algorithm to detect deletions/duplications in whole-exome sequencing data and a novel whole-exome array. In the exon array cohort, we observed a positive detection rate of 2.4% (25 duplications, 318 deletions), of which 39% involved one or two exons. Chromosomal microarray analysis identified 3,345 CNVs affecting single genes (18%). We demonstrate that our whole-exome sequencing algorithm resolves CNVs of three or more exons. These results demonstrate the clinical utility of single-exon resolution in CNV assays. Our whole-exome sequencing algorithm approaches this resolution but is complemented by a whole-exome array to unambiguously identify intragenic CNVs and single-exon changes. These data illustrate the next advancements in CNV analysis through whole-exome sequencing and whole-exome array. Genet Med 17(8), 623-629.
USDA-ARS's Scientific Manuscript database
In mid-January 2016, an outbreak of H7N8 high pathogenicity avian influenza (HPAI) virus in commercial turkeys occurred in Indiana. The outbreak was first detected by an increase in mortality followed by laboratory confirmation of H7N8 HPAI virus. Surveillance within the 10 km Control Zone detected...
Cosmic Accretion and Galaxy Co-Evolution: Lessons from the Extended Chandra Deep Field South
NASA Astrophysics Data System (ADS)
Urry, C. Megan
2011-05-01
The Chandra deep fields reveal that most cosmic accretion onto supermassive black holes is obscured by gas and dust. The GOODS and MUSYC multiwavelength data show that many X-ray-detected AGN are faint and red (or even undetectable) in the optical but bright in the infrared, as is characteristic of obscured sources. (N.B. The ECDFS is most sensitive to the AGN that constitute the X-ray background, namely, moderate luminosity AGN, with log Lx=43-44, at moderate redshifts, 0.5
Laassri, Majid; Mee, Edward T; Connaughton, Sarah M; Manukyan, Hasmik; Gruber, Marion; Rodriguez-Hernandez, Carmen; Minor, Philip D; Schepelmann, Silke; Chumakov, Konstantin; Wood, David J
2018-06-22
Bovine viral diarrhoea virus (BVDV) is a cattle pathogen that has previously been reported to be present in bovine raw materials used in the manufacture of biological products for human use. Seven lots of trivalent measles, mumps and rubella (MMR) vaccine and 1 lot of measles vaccine from the same manufacturer, together with 17 lots of foetal bovine serum (FBS) from different vendors, 4 lots of horse serum, 2 lots of bovine trypsin and 5 lots of porcine trypsin were analysed for BVDV using recently developed techniques, including PCR assays for BVDV detection, a qRT-PCR and immunofluorescence-based virus replication assays, and deep sequencing to identify and genotype BVDV genomes. All FBS lots and one lot of bovine-derived trypsin were PCR-positive for the presence of BVDV genome; in contrast all vaccine lots and the other samples were negative. qRT-PCR based virus replication assay and immunofluorescence-based infection assay detected no infectious BVDV in the PCR-positive samples. Complete BVDV genomes were generated from FBS samples by deep sequencing, and all were BVDV type 1. These data confirmed that BVDV nucleic acid may be present in bovine-derived raw materials, but no infectious virus or genomic RNA was detected in the final vaccine products. Copyright © 2018 International Alliance for Biological Standardization. All rights reserved.
NASA Astrophysics Data System (ADS)
Wu, Yue-Hong; Liao, Li; Wang, Chun-Sheng; Ma, Wei-Lin; Meng, Fan-Xu; Wu, Min; Xu, Xue-Wei
2013-09-01
Deep-sea polymetallic nodules, rich in metals such as Fe, Mn, and Ni, are potential resources for future exploitation. Early culturing and microscopy studies suggest that polymetallic nodules are at least partially biogenic. To understand the microbial communities in this environment, we compared microbial community composition and diversity inside nodules and in the surrounding sediments. Three sampling sites in the Pacific Ocean containing polymetallic nodules were used for culture-independent investigations of microbial diversity. A total of 1013 near full-length bacterial 16S rRNA gene sequences and 640 archaeal 16S rRNA gene sequences with ~650 bp from nodules and the surrounding sediments were analyzed. Bacteria showed higher diversity than archaea. Interestingly, sediments contained more diverse bacterial communities than nodules, while the opposite was detected for archaea. Bacterial communities tend to be mostly unique to sediments or nodules, with only 13.3% of sequences shared. The most abundant bacterial groups detected only in nodules were Pseudoalteromonas and Alteromonas, which were predicted to play a role in building matrix outside cells to induce or control mineralization. However, archaeal communities were mostly shared between sediments and nodules, including the most abundant OTU containing 290 sequences from marine group I Thaumarchaeota. PcoA analysis indicated that microhabitat (i.e., nodule or sediment) seemed to be a major factor influencing microbial community composition, rather than sampling locations or distances between locations.
Zhao, Zijian; Voros, Sandrine; Weng, Ying; Chang, Faliang; Li, Ruijian
2017-12-01
Worldwide propagation of minimally invasive surgeries (MIS) is hindered by their drawback of indirect observation and manipulation, while monitoring of surgical instruments moving in the operated body required by surgeons is a challenging problem. Tracking of surgical instruments by vision-based methods is quite attractive, due to its flexible implementation via software-based control with no need to modify instruments or surgical workflow. A MIS instrument is conventionally split into a shaft and end-effector portions, while a 2D/3D tracking-by-detection framework is proposed, which performs the shaft tracking followed by the end-effector one. The former portion is described by line features via the RANSAC scheme, while the latter is depicted by special image features based on deep learning through a well-trained convolutional neural network. The method verification in 2D and 3D formulation is performed through the experiments on ex-vivo video sequences, while qualitative validation on in-vivo video sequences is obtained. The proposed method provides robust and accurate tracking, which is confirmed by the experimental results: its 3D performance in ex-vivo video sequences exceeds that of the available state-of-the-art methods. Moreover, the experiments on in-vivo sequences demonstrate that the proposed method can tackle the difficult condition of tracking with unknown camera parameters. Further refinements of the method will address occlusion and multi-instrument MIS applications.
Diversity of Pico- to Mesoplankton along the 2000 km Salinity Gradient of the Baltic Sea
Hu, Yue O. O.; Karlson, Bengt; Charvet, Sophie; Andersson, Anders F.
2016-01-01
Microbial plankton form the productive base of both marine and freshwater ecosystems and are key drivers of global biogeochemical cycles of carbon and nutrients. Plankton diversity is immense with representations from all major phyla within the three domains of life. So far, plankton monitoring has mainly been based on microscopic identification, which has limited sensitivity and reproducibility, not least because of the numerical majority of plankton being unidentifiable under the light microscope. High-throughput sequencing of taxonomic marker genes offers a means to identify taxa inaccessible by traditional methods; thus, recent studies have unveiled an extensive previously unknown diversity of plankton. Here, we conducted ultra-deep Illumina sequencing (average 10^5 sequences/sample) of rRNA gene amplicons of surface water eukaryotic and bacterial plankton communities sampled in summer along a 2000 km transect following the salinity gradient of the Baltic Sea. Community composition was strongly correlated with salinity for both bacterial and eukaryotic plankton assemblages, highlighting the importance of salinity for structuring the biodiversity within this ecosystem. In contrast, no clear trends in alpha-diversity for bacterial or eukaryotic communities could be detected along the transect. The distribution of major planktonic taxa followed expected patterns as observed in monitoring programs, but groups novel to the Baltic Sea were also identified, such as relatives to the coccolithophore Emiliania huxleyi detected in the northern Baltic Sea. This study provides the first ultra-deep sequencing-based survey on eukaryotic and bacterial plankton biogeography in the Baltic Sea. PMID:27242706
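Alpha-diversity, mentioned above, is often summarized per sample with the Shannon index. The abstract does not say which metric was used, so the following is only a minimal sketch of one common choice; the function name and the counts are hypothetical.

```python
import math

def shannon_index(taxon_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(taxon_counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero reads contribute nothing and are skipped.
    return -sum((c / total) * math.log(c / total) for c in taxon_counts if c > 0)

# Illustrative read counts per taxon for two made-up samples.
even_sample = [250, 250, 250, 250]
skewed_sample = [970, 10, 10, 10]
print(shannon_index(even_sample) > shannon_index(skewed_sample))
```

An evenly distributed community scores higher than one dominated by a single taxon, which is why the index is sensitive to both richness and evenness. Comparing samples fairly usually also requires equal sequencing depth, which this sketch does not handle.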
Integrated digital error suppression for improved detection of circulating tumor DNA
Kurtz, David M.; Chabon, Jacob J.; Scherer, Florian; Stehr, Henning; Liu, Chih Long; Bratman, Scott V.; Say, Carmen; Zhou, Li; Carter, Justin N.; West, Robert B.; Sledge, George W.; Shrager, Joseph B.; Loo, Billy W.; Neal, Joel W.; Wakelee, Heather A.; Diehn, Maximilian; Alizadeh, Ash A.
2016-01-01
High-throughput sequencing of circulating tumor DNA (ctDNA) promises to facilitate personalized cancer therapy. However, low quantities of cell-free DNA (cfDNA) in the blood and sequencing artifacts currently limit analytical sensitivity. To overcome these limitations, we introduce an approach for integrated digital error suppression (iDES). Our method combines in silico elimination of highly stereotypical background artifacts with a molecular barcoding strategy for the efficient recovery of cfDNA molecules. Individually, these two methods each improve the sensitivity of cancer personalized profiling by deep sequencing (CAPP-Seq) by ~3 fold, and synergize when combined to yield ~15-fold improvements. As a result, iDES-enhanced CAPP-Seq facilitates noninvasive variant detection across hundreds of kilobases. Applied to clinical non-small cell lung cancer (NSCLC) samples, our method enabled biopsy-free profiling of EGFR kinase domain mutations with 92% sensitivity and 96% specificity and detection of ctDNA down to 4 in 10^5 cfDNA molecules. We anticipate that iDES will aid the noninvasive genotyping and detection of ctDNA in research and clinical settings. PMID:27018799
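The general idea behind molecular barcoding can be sketched as follows: reads sharing a barcode are assumed to derive from one original molecule, so a per-position majority vote within each barcode family suppresses errors that appear in only some copies. This is an illustrative simplification, not the iDES implementation; the function name, the `min_family_size` parameter and the input format are assumptions.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads, min_family_size=2):
    """Collapse (barcode, sequence) pairs into one consensus per barcode."""
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        # Families with too few reads cannot outvote an error; skip them.
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        # Majority base at each position across the family.
        consensus[barcode] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus

# Made-up reads: the third copy in family "AA" carries a single error.
reads = [("AA", "ACGT"), ("AA", "ACGT"), ("AA", "ACTT"), ("BB", "ACGT")]
print(consensus_by_barcode(reads))
```

Errors introduced before barcoding, or recurrent artifacts present in every copy, survive a vote like this, which is consistent with the abstract pairing barcoding with a separate in silico background-removal step.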
Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri
2013-02-19
Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.
Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.
2015-01-01
Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was the closest with G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234
Sachsenröder, Jana; Twardziok, Sven; Hammerl, Jens A; Janczyk, Pawel; Wrede, Paul; Hertwig, Stefan; Johne, Reimar
2012-01-01
Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. 
The implemented process control serves as quality control, ensures comparability of the method and may be used for further method optimization.
Bergfors, Assar; Leenheer, Daniël; Bergqvist, Anders; Ameur, Adam; Lennerstrand, Johan
2016-02-01
Development of Hepatitis C virus (HCV) resistance against direct-acting antivirals (DAAs), including NS5A inhibitors, is an obstacle to successful treatment of HCV when DAAs are used in sub-optimal combinations. Furthermore, it has been shown that baseline (pre-existing) resistance against DAAs is present in treatment-naïve patients and this will potentially complicate future treatment strategies in different HCV genotypes (GTs). Thus the aim was to detect low levels of NS5A resistance-associated variants (RAVs) in a limited sample set of treatment-naïve patients of HCV GT1a and 3a, since such polymorphisms can display in vitro resistance as high as 60000 fold. Ultra-deep single molecule real time (SMRT) sequencing with the Pacific Biosciences (PacBio) RSII instrument was used to detect these RAVs. The SMRT sequencing was conducted on ten samples; three of them positive with Sanger sequencing (GT1a Q30H and Y93N, and GT3a Y93H), five GT1a samples, and two GT3a non-positive samples. The same methods were applied to the HCV GT1a H77-plasmid in a dilution series, in order to determine the error rates of replication, which in turn were used to determine the limit of detection (LOD), as defined by mean + 3SD, of minority variants down to 0.24%. We found important baseline NS5A RAVs at levels between 0.24 and 0.5%, which could potentially have clinical relevance. This new method with low level detection of baseline RAVs could be useful in predicting the most cost-efficient combination of DAA treatment, and reduce the treatment duration for an HCV infected individual. Copyright © 2015 Elsevier B.V. All rights reserved.
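The LOD rule quoted above (mean + 3SD of the background error rate) can be written out as a small sketch. The error rates below are invented for illustration, and the use of the sample standard deviation (rather than the population one) is an assumption, since the abstract does not specify it.

```python
from statistics import mean, stdev

def limit_of_detection(background_error_rates):
    """LOD = mean + 3 * SD of error rates measured on a clonal control."""
    if len(background_error_rates) < 2:
        raise ValueError("need at least two measurements to estimate SD")
    return mean(background_error_rates) + 3 * stdev(background_error_rates)

def variants_above_lod(variant_frequencies, lod):
    """Keep only variants whose frequency exceeds the LOD."""
    return {name: f for name, f in variant_frequencies.items() if f > lod}

# Hypothetical per-position error rates (in percent) from a plasmid control.
errors_pct = [0.05, 0.08, 0.10, 0.07, 0.12]
lod = limit_of_detection(errors_pct)
# Hypothetical minority-variant frequencies (in percent) in a patient sample.
calls = variants_above_lod({"Y93H": 0.40, "Q30H": 0.09}, lod)
print(round(lod, 3), sorted(calls))
```

A threshold of this kind treats anything within three standard deviations of the control's mean error rate as indistinguishable from sequencing or amplification noise.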
NASA Astrophysics Data System (ADS)
Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata
2009-05-01
The autotrophic and ammonia-oxidizing crenarchaeal assemblage at an offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan® methodology. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for the residual 36% of clones, except for those recovered from the Eastern Mediterranean, were found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into the Deep Marine Group I cluster that contains the vast majority of known sequences recovered from the global deep-sea environment. 
Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of crenarchaeal amoA gene was monitored by direct RNA retrieval and following analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into the large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in the DNA-based clone library, evidently owing to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts related to Deep Marine Group I of Crenarchaeota was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. Because all seawater samples were processed on board under atmospheric pressure and sunlight, decompression and/or photoinhibition likely affected their metabolic activity, followed by a strong decay of gene expression.
Giugni, Elisabetta; Sabatini, Umberto; Hagberg, Gisela E; Formisano, Rita; Castriota-Scanderbeg, Alessandro
2005-05-01
Diffuse axonal injury (DAI) is a common type of primary neuronal injury in patients with severe traumatic brain injury (TBI), and is frequently accompanied by tissue tear hemorrhage. T2-weighted gradient-recalled echo (GRE) sequences are more sensitive than T2-weighted spin-echo images for detection of hemorrhage. The purpose of this study is to compare turbo Proton Echo Planar Spectroscopic Imaging (t-PEPSI), an extremely fast sequence, with GRE sequence in the detection of DAI. Twenty-one patients (mean age 26.8 years) with severe TBI that had occurred at least 3 months earlier underwent a brain MR imaging study on a 1.5-T scanner. A qualitative evaluation of the t-PEPSI sequences was performed by identifying the optimal echo time and in-plane resolution. The number and size of DAI lesions, as well as the signal intensity contrast ratio (SI CR), were computed for each set of GRE and t-PEPSI images, and divided according to their anatomic location as lobar and/or deep brain. There was no significant difference between GRE and t-PEPSI sequences in the detection of the total number of DAI lesions (291 vs. 230, respectively). GRE sequence delineated a higher number of DAI in the temporal lobe compared to the t-PEPSI sequence (74 vs. 37, P < .004), while no differences were found for the other regions. The SI CR was significantly lower with the t-PEPSI than the GRE sequence (P < .00001). Owing to its very short scan time and high sensitivity to the hemorrhage foci, the t-PEPSI sequence may be used as an alternative to the GRE to assess brain DAI in severe TBI patients, especially if uncooperative and medically unstable.
USDA-ARS's Scientific Manuscript database
Heilongjiang province is one of the most important potato production areas in China. Frequent outbreaks of virus and viroid diseases in production fields have significantly decreased potato yield and quality. However, we still do not have a clear understanding on the composition and genetic diversit...
Diversity of viruses detected by deep sequencing in pigs from a common background
USDA-ARS's Scientific Manuscript database
The trial was successful in identifying a number of viruses in the feces of the pigs demonstrating the application of this technology to determine the background noise in the animals. The findings in this study are similar to the fecal virome in pigs from a typical commercial swine farm in the Unite...
Using small RNA deep sequencing data to detect siRNA duplexes induced by plant viruses
USDA-ARS's Scientific Manuscript database
Small interfering RNA (siRNA) duplexes are produced in plants during virus infection, which are short (usually 21 to 24-base pair) double-stranded RNAs (dsRNAs) with several overhanging nucleotides on the 5' end and 3' end. The investigation of the siRNA duplexes is useful to better understand the R...
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.
Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed
2017-11-01
Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. 
It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.
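Copy number profiles from low-pass sequencing are commonly derived by counting reads in fixed genomic bins and comparing each bin's share of reads against a reference. The sketch below illustrates that general idea only; it is not the authors' pipeline, and the function name, the pseudocount and the use of a matched normal sample as reference are assumptions.

```python
import math

def log2_ratios(tumor_bins, reference_bins, pseudocount=0.5):
    """Per-bin log2 ratio of depth-normalized read counts."""
    if len(tumor_bins) != len(reference_bins):
        raise ValueError("both samples must use the same bins")
    # The pseudocount avoids log(0) in bins with no reads.
    t = [x + pseudocount for x in tumor_bins]
    r = [x + pseudocount for x in reference_bins]
    t_total, r_total = sum(t), sum(r)
    # Dividing by the totals corrects for different sequencing depths.
    return [
        math.log2((ti / t_total) / (ri / r_total))
        for ti, ri in zip(t, r)
    ]

# Made-up counts: the middle bin has roughly half the expected reads.
ratios = log2_ratios([100, 50, 100], [100, 100, 100])
print([round(v, 2) for v in ratios])
```

At coverage as low as that reported above, bins need to be large for the counts to be stable, and real pipelines typically add GC-content correction and segmentation, neither of which is shown here.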
Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu
2014-01-01
Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from an HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, the HAV IRES sequence was conserved among HAV strains in this outbreak even at the resolution of UDPS analysis. UDPS analysis of HAV 5'UTR revealed no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287
Magnetic resonance imaging of the subthalamic nucleus for deep brain stimulation.
Chandran, Arjun S; Bynevelt, Michael; Lind, Christopher R P
2016-01-01
The subthalamic nucleus (STN) is one of the most important stereotactic targets in neurosurgery, and its accurate imaging is crucial. With improving MRI sequences there is impetus for direct targeting of the STN. High-quality, distortion-free images are paramount. Image reconstruction techniques appear to show the greatest promise in balancing the issue of geometrical distortion and STN edge detection. Existing spin echo- and susceptibility-based MRI sequences are compared with new image reconstruction methods. Quantitative susceptibility mapping is the most promising technique for stereotactic imaging of the STN.
Validation of a next-generation sequencing assay for clinical molecular oncology.
Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D
2014-01-01
Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
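The confidence limits quoted for an observed sensitivity of 100% can be illustrated with the exact (Clopper-Pearson) binomial interval, whose lower limit has a closed form when every call is concordant. This is only a minimal sketch: the abstract does not state which interval method or how many variants underlie each figure, so the function name and any value of n passed to it are assumptions.

```python
def exact_lower_bound_all_concordant(n: int, confidence: float = 0.95) -> float:
    """Lower limit of the two-sided exact (Clopper-Pearson) interval
    when all n of n calls are concordant (observed proportion = 100%).

    With x = n successes the lower limit p solves p**n = alpha / 2,
    so p = (alpha / 2) ** (1 / n). The upper limit is 1.0.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Hypothetical sample sizes, chosen only to show how the bound tightens with n.
for n_calls in (10, 30, 100):
    bound = exact_lower_bound_all_concordant(n_calls)
    print(f"n={n_calls}: 95% CI lower limit = {100 * bound:.1f}%")
```

The closed form makes the design point visible: with perfect concordance, the width of the interval depends only on how many variants were tested.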
NASA Astrophysics Data System (ADS)
Zhang, Likui; Kang, Manyu; Xu, Jiajun; Xu, Jian; Shuai, Yinjie; Zhou, Xiaojian; Yang, Zhihui; Ma, Kesen
2016-05-01
Active deep-sea hydrothermal vents harbor abundant thermophilic and hyperthermophilic microorganisms. However, microbial communities in inactive hydrothermal vents have not been well documented. Here, we investigated bacterial and archaeal communities in two deep-sea sediments (named TVG4 and TVG11) collected from inactive hydrothermal vents in the Southwest Indian Ridge using high-throughput sequencing on the Illumina MiSeq2500 platform. Based on the V4 region of the 16S rRNA gene, sequence analysis showed that bacterial communities in the two samples were dominated by Proteobacteria, followed by Bacteroidetes, Actinobacteria and Firmicutes. Furthermore, archaeal communities in the two samples were dominated by Thaumarchaeota and Euryarchaeota. Comparative analysis showed that (i) TVG4 displayed higher bacterial richness and lower archaeal richness than TVG11; (ii) the two samples diverged more in their archaeal communities than in their bacterial communities. Bacteria and archaea that are potentially associated with nitrogen, sulfur, metal and methane cycling were detected in the two samples. Overall, we provided a first comparative picture of bacterial and archaeal communities and revealed their potential ecological roles in the deep-sea environments of inactive hydrothermal vents in the Southwest Indian Ridge, augmenting knowledge of microbial communities in inactive hydrothermal vents.
Tillmar, Andreas O.; Dell'Amico, Barbara; Welander, Jenny; Holmlund, Gunilla
2013-01-01
Species identification can be interesting in a wide range of areas, for example, in forensic applications, food monitoring and archeology. The vast majority of existing DNA typing methods developed for species determination focus mainly on a single species source. There are, however, many instances where all species from mixed sources need to be determined, even when the minority species constitutes less than 1% of the sample. The introduction of next generation sequencing opens new possibilities for such challenging samples. In this study we present a universal deep sequencing method using 454 GS Junior sequencing of a target on the mitochondrial gene 16S rRNA. The method was designed through phylogenetic analyses of DNA reference sequences from more than 300 mammal species. Experiments were performed on artificial species-species mixture samples in order to verify the method’s robustness and its ability to detect all species within a mixture. The method was also tested on samples from authentic forensic casework. The results were promising: the method discriminated over 99.9% of mammal species, detected multiple donors within a mixture, and detected minor components as low as 1% of a mixed sample. PMID:24358309
Vavrova, Eva; Kantorova, Barbara; Vonkova, Barbara; Kabathova, Jitka; Skuhrova-Francova, Hana; Diviskova, Eva; Letocha, Ondrej; Kotaskova, Jana; Brychtova, Yvona; Doubek, Michael; Mayer, Jiri; Pospisilova, Sarka
2017-09-01
The hotspot c.7541_7542delCT NOTCH1 mutation has been proven to have a negative clinical impact in chronic lymphocytic leukemia (CLL). However, an optimal method for its detection has not yet been specified. The aim of our study was to examine the presence of the NOTCH1 mutation in CLL using three commonly used molecular methods. Sanger sequencing, fragment analysis and allele-specific PCR were compared in the detection of the c.7541_7542delCT NOTCH1 mutation in 201 CLL patients. In 7 patients with inconclusive mutational analysis results, the presence of the NOTCH1 mutation was also confirmed using ultra-deep next generation sequencing. The NOTCH1 mutation was detected in 15% (30/201) of examined patients. Only fragment analysis was able to identify all 30 NOTCH1-mutated patients. Sanger sequencing and allele-specific PCR showed a lower detection efficiency, determining 93% (28/30) and 80% (24/30) of the present NOTCH1 mutations, respectively. Considering these three most commonly used methodologies for c.7541_7542delCT NOTCH1 mutation screening in CLL, we defined fragment analysis as the most suitable approach for detecting the hotspot NOTCH1 mutation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing
Matochko, Wadim L.; Derda, Ratmir
2013-01-01
Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as an $N \times 1$ frequency vector $n = \|n_i\|$, where $n_i$ is the copy number of the $i$th sequence and $N$ is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on $n$. Selection, amplification, or sequencing could be described as a product of an $N \times N$ matrix and a stochastic sampling operator ($S_a$). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of $S_a$ and use them to define the sequencing operator ($Seq$). Sequencing without any bias and errors is $Seq = S_a I_N$, where $I_N$ is an $N \times N$ unity matrix. Any bias in sequencing changes $I_N$ to a nonunity matrix. We identified a diagonal censorship matrix ($CEN$), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
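The vector-and-operator description above can be sketched in a few lines of code. This is an illustrative toy, not the authors' implementation: diagonal matrices are stored as plain lists, the function names are hypothetical, and placing the censorship matrix where the identity matrix would otherwise sit (so that sequencing becomes sampling applied to the censored library) is an assumption drawn from the wording of the abstract.

```python
import random


def apply_diagonal(diag, vec):
    """Multiply a diagonal matrix (stored as its diagonal) by a vector."""
    return [d * v for d, v in zip(diag, vec)]


def sampling_operator(n, reads, rng):
    """Diagonal of one random sampling matrix for library vector n.

    `reads` clones are drawn with probability proportional to abundance;
    entry i is (times clone i was drawn) / n[i], so that applying the
    diagonal to n gives back the sampled copy numbers.
    """
    if sum(n) <= 0:
        raise ValueError("library is empty")
    drawn = rng.choices(range(len(n)), weights=n, k=reads)
    counts = [0] * len(n)
    for i in drawn:
        counts[i] += 1
    return [c / x if x > 0 else 0.0 for c, x in zip(counts, n)]


def sequence(n, reads, rng, censorship=None):
    """Sampling applied after a diagonal bias matrix.

    With censorship=None the bias is the identity (unbiased sequencing);
    otherwise censorship[i] in [0, 1] scales down clone i before sampling.
    Returned values are floats approximating integer read counts.
    """
    bias = censorship if censorship is not None else [1.0] * len(n)
    biased = apply_diagonal(bias, n)
    s_a = sampling_operator(biased, reads, rng)
    return apply_diagonal(s_a, biased)


rng = random.Random(1)                      # seeded for reproducibility
library = [500, 300, 150, 50]               # hypothetical copy numbers, N = 4
print(sequence(library, 1000, rng))
print(sequence(library, 1000, rng, censorship=[1.0, 1.0, 0.0, 1.0]))
```

Setting a censorship entry to zero removes that clone from the reads entirely, while a value between 0 and 1 downsamples it relative to the other clones.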
QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.
Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen
2015-11-10
Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets.
Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
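The three calling modes described in this abstract differ only in the probability cutoff and in an optional frequency filter, which can be sketched as follows. This is not the QQ-SNV code: the function and argument names are hypothetical, the SNV probability is assumed to come from an already fitted classifier, treating a probability equal to the cutoff as a call is an assumption, and the percentile is computed with the inclusive interpolation of Python's statistics module, which may differ from the authors' definition.

```python
from statistics import quantiles

# Probability cutoffs quoted in the abstract for each mode.
CUTOFFS = {"D": 0.5, "HS": 0.0001, "HS-P80": 0.0001}


def call_snv(probability, frequency, mode="D", error_frequencies=None):
    """Return True if a candidate SNV is called under the given mode.

    probability       -- estimated SNV probability from the classifier
    frequency         -- observed variant frequency (fraction of reads)
    error_frequencies -- observed error frequencies, needed for HS-P80
    """
    if mode not in CUTOFFS:
        raise ValueError(f"unknown mode: {mode}")
    if probability < CUTOFFS[mode]:
        return False
    if mode == "HS-P80":
        if error_frequencies is None or len(error_frequencies) < 2:
            raise ValueError("HS-P80 needs at least two error frequencies")
        # 99 cut points; index 79 is the 80th percentile.
        p80 = quantiles(error_frequencies, n=100, method="inclusive")[79]
        # Calls with a frequency below the 80th percentile are overruled.
        return frequency >= p80
    return True


# Hypothetical error-frequency distribution, for illustration only.
errors = [0.001 * k for k in range(1, 11)]
print(call_snv(0.01, 0.02, mode="D"))
print(call_snv(0.01, 0.02, mode="HS"))
print(call_snv(0.01, 0.005, mode="HS-P80", error_frequencies=errors))
```

The sketch shows why the high-sensitivity mode admits more false positives and how the percentile filter restores specificity by discarding calls that sit inside the bulk of the error distribution.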
Van Eygen, Veerle; Thys, Kim; Van Hove, Carl; Rimsky, Laurence T; De Meyer, Sandra; Aerssens, Jeroen; Picchio, Gaston; Vingerhoets, Johan
2016-05-01
Minority variants (1.0-25.0%) were evaluated by deep sequencing (DS) at baseline and virological failure (VF) in a selection of antiretroviral treatment-naïve, HIV-1-infected patients from the rilpivirine ECHO/THRIVE phase III studies. Linkage between frequently emerging resistance-associated mutations (RAMs) was determined. DS (Illumina®) and population sequencing (PS) results were available at baseline for 47 VFs and time of failure for 48 VFs; and at baseline for 49 responders matched for baseline characteristics. Minority mutations were accurately detected at frequencies down to 1.2% of the HIV-1 quasispecies. No baseline minority rilpivirine RAMs were detected in VFs; one responder carried 1.9% F227C. Baseline minority mutations associated with resistance to other non-nucleoside reverse transcriptase inhibitors (NNRTIs) were detected in 8/47 VFs (17.0%) and 7/49 responders (14.3%). Baseline minority nucleoside/nucleotide reverse transcriptase inhibitor (NRTI) RAMs M184V and L210W were each detected in one VF (none in responders). At failure, two patients without NNRTI RAMs by PS carried minority rilpivirine RAMs K101E and/or E138K; and five additional patients carried other minority NNRTI RAMs V90I, V106I, V179I, V189I, and Y188H. Overall at failure, minority NNRTI RAMs and NRTI RAMs were found in 29/48 (60.4%) and 16/48 VFs (33.3%), respectively. Linkage analysis showed that E138K and K101E were usually not observed on the same viral genome. In conclusion, baseline minority rilpivirine RAMs and other NNRTI/NRTI RAMs were uncommon in the rilpivirine arm of the ECHO and THRIVE studies. DS at failure showed emerging NNRTI resistant minority variants in seven rilpivirine VFs who had no detectable NNRTI RAMs by PS. © 2015 Wiley Periodicals, Inc.
Rössle, Matthias; Sigg, Michèle; Rüschoff, Jan H; Wild, Peter J; Moch, Holger; Weber, Achim; Rechsteiner, Markus P
2013-11-01
The activating BRAF (V600) mutation is a well-established negative prognostic biomarker in metastatic colorectal carcinoma (CRC). A recently developed monoclonal mouse antibody (clone VE1) has been shown to reliably detect BRAF (V600E) mutated protein by immunohistochemistry (IHC). In this study, we aimed to compare the detection of BRAF (V600E) mutations by IHC, Sanger sequencing (SaS), and ultra-deep sequencing (UDS) in CRC. VE1-IHC was established in a cohort of 68 KRAS wild-type CRCs. The VE1-IHC was only positive in the three patients with a known BRAF (V600E) mutation as assessed by SaS and UDS. The test cohort consisted of 265 non-selected, consecutive CRC samples. Thirty-nine out of 265 cases (14.7%) were positive by VE1-IHC. SaS of 20 randomly selected IHC-negative tumors showed BRAF wild-type (20/20). Twenty-four IHC-positive cases were confirmed by SaS (24/39; 61.5%) and 15 IHC-positive cases (15/39; 38.5%) showed a BRAF wild-type by SaS. UDS detected a BRAF (V600E) mutation in 13 of these 15 discordant cases. In one tumor, the mutation frequency was below our threshold for UDS positivity, while in another case, UDS could not be performed due to low DNA amount. Statistical analysis showed sensitivities of 100% and 63% and specificities of 95% and 100% for VE1-IHC and SaS, respectively, compared to combined results of SaS and UDS. Our data suggest that there is high concordance between UDS and IHC using the anti-BRAF(V600E) (VE1) antibody. Thus, VE1 immunohistochemistry is a highly sensitive and specific method in detecting BRAF (V600E) mutations in colorectal carcinoma.
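Sensitivity and specificity against a combined reference standard follow directly from a two-by-two table. The helper below is a generic sketch; the counts in the usage line are hypothetical and are not the study's data, since the abstract does not report the full table.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn -- reference-positive cases the test called positive/negative
    tn/fp -- reference-negative cases the test called negative/positive
    """
    if tp + fn == 0:
        raise ValueError("no reference-positive cases")
    if tn + fp == 0:
        raise ValueError("no reference-negative cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fp=3, fn=5, tn=147)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```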
Bi, Yaqi; Tugume, Arthur K.; Valkonen, Jari P. T.
2012-01-01
Background: Arctium species (Asteraceae) are distributed worldwide and are used as food and rich sources of secondary metabolites for the pharmaceutical industry, e.g., against avian influenza virus. RNA silencing is an antiviral defense mechanism that detects and destroys virus-derived double-stranded RNA, resulting in accumulation of virus-derived small RNAs (21–24 nucleotides) that can be used for generic detection of viruses by small-RNA deep sequencing (SRDS). Methodology/Principal Findings: SRDS was used to detect viruses in the biennial wild plant species Arctium tomentosum (woolly burdock; family Asteraceae) displaying virus-like symptoms of vein yellowing and leaf mosaic in southern Finland. Assembly of the small-RNA reads resulted in contigs homologous to Alstroemeria virus X (AlsVX), a positive/single-stranded RNA virus of genus Potexvirus (family Alphaflexiviridae), or related to negative/single-stranded RNA viruses of the genus Emaravirus. The coat protein gene of AlsVX was 81% and 89% identical to the two AlsVX isolates from Japan and Norway, respectively. The deduced, partial nucleocapsid protein amino acid sequence of the emara-like virus was only 78% or less identical to reported emaraviruses and showed no variability among the virus isolates characterized. This virus, tentatively named Woolly burdock yellow vein virus, was exclusively associated with yellow vein and leaf mosaic symptoms in woolly burdock, whereas AlsVX was detected in only one of the 52 plants tested. Conclusions/Significance: These results provide novel information about natural virus infections in Arctium species and reveal woolly burdock as the first natural host of AlsVX besides Alstroemeria (family Alstroemeriaceae). Results also revealed a new virus related to the recently emerged Emaravirus genus and demonstrated applicability of SRDS to detect negative-strand RNA viruses.
SRDS potentiates virus surveys of wild plants, a research area underrepresented in plant virology, and helps reveal natural reservoirs of viruses that cause yield losses in cultivated plants. PMID:22912734
Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin
2017-10-27
Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
Triadó-Margarit, Xavier; Casamayor, Emilio O
2015-12-01
Diversity of small protists was studied in sulfidic and anoxic (euxinic) stratified karstic lakes and coastal lagoons by 18S rRNA gene analyses. We hypothesized a major sulfide effect, reducing protist diversity and richness with only a few specialized populations adapted to deal with low-redox conditions and high-sulfide concentrations. However, genetic fingerprinting suggested similar ecological diversity in anoxic and sulfurous waters as in the upper oxygen-rich water compartments, with specific populations inhabiting euxinic waters. Many of them agreed with genera previously identified by microscopic observations, but new and unexpected groups were also detected. Most of the sequences matched a rich assemblage of Ciliophora (i.e., Coleps, Prorodon, Plagiopyla, Strombidium, Metopus, Vorticella and Caenomorpha, among others) and algae (mainly Cryptomonadales). Unidentified Cercozoa, Fungi, Stramenopiles and Discoba were recurrently found. The lack of GenBank counterparts was higher in deep hypolimnetic waters and was unevenly distributed among taxa, being higher within Discoba and lower in Cryptophyceae. A larger number of populations than expected were specifically detected in the deep sulfurous waters, with unknown ecological interactions and metabolic capabilities. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.
Deep Packet/Flow Analysis using GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Qian; Wu, Wenji; DeMar, Phil
Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE) as it requires a large amount of raw computing power and high I/O throughputs. Recently, researchers have tentatively used GPUs to address the above issues and boost the performance of DPI. Typically, DPI applications involve highly complex operations at both the per-packet and per-flow data levels, often in real time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, the data streams need to be reconstructed at the per-flow level to deliver a consistent content analysis. Since the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes a purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering and waiting for TCP packets to become in sequence, our framework processes the packets in batches and uses a deterministic finite automaton (DFA) with a prefix-/suffix-tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. In conclusion, evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct a stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.
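The idea of matching a signature that straddles packet boundaries can be illustrated on the CPU with a single-pattern DFA whose state is stored per flow. This is a deliberately simplified sketch and not the framework described above: it assumes the segments of each flow are fed in order, it does not reproduce the prefix-/suffix-tree handling of out-of-sequence packets or any GPU batching, and the class and function names are hypothetical.

```python
def build_dfa(pattern: bytes):
    """Knuth-Morris-Pratt style DFA: dfa[state][byte] -> next state.

    State k means the last k bytes seen equal the first k pattern bytes;
    state len(pattern) is accepting and also has outgoing transitions,
    so scanning can continue after a match.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    m = len(pattern)
    dfa = [[0] * 256 for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0                                # state to fall back to on mismatch
    for state in range(1, m + 1):
        dfa[state] = dfa[restart][:]           # copy mismatch transitions
        if state < m:
            dfa[state][pattern[state]] = state + 1
            restart = dfa[restart][pattern[state]]
    return dfa


class FlowScanner:
    """Keeps one DFA state per flow so matches can span several segments."""

    def __init__(self, pattern: bytes):
        self.dfa = build_dfa(pattern)
        self.accept = len(pattern)
        self.states = {}                       # flow id -> current DFA state

    def feed(self, flow_id, payload: bytes) -> int:
        """Scan one in-order segment; return the number of matches ending in it."""
        state = self.states.get(flow_id, 0)
        hits = 0
        for byte in payload:
            state = self.dfa[state][byte]
            if state == self.accept:
                hits += 1
        self.states[flow_id] = state
        return hits


scanner = FlowScanner(b"attack")
print(scanner.feed("flow-1", b"xxatt"))        # signature only partly seen
print(scanner.feed("flow-2", b"ack"))          # unrelated flow, no shared state
print(scanner.feed("flow-1", b"ackyy"))        # completes the signature
```

Only one integer of state is kept per flow, which avoids buffering payload bytes; supporting reordered segments would need extra bookkeeping that this sketch omits.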
You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng
2018-06-06
As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
Current and future molecular diagnostics for ocular infectious diseases.
Doan, Thuy; Pinsky, Benjamin A
2016-11-01
Confirmation of ocular infections can pose great challenges to the clinician. A fundamental limitation is the small amounts of specimen that can be obtained from the eye. Molecular diagnostics can circumvent this limitation and have been shown to be more sensitive than conventional culture. The purpose of this review is to describe new molecular methods and to discuss the applications of next-generation sequencing-based approaches in the diagnosis of ocular infections. Efforts have focused on improving the sensitivity of pathogen detection using molecular methods. This review describes a new molecular target for Toxoplasma gondii-directed polymerase chain reaction assays. Molecular diagnostics for Chlamydia trachomatis and Acanthamoeba species are also discussed. Finally, we describe a hypothesis-free approach, metagenomic deep sequencing, which can detect DNA and RNA pathogens from a single specimen in one test. In some cases, this method can provide the geographic location and timing of the infection. Pathogen-directed PCRs have been powerful tools in the diagnosis of ocular infections for over 20 years. The use of next-generation sequencing-based approaches, when available, will further improve sensitivity of detection with the potential to improve patient care.
Zhang, Hanyuan; Vieira Resende E Silva, Bruno; Cui, Juan
2018-05-01
Small RNA sequencing is the most widely used tool for microRNA (miRNA) discovery, and shows great potential for the efficient study of miRNA cross-species transport, i.e., by detecting the presence of exogenous miRNA sequences in the host species. Because of the increased appreciation of dietary miRNAs and their far-reaching implication in human health, research interests are currently growing with regard to exogenous miRNAs bioavailability, mechanisms of cross-species transport and miRNA function in cellular biological processes. In this article, we present microRNA Discovery (miRDis), a new small RNA sequencing data analysis pipeline for both endogenous and exogenous miRNA detection. Specifically, we developed and deployed a Web service that supports the annotation and expression profiling data of known host miRNAs and the detection of novel miRNAs, other noncoding RNAs, and the exogenous miRNAs from dietary species. As a proof-of-concept, we analyzed a set of human plasma sequencing data from a milk-feeding study where 225 human miRNAs were detected in the plasma samples and 44 show elevated expression after milk intake. By examining the bovine-specific sequences, data indicate that three bovine miRNAs (bta-miR-378, -181* and -150) are present in human plasma possibly because of the dietary uptake. Further evaluation based on different sets of public data demonstrates that miRDis outperforms other state-of-the-art tools in both detection and quantification of miRNA from either animal or plant sources. The miRDis Web server is available at: http://sbbi.unl.edu/miRDis/index.php.
Exon 11 skipping of SCN10A coding for voltage-gated sodium channels in dorsal root ganglia
Schirmeyer, Jana; Szafranski, Karol; Leipold, Enrico; Mawrin, Christian; Platzer, Matthias; Heinemann, Stefan H
2014-01-01
The voltage-gated sodium channel NaV1.8 (encoded by SCN10A) is predominantly expressed in dorsal root ganglia (DRG) and plays a critical role in pain perception. We analyzed SCN10A transcripts isolated from human DRGs using deep sequencing and found a novel splice variant lacking exon 11, which codes for 98 amino acids of the domain I/II linker. Quantitative PCR analysis revealed an abundance of this variant of up to 5–10% in human, while no such variants were detected in mouse or rat. Since no obvious functional differences between channels with and without the exon-11 sequence were detected, it is suggested that SCN10A exon 11 skipping in humans is a tolerated event. PMID:24763188
Mason, Olivia U; Hazen, Terry C; Borglin, Sharon; Chain, Patrick S G; Dubinsky, Eric A; Fortney, Julian L; Han, James; Holman, Hoi-Ying N; Hultman, Jenni; Lamendella, Regina; Mackelprang, Rachel; Malfatti, Stephanie; Tom, Lauren M; Tringe, Susannah G; Woyke, Tanja; Zhou, Jizhong; Rubin, Edward M; Jansson, Janet K
2012-09-01
The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.
Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen
2014-02-01
Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
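The authentication rules listed in this abstract translate naturally into a small predicate. This is a hedged sketch rather than the authors' code: the function and argument names are hypothetical, the coefficient of variation is computed from the sample standard deviation of replicate frequency estimates, and applying the CV limit to both alternative (a) and alternative (b) is one reading of an ambiguous sentence.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / avg


def authenticate_minor_variant(min_depth, callers, frequencies,
                               snver_has_results=True):
    """Apply the depth, caller-agreement and CV rules to one variant.

    min_depth         -- minimum depth of coverage at the site
    callers           -- names of variant callers that reported the variant
    frequencies       -- variant frequencies from replicate assays
    snver_has_results -- False if SNVer returned no results, in which
                         case BWA takes its place in rule (b)
    """
    if min_depth <= 30:                        # rule 1: depth larger than 30
        return False
    reported = {name.lower() for name in callers}
    partner = "snver" if snver_has_results else "bwa"
    rule_a = len(reported) >= 4                # four or more callers
    rule_b = bool(reported & {"dibayes", "lofreq"}) and partner in reported
    if not (rule_a or rule_b):
        return False
    return coefficient_of_variation(frequencies) <= 0.1


# Hypothetical variant reported by LoFreq and SNVer in three replicates.
print(authenticate_minor_variant(120, ["LoFreq", "SNVer"],
                                 [0.010, 0.011, 0.0105]))
```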
NASA Astrophysics Data System (ADS)
De Marchi, G.; Paresce, F.; Straniero, O.; Prada Moroni, P. G.
2004-03-01
Very deep images of the Galactic globular cluster M 4 (NGC 6121) through the F606W and F814W filters were taken in 2001 with the WFPC2 on board the HST. A first published analysis of this data set (Richer et al. 2002) produced the result that the age of M 4 is 12.7 ± 0.7 Gyr (Hansen et al. 2002), thus setting a robust lower limit to the age of the universe. In view of the great astronomical importance of getting this number right, we have subjected the same data set to the simplest possible photometric analysis that completely avoids uncertain assumptions about the origin of the detected sources. This analysis clearly reveals both a thin main sequence, from which can be deduced the deepest statistically complete mass function yet determined for a globular cluster, and a white dwarf (WD) sequence extending all the way down to the 5σ detection limit at I ≃ 27. The WD sequence is abruptly terminated at exactly this limit as expected by detection statistics. Using our most recent theoretical WD models (Prada Moroni & Straniero 2002) to obtain the expected WD sequence for different ages in the observed bandpasses, we find that the data so far obtained do not reach the peak of the WD luminosity function, thus only allowing one to set a lower limit to the age of M 4 of ~9 Gyr. Thus, the problem of determining the absolute age of a globular cluster and, therefore, the onset of GC formation with cosmologically significant accuracy remains completely open. Only observations several magnitudes deeper than the limit obtained so far would allow one to approach this objective. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA for NASA under contract NAS5-26555.
Itakura, Jun; Kurosaki, Masayuki; Higuchi, Mayu; Takada, Hitomi; Nakakuki, Natsuko; Itakura, Yoshie; Tamaki, Nobuharu; Yasui, Yutaka; Suzuki, Shoko; Tsuchiya, Kaoru; Nakanishi, Hiroyuki; Takahashi, Yuka; Maekawa, Shinya; Enomoto, Nobuyuki; Izumi, Namiki
2015-01-01
The presence of resistance-associated variants (RAVs) of hepatitis C virus (HCV) attenuates the efficacy of direct acting antivirals (DAAs). The objective of this study was to characterize the susceptibility of RAVs to interferon-based therapy. Direct and deep sequencing were performed to detect Y93H RAV in the NS5A region. Twenty-nine genotype 1b patients with detectable RAV at baseline were treated with a combination of simeprevir, pegylated interferon and ribavirin. The longitudinal changes in the proportion of Y93H RAV during therapy and at breakthrough or relapse were determined. By direct sequencing, Y93H RAV became undetectable or decreased in proportion at an early time point during therapy (within 7 days), while HCV RNA was still detectable, in 57% of patients with both the Y93H variant and wild type virus at baseline. By deep sequencing, the proportion of Y93H RAV against Y93 wild type was 52.7% (5.8%-97.4%) at baseline, which significantly decreased to 29.7% (0.16%-98.3%) within 7 days of initiation of treatment (p = 0.023). The proportion of Y93H RAV was reduced in 21 of 29 cases (72.4%) and a marked reduction of more than 10% was observed in 14 cases (48.7%). HCV RNA reduction was significantly greater for Y93H RAV (-3.65 ± 1.3 log IU/mL/day) than the Y93 wild type (-3.35 ± 1.0 log IU/mL/day) (p<0.001). Y93H RAV is more susceptible to interferon-based therapy than the Y93 wild type.
Ghaju Shrestha, Rajani; Tanaka, Yasuhiro; Malla, Bikash; Bhandari, Dinesh; Tandukar, Sarmila; Inoue, Daisuke; Sei, Kazunari; Sherchand, Jeevan B; Haramoto, Eiji
2017-12-01
Bacteriological analysis of drinking water leads to detection of only conventional fecal indicator bacteria. This study aimed to explore and characterize bacterial diversity, to understand the extent of pathogenic bacterial contamination, and to examine the relationship between pathogenic bacteria and fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal. Sixteen water samples were collected from shallow dug wells (n=12), a deep tube well (n=1), a spring (n=1), and rivers (n=2) in September 2014 for 16S rRNA gene next-generation sequencing. A total of 525 genera were identified, of which 81 genera were classified as possible pathogenic bacteria. Acinetobacter, Arcobacter, and Clostridium were detected with a relatively higher abundance (>0.1% of total bacterial genes) in 16, 13, and 5 of the 16 samples, respectively, and the highest abundance ratio of Acinetobacter (85.14%) was obtained in the deep tube well sample. Furthermore, the bla OXA23-like genes of Acinetobacter were detected using SYBR Green-based quantitative PCR in 13 (35%) of 37 water samples, including the 16 samples that were analyzed by next-generation sequencing, with concentrations ranging from 5.3 to 7.5 log copies/100 mL. No sufficient correlation was found between fecal indicator bacteria, such as Escherichia coli and total coliforms, and potential pathogenic bacteria or the bla OXA23-like gene of Acinetobacter. These results suggest the limitation of using conventional fecal indicator bacteria in evaluating the pathogenic bacteria contamination of different water sources in the Kathmandu Valley. Copyright © 2017 Elsevier B.V. All rights reserved.
Castro, Rosario; Navelsaker, Sofie; Krasnov, Aleksei; Du Pasquier, Louis; Boudinot, Pierre
2017-10-01
During the last decades, gene and cDNA cloning identified TCR and Ig genes across vertebrates; genome sequencing of TCR and Ig loci in many species revealed the different organizations selected during evolution under the pressure of generating diverse repertoires of Ag receptors. By detecting clonotypes over a wide range of frequency, deep sequencing of Ig and TCR transcripts provides a new way to compare the structure of expressed repertoires in species of various sizes, at different stages of development, with different physiologies, and displaying multiple adaptations to the environment. In this review, we provide a short overview of the technologies currently used to produce global description of immune repertoires, describe how they have already been used in comparative immunology, and we discuss the future potential of such approaches. The development of these methodologies in new species holds promise for new discoveries concerning particular adaptations. As an example, understanding the development of adaptive immunity across metamorphosis in frogs has been made possible by such approaches. Repertoire sequencing is now widely used, not only in basic research but also in the context of immunotherapy and vaccination. Analysis of fish responses to pathogens and vaccines has already benefited from these methods. Finally, we also discuss potential advances based on repertoire sequencing of multigene families of immune sensors and effectors in invertebrates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Giugni, E; Sabatini, U; Hagberg, G E; Formisano, R; Castriota-Scanderbeg, A
2005-01-01
Diffuse axonal injury (DAI) is a common type of primary neuronal injury in patients with severe traumatic brain injury, and is frequently accompanied by tissue tear haemorrhage. The T2*-weighted gradient-recalled echo (GRE) sequences are more sensitive than T2-weighted spin-echo images for detection of haemorrhage. This study was undertaken to determine whether turbo-PEPSI, an extremely fast multi-echo-planar-imaging sequence, can be used as an alternative to the GRE sequence for detection of DAI. Nineteen patients (mean age 24.5 years) with severe traumatic brain injury (TBI) that had occurred at least 3 months earlier underwent a brain MRI study on a 1.5-Tesla scanner. A qualitative evaluation of the turbo-PEPSI sequences was performed by identifying the optimal echo time and in-plane resolution. The number and size of DAI lesions, as well as the signal intensity contrast ratio (SI CR), were computed for each set of GRE and turbo-PEPSI images, and divided according to their anatomic location into lobar and/or deep brain. There was no significant difference between GRE and turbo-PEPSI sequences in the total number of DAI lesions detected (283 vs 225 lesions, respectively). The GRE sequence identified a greater number of hypointense lesions in the temporal lobe compared to the turbo-PEPSI sequence (72 vs 35, p<0.003), while no significant differences were found for the other brain regions. The SI CR was significantly better (i.e. lower) for the turbo-PEPSI than for the GRE sequence (p<0.00001). Owing to its very short scan time and high sensitivity to haemorrhagic foci, the turbo-PEPSI sequence can be used as an alternative to the GRE to assess brain DAI in severe TBI patients, especially if they are uncooperative and medically unstable.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Xu, Kuidong
2016-10-01
In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes derived from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, neither method has been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating that undescribed, possibly highly divergent ciliates may inhabit this environment.
Statistical analyses showed that (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitats and thus are potentially more sensitive to varying environmental conditions.
deepTools: a flexible platform for exploring deep-sequencing data.
Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas
2014-07-01
We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
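The "normalized coverage files" mentioned in this abstract can be illustrated with a toy calculation. The sketch below is not deepTools code and does not use its API: the function name, the half-open 0-based read intervals and the choice of RPKM (reads per kilobase per million mapped reads) as the normalization are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def rpkm_coverage(
    read_intervals: Sequence[Tuple[int, int]],
    chrom_length: int,
    bin_size: int = 50,
) -> List[float]:
    """Return RPKM-normalized read counts per fixed-size bin.

    Each read is a half-open, 0-based interval (start, end) on a single
    chromosome. A read is counted once in every bin it overlaps.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins

    for start, end in read_intervals:
        first_bin = start // bin_size
        last_bin = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first_bin, last_bin + 1):
            counts[b] += 1

    total_reads = len(read_intervals)
    if total_reads == 0:
        return [0.0] * n_bins

    # RPKM: count / (mapped reads in millions * bin length in kilobases).
    scale = 1.0 / ((total_reads / 1e6) * (bin_size / 1e3))
    return [c * scale for c in counts]
```

Dividing by the total read count is what makes tracks from libraries of different sequencing depth comparable; dividing by the bin length makes the values independent of the chosen bin size.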
Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments
Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken
2013-01-01
In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific: the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10^6 to 10^11 viruses/cm^3 of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24-30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value < 10^-3 in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95-99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952
Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng
2016-09-06
Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). 
Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.
Maruyama, Sandra Regina; Castro-Jorge, Luiza Antunes; Ribeiro, José Marcos Chaves; Gardinassi, Luiz Gustavo; Garcia, Gustavo Rocha; Brandão, Lucinda Giampietro; Rodrigues, Aline Rezende; Okada, Marcos Ituo; Abrão, Emiliana Pereira; Ferreira, Beatriz Rossetti; da Fonseca, Benedito Antonio Lopes; de Miranda-Santos, Isabel Kinney Ferreira
2013-01-01
Transcripts similar to those that encode the nonstructural (NS) proteins NS3 and NS5 from flaviviruses were found in a salivary gland (SG) complementary DNA (cDNA) library from the cattle tick Rhipicephalus microplus. Tick extracts were cultured with cells to enable the isolation of viruses capable of replicating in cultured invertebrate and vertebrate cells. Deep sequencing of the viral RNA isolated from culture supernatants provided the complete coding sequences for the NS3 and NS5 proteins and their molecular characterisation confirmed similarity with the NS3 and NS5 sequences from other flaviviruses. Despite this similarity, phylogenetic analyses revealed that this potentially novel virus may be a highly divergent member of the genus Flavivirus. Interestingly, we detected the divergent NS3 and NS5 sequences in ticks collected from several dairy farms widely distributed throughout three regions of Brazil. This is the first report of flavivirus-like transcripts in R. microplus ticks. This novel virus is a potential arbovirus because it replicated in arthropod and mammalian cells; furthermore, it was detected in a cDNA library from tick SGs and therefore may be present in tick saliva. It is important to determine whether and by what means this potential virus is transmissible and to monitor the virus as a potential emerging tick-borne zoonotic pathogen. PMID:24626302
Yu, Qichao; Zhang, Wei; Zhang, Xiaolong; Zeng, Yongli; Wang, Yeming; Wang, Yanhui; Xu, Liqin; Huang, Xiaoyun; Li, Nannan; Zhou, Xinlan; Lu, Jie; Guo, Xiaosen; Li, Guibo; Hou, Yong; Liu, Shiping; Li, Bo
2017-09-01
Active retrotransposons play important roles during evolution and continue to shape our genomes today, especially in genetic polymorphisms underlying a diverse set of diseases. However, studies of human retrotransposon insertion polymorphisms (RIPs) based on whole-genome deep sequencing at the population level have not been sufficiently undertaken, despite the obvious need for a thorough characterization of RIPs in the general population. Herein, we present a novel and efficient computational tool called Specific Insertions Detector (SID) for the detection of non-reference RIPs. We demonstrate that SID is suitable for high-depth whole-genome sequencing data using paired-end reads obtained from simulated and real datasets. We construct a comprehensive RIP database using a large population of 90 Han Chinese individuals with a mean ×68 depth per individual. In total, we identify 9342 recent RIPs, and 8433 of these RIPs are novel compared with dbRIP, including 5826 Alu, 2169 long interspersed nuclear element 1 (L1), 383 SVA, and 55 long terminal repeats. Among the 9342 RIPs, 4828 were located in gene regions and 5 were located in protein-coding regions. We demonstrate that RIPs can, in principle, be an informative resource to perform population evolution and phylogenetic analyses. Taking the demographic effects into account, we identify a weak negative selection on SVA and L1 but an approximately neutral selection for Alu elements based on the frequency spectrum of RIPs. SID is a powerful open-source program for the detection of non-reference RIPs. We built a non-reference RIP dataset that greatly enhanced the diversity of RIPs detected in the general population, and it should be invaluable to researchers interested in many aspects of human evolution, genetics, and disease. As a proof of concept, we demonstrate that the RIPs can be used as biomarkers in a similar way as single nucleotide polymorphisms. © The Authors 2017. Published by Oxford University Press.
Bazak, Lily; Haviv, Ami; Barak, Michal; Jacob-Hirsch, Jasmine; Deng, Patricia; Zhang, Rui; Isaacs, Farren J; Rechavi, Gideon; Li, Jin Billy; Eisenberg, Eli; Levanon, Erez Y
2014-03-01
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
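The statement that detection sensitivity increases with sequencing coverage can be made concrete with a simple binomial model. This is a sketch under stated assumptions rather than the authors' method: the function name and the `min_edited_reads` threshold are hypothetical, reads are treated as independent, and sequencing errors are ignored.

```python
from math import comb


def detection_probability(
    coverage: int,
    editing_level: float,
    min_edited_reads: int = 1,
) -> float:
    """Probability of observing at least `min_edited_reads` edited reads.

    Models the number of edited reads at a site as
    Binomial(coverage, editing_level).
    """
    # Probability of seeing fewer edited reads than the threshold.
    p_below = sum(
        comb(coverage, i)
        * editing_level ** i
        * (1.0 - editing_level) ** (coverage - i)
        for i in range(min_edited_reads)
    )
    return 1.0 - p_below
```

Under this model, a site edited at 1% has a probability of about 0.63 of showing at least one edited read at 100× coverage, and that probability approaches 1 at 1,000× coverage, which is consistent with low-level sites only becoming visible under ultradeep sequencing.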
Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway
NASA Astrophysics Data System (ADS)
Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.
2018-02-01
Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.
USDA-ARS's Scientific Manuscript database
Squash mosaic virus (SqMV), a seed-borne virus belonging to the genus Comovirus in the family Comoviridae, can cause serious yield losses in cucurbit crops worldwide. SqMV has a bipartite single-stranded ribonucleic acid (RNA) genome (RNA-1 and RNA-2) encapsidated separately with two capsid prote...
Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang
2016-05-01
The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63% and 36-42% of the ITS sequences (97% similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.
The deep biosphere in terrestrial sediments in the Chesapeake Bay area, Virginia, USA.
Breuker, Anja; Köweker, Gerrit; Blazejak, Anna; Schippers, Axel
2011-01-01
For the first time, quantitative data on the abundance of Bacteria, Archaea, and Eukarya in deep terrestrial sediments are provided using multiple methods (total cell counting; quantitative real-time PCR, Q-PCR; and catalyzed reporter deposition-fluorescence in situ hybridization, CARD-FISH). The oligotrophic (organic carbon content of ∼0.2%) deep terrestrial sediments in the Chesapeake Bay area at Eyreville, Virginia, USA, were drilled and sampled up to a depth of 140 m in 2006. The possibility of contamination during drilling was checked using fluorescent microspheres. Total cell counts decreased from 10^9 to 10^6 cells/g dry weight within the uppermost 20 m, and did not further decrease with depth below. Within the top 7 m, a significant proportion of the total cell counts could be detected with CARD-FISH. The CARD-FISH numbers for Bacteria were about an order of magnitude higher than those for Archaea. The dominance of Bacteria over Archaea was confirmed by Q-PCR. The down core quantitative distribution of prokaryotic and eukaryotic small subunit ribosomal RNA genes as well as functional genes involved in different biogeochemical processes was revealed by Q-PCR for the uppermost 10 m and for 80-140 m depth. Eukarya and the Fe(III)- and Mn(IV)-reducing bacterial group Geobacteriaceae were almost exclusively found in the uppermost meter (arable soil), where reactive iron was detected in higher amounts. The bacterial candidate division JS-1 and the classes Anaerolineae and Caldilineae of the phylum Chloroflexi, highly abundant in marine sediments, were found up to the maximum sampling depth in high copy numbers at this terrestrial site as well. A similar high abundance of the functional gene cbbL encoding for the large subunit of RubisCO suggests that autotrophic microorganisms could be relevant in addition to heterotrophs. The functional gene aprA of sulfate reducing bacteria was found within distinct layers up to ca. 100 m depth in low copy numbers. 
The gene mcrA of methanogens was not detectable. Cloning and sequencing data of 16S rRNA genes revealed sequences of typical soil Bacteria. The closest relatives of the archaeal sequences were Archaea recovered from terrestrial and marine environments. Phylogenetic analysis of the Crenarchaeota and Euryarchaeota revealed new members of the uncultured South African Gold Mine Group, Deep Sea Hydrothermal Vent Euryarchaeotal Group 6, and Miscellaneous Crenarchaeotic Group clusters.
NASA Astrophysics Data System (ADS)
Payler, Samuel J.; Biddle, Jennifer F.; Coates, Andrew J.; Cousins, Claire R.; Cross, Rachel E.; Cullen, David C.; Downs, Michael T.; Direito, Susana O. L.; Edwards, Thomas; Gray, Amber L.; Genis, Jac; Gunn, Matthew; Hansford, Graeme M.; Harkness, Patrick; Holt, John; Josset, Jean-Luc; Li, Xuan; Lees, David S.; Lim, Darlene S. S.; McHugh, Melissa; McLuckie, David; Meehan, Emma; Paling, Sean M.; Souchon, Audrey; Yeoman, Louise; Cockell, Charles S.
2017-04-01
The subsurface exploration of other planetary bodies can be used to unravel their geological history and assess their habitability. On Mars in particular, present-day habitable conditions may be restricted to the subsurface. Using a deep subsurface mine, we carried out a program of extraterrestrial analog research - MINe Analog Research (MINAR). MINAR aims to carry out the scientific study of the deep subsurface and test instrumentation designed for planetary surface exploration by investigating deep subsurface geology, whilst establishing the potential this technology has to be transferred into the mining industry. An integrated multi-instrument suite was used to investigate samples of representative evaporite minerals from a subsurface Permian evaporite sequence, in particular to assess mineral and elemental variations which provide small-scale regions of enhanced habitability. The instruments used were the Panoramic Camera emulator, Close-Up Imager, Raman spectrometer, Small Planetary Linear Impulse Tool, Ultrasonic drill and handheld X-ray diffraction (XRD). We present science results from the analog research and show that these instruments can be used to investigate in situ the geological context and mineralogical variations of a deep subsurface environment, and thus habitability, from millimetre to metre scales. We also show that these instruments are complementary. For example, the identification of primary evaporite minerals such as NaCl and KCl, which are difficult to detect by portable Raman spectrometers, can be accomplished with XRD. By contrast, Raman is highly effective at locating and detecting mineral inclusions in primary evaporite minerals. MINAR demonstrates the effective use of a deep subsurface environment for planetary instrument development, understanding the habitability of extreme deep subsurface environments on Earth and other planetary bodies, and advancing the use of space technology in economic mining.
Kobayashi, Tohru; Koide, Osamu; Mori, Kozue; Shimamura, Shigeru; Matsuura, Takae; Miura, Takeshi; Takaki, Yoshihiro; Morono, Yuki; Nunoura, Takuro; Imachi, Hiroyuki; Inagaki, Fumio; Takai, Ken; Horikoshi, Koki
2008-07-01
"A meta-enzyme approach" is proposed as an ecological enzymatic method to explore the potential functions of microbial communities in extreme environments such as the deep marine subsurface. We evaluated a variety of extra-cellular enzyme activities of sediment slurries and isolates from a deep subseafloor sediment core. Using the new deep-sea drilling vessel "Chikyu", we obtained 365 m of core sediments that contained approximately 2% organic matter and considerable amounts of methane from offshore the Shimokita Peninsula in Japan at a water depth of 1,180 m. In the extra-sediment fraction of the slurry samples, phosphatase, esterase, and catalase activities were detected consistently throughout the core sediments down to the deepest slurry sample from 342.5 m below seafloor (mbsf). Detectable enzyme activities predicted the existence of a sizable population of viable aerobic microorganisms even in deep subseafloor habitats. The subsequent quantitative cultivation using solid media represented remarkably high numbers of aerobic, heterotrophic microbial populations (e.g., maximally 4.4x10(7) cells cm(-3) at 342.5 mbsf). Analysis of 16S rRNA gene sequences revealed that the predominant cultivated microbial components were affiliated with the genera Bacillus, Shewanella, Pseudoalteromonas, Halomonas, Pseudomonas, Paracoccus, Rhodococcus, Microbacterium, and Flexibacteracea. Many of the predominant and scarce isolates produced a variety of extra-cellular enzymes such as proteases, amylases, lipases, chitinases, phosphatases, and deoxyribonucleases. Our results indicate that microbes in the deep subseafloor environment off Shimokita are metabolically active and that the cultivable populations may have a great potential in biotechnology.
Weber, Jan; Vazquez, Ana C; Winner, Dane; Gibson, Richard M; Rhea, Ariel M; Rose, Justine D; Wylie, Doug; Henry, Kenneth; Wright, Alison; King, Kevin; Archer, John; Poveda, Eva; Soriano, Vicente; Robertson, David L; Olivo, Paul D; Arts, Eric J; Quiñones-Mateu, Miguel E
2013-05-01
CCR5 antagonists are a powerful new class of antiretroviral drugs that require a companion assay to evaluate the presence of CXCR4-tropic (non-R5) viruses prior to use in human immunodeficiency virus (HIV)-infected individuals. In this study, we have developed, characterized, verified, and prevalidated a novel phenotypic test to determine HIV-1 coreceptor tropism (VERITROP) based on a sensitive cell-to-cell fusion assay. A proprietary vector was constructed containing a near-full-length HIV-1 genome with the yeast uracil biosynthesis (URA3) gene replacing the HIV-1 env coding sequence. Patient-derived HIV-1 PCR products were introduced by homologous recombination using an innovative yeast-based cloning strategy. The env-expressing vectors were then used in a cell-to-cell fusion assay to determine the presence of R5 and/or non-R5 HIV-1 variants within the viral population. Results were compared with (i) the original version of Trofile (Monogram Biosciences, San Francisco, CA), (ii) population sequencing, and (iii) 454 pyrosequencing, with the genotypic data analyzed using several bioinformatics tools, i.e., the 11/24/25 rule, Geno2Pheno (2% to 5.75%, 3.5%, or 10% false-positive rate [FPR]), and webPSSM. VERITROP consistently detected minority non-R5 variants from clinical specimens, with an analytical sensitivity of 0.3%, with viral loads of ≥1,000 copies/ml, and from B and non-B subtypes. In a pilot study, a 73.7% (56/76) concordance was observed with the original Trofile assay, with 19 of the 20 discordant results corresponding to non-R5 variants detected using VERITROP and not by the original Trofile assay. The degree of concordance of VERITROP and Trofile with population and deep sequencing results depended on the algorithm used to determine HIV-1 coreceptor tropism. Overall, VERITROP showed better concordance with deep sequencing/Geno2Pheno at a 0.3% detection threshold (67%), whereas Trofile matched better with population sequencing (79%). 
However, 454 sequencing using Geno2Pheno at a 10% FPR and 0.3% threshold and VERITROP more accurately predicted the success of a maraviroc-based regimen. In conclusion, VERITROP may promote the development of new HIV coreceptor antagonists and aid in the treatment and management of HIV-infected individuals prior to and/or during treatment with this class of drugs.
Bass, David; Moureau, Gregory; Tang, Shuoya; McAlister, Erica; Culverwell, C. Lorna; Glücksman, Edvard; Wang, Hui; Brown, T. David K.; Gould, Ernest A.; Harbach, Ralph E.; de Lamballerie, Xavier; Firth, Andrew E.
2013-01-01
We investigated whether small RNA (sRNA) sequenced from field-collected mosquitoes and chironomids (Diptera) can be used as a proxy signature of viral prevalence within a range of species and viral groups, using sRNAs sequenced from wild-caught specimens, to inform total RNA deep sequencing of samples of particular interest. Using this strategy, we sequenced from adult Anopheles maculipennis s.l. mosquitoes the apparently nearly complete genome of one previously undescribed virus related to chronic bee paralysis virus, and, from a pool of Ochlerotatus caspius and Oc. detritus mosquitoes, a nearly complete entomobirnavirus genome. We also reconstructed long sequences (1503-6557 nt) related to at least nine other viruses. Crucially, several of the sequences detected were reconstructed from host organisms highly divergent from those in which related viruses have been previously isolated or discovered. It is clear that viral transmission and maintenance cycles in nature are likely to be significantly more complex and taxonomically diverse than previously expected. PMID:24260463
Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V
2014-08-07
DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.
Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.
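The idea of choosing a distance cutoff from a tail-area false discovery rate can be illustrated with a minimal sketch. It is not the adapted FDR method of the paper: it assumes Hamming distance as the read-to-barcode distance, a null sample of distances obtained from reads known to be unbarcoded, and a null proportion of 1 (a conservative choice). All function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(read, barcodes):
    """Smallest Hamming distance between any barcode and any
    same-length window of the read (barcode position is unknown)."""
    best = None
    for bc in barcodes:
        k = len(bc)
        for i in range(len(read) - k + 1):
            d = hamming(read[i:i + k], bc)
            if best is None or d < best:
                best = d
    return best  # None if the read is shorter than every barcode


def tail_fdr(observed, null, threshold):
    """Empirical tail-area FDR for calling reads with distance <= threshold.

    observed: distances from the real reads
    null:     distances from reads assumed to carry no barcode
    The null proportion is taken as 1, which overestimates the FDR.
    """
    obs_tail = sum(d <= threshold for d in observed) / len(observed)
    null_tail = sum(d <= threshold for d in null) / len(null)
    if obs_tail == 0:
        return 1.0
    return min(1.0, null_tail / obs_tail)


def highest_acceptable_distance(observed, null, alpha):
    """Largest distance cutoff whose estimated FDR does not exceed alpha."""
    ok = [t for t in range(max(observed) + 1)
          if tail_fdr(observed, null, t) <= alpha]
    return max(ok) if ok else None
```

With such a cutoff, a read would be called barcoded when `min_barcode_distance` is at or below the returned value; a stricter `alpha` trades sensitivity for precision.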
Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael
2012-01-01
Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. 
This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when plant reference databases become better established. PMID:22511890
Baleen whale infrasonic sounds: Natural variability and function
NASA Astrophysics Data System (ADS)
Clark, Christopher W.
2004-05-01
Blue and fin whales (Balaenoptera musculus and B. physalus) produce very intense, long, patterned sequences of infrasonic sounds. The acoustic characteristics of these sounds suggest strong selection for signals optimized for very long-range propagation in the deep ocean as first hypothesized by Payne and Webb in 1971. This hypothesis has been partially validated by very long-range detections using hydrophone arrays in deep water. Humpback songs recorded in deep water contain units in the 20-100 Hz range, and these relatively simple song components are detectable out to many hundreds of miles. The mid-winter peak in the occurrence of 20-Hz fin whale sounds led Watkins to hypothesize a reproductive function similar to humpback (Megaptera novaeangliae) song, and by default this function has been extended to blue whale songs. More recent evidence shows that blue and fin whales produce infrasonic calls in high latitudes during the feeding season, and that singing is associated with areas of high productivity where females congregate to feed. Acoustic sampling over broad spatial and temporal scales for baleen species is revealing higher geographic and seasonal variability in the low-frequency vocal behaviors than previously reported, suggesting that present explanations for baleen whale sounds are too simplistic.
Jenkins, Paul A.; Song, Yun S.; Brem, Rachel B.
2012-01-01
Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance. PMID:23226196
Estorninho, Megan; Gibson, Vivienne B; Kronenberg-Versteeg, Deborah; Liu, Yuk-Fun; Ni, Chester; Cerosaletti, Karen; Peakman, Mark
2013-12-01
Extensive diversity in the human repertoire of TCRs for Ag is both a cornerstone of effective adaptive immunity that enables host protection against a multiplicity of pathogens and a weakness that gives rise to potential pathological self-reactivity. The complexity arising from diversity makes detection and tracking of single Ag-specific CD4 T cells (ASTs) involved in these immune responses challenging. We report a tandem, multistep process to quantify rare TCRβ-chain variable sequences of ASTs in large polyclonal populations. The approach combines deep high-throughput sequencing (HTS) within functional CD4 T cell compartments, such as naive/memory cells, with shallow, multiple identifier-based HTS of ASTs identified by activation marker upregulation after short-term Ag stimulation in vitro. We find that clonotypes recognizing HLA class II-restricted epitopes of both pathogen-derived Ags and self-Ags are oligoclonal and typically private. Clonotype tracking within an individual reveals private AST clonotypes resident in the memory population, as would be expected, representing clonal expansions (identical nucleotide sequence; "ultraprivate"). Other AST clonotypes share CDR3β amino acid sequences through convergent recombination and are found in memory populations of multiple individuals. Tandem HTS-based clonotyping will facilitate studying AST dynamics, epitope spreading, and repertoire changes that arise postvaccination and following Ag-specific immunotherapies for cancer and autoimmune disease.
USDA-ARS's Scientific Manuscript database
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...
Bintz, Brittania J; Dixon, Groves B; Wilson, Mark R
2014-07-01
Next-generation sequencing technologies enable the identification of minor mitochondrial DNA variants with higher sensitivity than Sanger methods, allowing for enhanced identification of minor variants. In this study, mixtures of human mtDNA control region amplicons were subjected to pyrosequencing to determine the detection threshold of the Roche GS Junior(®) instrument (Roche Applied Science, Indianapolis, IN). In addition to expected variants, a set of reproducible variants was consistently found in reads from one particular amplicon. A BLASTn search of the variant sequence revealed identity to a segment of a 611-bp nuclear insertion of the mitochondrial control region (NumtS) spanning the primer-binding sites of this amplicon (Nature 1995;378:489). Primers (Hum Genet 2012;131:757; Hum Biol 1996;68:847) flanking the insertion were used to confirm the presence or absence of the NumtS in buccal DNA extracts from twenty donors. These results further our understanding of human mtDNA variation and are expected to have a positive impact on the interpretation of mtDNA profiles using deep-sequencing methods in casework. © 2014 American Academy of Forensic Sciences.
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
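The tiling idea described above can be sketched in a few lines. This is only an illustration, not Geoseq's implementation: plain substring search stands in for the suffix arrays the tool uses, the tile size and step are arbitrary assumptions, and the function names are hypothetical.

```python
def tiles(seq, size, step):
    """Cut a reference sequence into (start, tile) pairs of a fixed size."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


def tile_coverage(reference, reads, size=20, step=1):
    """For each tile of the reference, count the reads that contain it exactly.

    Naive substring search is used here for clarity; a suffix array over the
    read library would answer the same query far faster on large datasets.
    """
    return [(pos, sum(tile in read for read in reads))
            for pos, tile in tiles(reference, size, step)]
```

Plotting the counts against tile position would give a coverage profile of the query sequence across a library, which is the kind of output the abstract describes.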
Rational Protein Engineering Guided by Deep Mutational Scanning
Shin, HyeonSeok; Cho, Byung-Kwan
2015-01-01
Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267
Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja
2017-01-01
The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372
Sibley, Christopher D; Peirano, Gisele; Church, Deirdre L
2012-04-01
Clinical microbiology laboratories worldwide have historically relied on phenotypic methods (i.e., culture and biochemical tests) for detection, identification and characterization of virulence traits (e.g., antibiotic resistance genes, toxins) of human pathogens. However, limitations to implementation of molecular methods for human infectious diseases testing are being rapidly overcome allowing for the clinical evaluation and implementation of diverse technologies with expanding diagnostic capabilities. The advantages and limitation of molecular techniques including real-time polymerase chain reaction, partial or whole genome sequencing, molecular typing, microarrays, broad-range PCR and multiplexing will be discussed. Finally, terminal restriction fragment length polymorphism (T-RFLP) and deep sequencing are introduced as technologies at the clinical interface with the potential to dramatically enhance our ability to diagnose infectious diseases and better define the epidemiology and microbial ecology of a wide range of complex infections. Copyright © 2012 Elsevier B.V. All rights reserved.
Viruses in diarrhoeic dogs include novel kobuviruses and sapoviruses.
Li, Linlin; Pesavento, Patricia A; Shan, Tongling; Leutenegger, Christian M; Wang, Chunlin; Delwart, Eric
2011-11-01
The close interactions of dogs with humans and surrounding wildlife provide frequent opportunities for cross-species virus transmissions. In order to initiate an unbiased characterization of the eukaryotic viruses in the gut of dogs, this study used deep sequencing of partially purified viral capsid-protected nucleic acids from the faeces of 18 diarrhoeic dogs. Known canine parvoviruses, coronaviruses and rotaviruses were identified, and the genomes of the first reported canine kobuvirus and sapovirus were characterized. Canine kobuvirus, the first sequenced canine picornavirus and the closest genetic relative of the diarrhoea-causing human Aichi virus, was detected at high frequency in the faeces of both healthy and diarrhoeic dogs. Canine sapovirus constituted a novel genogroup within the genus Sapovirus, a group of viruses also associated with human and animal diarrhoea. These results highlight the high frequency of new virus detection possible even in extensively studied animal species using metagenomics approaches, and provide viral genomes for further disease-association studies.
Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song
2012-08-01
Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.
Repeat-aware modeling and correction of short read errors.
Yang, Xiao; Aluru, Srinivas; Dorman, Karin S
2011-02-15
High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences with valid kmers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id = redeem". 
We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
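The conventional baseline that the abstract contrasts itself with (count k-mers, then trust those whose observed frequency exceeds a threshold) can be sketched as follows. This shows only that baseline, not the repeat-aware model of the paper; the function names and the example threshold are assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Observed frequency of every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trusted_kmers(counts, threshold):
    """k-mers whose observed frequency exceeds the threshold."""
    return {kmer for kmer, n in counts.items() if n > threshold}
```

In a repeat-rich genome an erroneous k-mer that is one mismatch away from a highly repeated valid k-mer can pass this filter, which is the failure mode the abstract sets out to address.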
NASA Astrophysics Data System (ADS)
Flot, J.-F.; Licuanan, W. Y.; Nakano, Y.; Payri, C.; Cruaud, C.; Tillier, S.
2008-12-01
The taxonomy of corals of the genus Seriatopora has not previously been studied using molecular sequence markers. As a first step toward a re-evaluation of species boundaries in this genus, mitochondrial sequence variability was analyzed in 51 samples collected from Okinawa, New Caledonia, and the Philippines. Four clusters of sequences were detected that showed little concordance with species currently recognized on a morphological basis. The most likely explanation is that the skeletal characters used for species identification are highly variable (polymorphic or phenotypically plastic); alternative explanations include introgression/hybridization, or deep coalescence and the retention of ancestral mitochondrial polymorphisms. In all individuals sequenced, two copies of trnW were found on either side of the atp8 gene near the putative D-loop, a novel mitochondrial gene arrangement that may have arisen from a duplication of the trnW-atp8 region followed by a deletion of one atp8.
DNA Replication Profiling Using Deep Sequencing.
Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W
2018-01-01
Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mason, Olivia U.; Hazen, Terry C.; Borglin, Sharon
The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.
DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.
Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing
2018-02-01
Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. 
The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .
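The trade-off described above, where a strict best-hit cutoff misses divergent ARGs and lowers recall, can be illustrated with a minimal sketch. The reads, identity values, and cutoffs below are invented for illustration; this is not the DeepARG method or its data, only the precision and recall arithmetic applied to a hypothetical best-hit table.

```python
from typing import List, Tuple

# Hypothetical best-hit table: (read id, percent identity of best hit, truly an ARG?)
HITS: List[Tuple[str, float, bool]] = [
    ("r1", 98.0, True),
    ("r2", 85.0, True),
    ("r3", 62.0, True),   # divergent ARG, below a strict cutoff
    ("r4", 55.0, True),   # divergent ARG, below a strict cutoff
    ("r5", 40.0, False),
    ("r6", 35.0, False),
]


def precision_recall(hits: List[Tuple[str, float, bool]], cutoff: float) -> Tuple[float, float]:
    """Call a read an ARG when its best-hit identity is >= cutoff, then score the calls."""
    tp = sum(1 for _, ident, is_arg in hits if ident >= cutoff and is_arg)
    fp = sum(1 for _, ident, is_arg in hits if ident >= cutoff and not is_arg)
    fn = sum(1 for _, ident, is_arg in hits if ident < cutoff and is_arg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


strict = precision_recall(HITS, cutoff=80.0)   # hypothetical strict cutoff
relaxed = precision_recall(HITS, cutoff=50.0)  # hypothetical relaxed cutoff
print("strict :", strict)
print("relaxed:", relaxed)
```

In this toy table the strict cutoff keeps precision at 1.0 but halves recall, because the two divergent ARGs become false negatives.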
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2017-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, because recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
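The saliency-map idea, a first-order derivative of the prediction with respect to each input nucleotide, can be sketched on a toy model. The example below assumes a single-layer logistic model over a one-hot sequence so that the gradient can be written analytically; the weights, bias, and sequence are hypothetical, and a real DNN would obtain the same gradient by backpropagation.

```python
import math
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def predict(x: List[List[float]], weights: List[List[float]], bias: float) -> float:
    """Toy logistic model: sigmoid of a weighted sum over positions and bases."""
    z = bias + sum(w * v for wrow, xrow in zip(weights, x) for w, v in zip(wrow, xrow))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq: str, weights: List[List[float]], bias: float) -> List[float]:
    """Per-position saliency: |gradient of the prediction * one-hot input|."""
    x = one_hot(seq)
    p = predict(x, weights, bias)
    dp_dz = p * (1.0 - p)  # derivative of the sigmoid at the current input
    # d p / d x[i][j] = dp_dz * weights[i][j]; the one-hot mask keeps the observed base.
    return [abs(sum(dp_dz * w * v for w, v in zip(wrow, xrow)))
            for wrow, xrow in zip(weights, x)]


# Hypothetical weights for a 4-nucleotide input (rows: positions, columns: A, C, G, T).
WEIGHTS = [
    [0.1, 0.0, 0.0, 2.0],
    [2.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]
scores = saliency("TAGA", WEIGHTS, bias=-1.0)
print(scores)
```

Positions whose observed base carries a large weight receive high saliency, while the third position, which the toy model ignores, scores zero.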
Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred
2014-01-01
Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064
Fonseca-Coronado, Salvador; Escobar-Gutiérrez, Alejandro; Ruiz-Tovar, Karina; Cruz-Rivera, Mayra Yolanda; Rivera-Osorio, Pilar; Vazquez-Pichardo, Mauricio; Carpio-Pedroza, Juan Carlos; Ruíz-Pacheco, Juan Alberto; Cazares, Fernando
2012-01-01
The use of telaprevir and boceprevir, both protease inhibitors (PI), as part of the specifically targeted antiviral therapy for hepatitis C (STAT-C) has significantly improved sustained virologic response (SVR) rates. However, different clinical studies have also identified several mutations associated with viral resistance to both PIs. In the absence of selective pressure, drug-resistant hepatitis C virus (HCV) mutants are generally present at low frequency, making mutation detection challenging. Here, we describe a mismatch amplification mutation assay (MAMA) PCR method for the specific detection of naturally occurring drug-resistant HCV mutants. MAMA PCR successfully identified the corresponding HCV variants, while conventional methods such as direct sequencing, endpoint limiting dilution (EPLD), and bacterial cloning were not sensitive enough to detect circulating drug-resistant mutants in clinical specimens. Ultradeep pyrosequencing was used to confirm the presence of the corresponding HCV mutants. In treatment-naïve patients, the frequency of all resistant variants was below 1%. Deep amplicon sequencing allowed a detailed analysis of the structure of the viral population among these patients, showing that the evolution of the NS3 is limited to a rather small sequence space. Monitoring of HCV drug resistance before and during treatment is likely to provide important information for management of patients undergoing anti-HCV therapy. PMID:22116161
Zandrino, Franco; La Paglia, Ernesto; Musante, Francesco
2010-01-01
To assess the diagnostic accuracy of magnetic resonance imaging in local staging of endometrial carcinoma, and to review the results and pitfalls described in the literature. Thirty women with a histological diagnosis of endometrial carcinoma underwent magnetic resonance imaging. Unenhanced T2-weighted and dynamic contrast-enhanced T1-weighted sequences were obtained. Hysterectomy and salpingo-oophorectomy were performed in all patients. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the detection of deep myometrial and cervical infiltration. For deep myometrial infiltration T2-weighted sequences reached a sensitivity of 85%, specificity of 76%, PPV of 73%, NPV of 87%, and accuracy of 80%, while contrast-enhanced scans reached a sensitivity of 90%, specificity of 80%, PPV of 82%, NPV of 89%, and accuracy of 85%. For cervical infiltration T2-weighted sequences reached a sensitivity of 75%, specificity of 88%, PPV of 50%, NPV of 96%, and accuracy of 87%, while contrast-enhanced scans reached a sensitivity of 100%, specificity of 94%, PPV of 75%, NPV of 100%, and accuracy of 95%. Unenhanced and dynamic gadolinium-enhanced magnetic resonance allows accurate assessment of myometrial and cervical infiltration. Information provided by magnetic resonance imaging can define prognosis and management.
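The five reported measures all follow from a 2x2 table of imaging result against histology. The sketch below uses the standard definitions. The abstract does not state the underlying counts, so the table used here (11 true positives, 4 false positives, 2 false negatives, 13 true negatives) is an assumption: it is one 30-patient table that reproduces the percentages reported for T2-weighted detection of deep myometrial infiltration.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),            # positives correctly detected
        "specificity": tn / (tn + fp),            # negatives correctly excluded
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Assumed counts (not given in the abstract), chosen to total 30 patients.
metrics = diagnostic_metrics(tp=11, fp=4, fn=2, tn=13)
percentages = {name: round(100 * value) for name, value in metrics.items()}
print(percentages)
```

With these assumed counts the rounded values are 85, 76, 73, 87, and 80 percent, matching the T2-weighted figures quoted above.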
Characterization by Deep Sequencing of Prunus virus T, a Novel Tepovirus Infecting Prunus Species.
Marais, Armelle; Faure, Chantal; Mustafayev, Eldar; Barone, Maria; Alioto, Daniela; Candresse, Thierry
2015-01-01
Double-stranded RNAs purified from a cherry tree collected in Italy and a plum tree collected in Azerbaijan were submitted to deep sequencing. Contigs showing weak but significant identity with various members of the family Betaflexiviridae were reconstructed. Sequence comparisons led to the conclusion that the viral isolates identified in the analyzed Prunus plants belong to the same viral species. Their genome organization is similar to that of some members of the family Betaflexiviridae, with three overlapping open reading frames (RNA polymerase, movement protein, and capsid protein). Phylogenetic analyses of the deduced encoded proteins showed a clustering with the sole member of the genus Tepovirus, Potato virus T (PVT). Given these results, the name Prunus virus T (PrVT) is proposed for the new virus. It should be considered as a new member of the genus Tepovirus, even if the level of nucleotide identity with PVT is borderline with the genus demarcation criteria for the family Betaflexiviridae. A reverse-transcription polymerase chain reaction detection assay was developed and allowed the identification of two other PrVT isolates and an estimate of 1% prevalence in the large Prunus collection screened. Due to the mixed infection status of all hosts identified to date, it was not possible to correlate the presence of PrVT with specific symptoms.
ASR5 is involved in the regulation of miRNA expression in rice.
Neto, Lauro Bücker; Arenhart, Rafael Augusto; de Oliveira, Luiz Felipe Valter; de Lima, Júlio Cesar; Bodanese-Zanettini, Maria Helena; Margis, Rogerio; Margis-Pinheiro, Márcia
2015-11-01
The work describes an ASR knockdown transcriptomic analysis by deep sequencing of rice root seedlings and the transactivation of ASR cis-acting elements in the upstream region of a MIR gene. MicroRNAs are key regulators of gene expression that guide post-transcriptional control of plant development and responses to environmental stresses. ASR (ABA, Stress and Ripening) proteins are plant-specific transcription factors with key roles in different biological processes. In rice, ASR proteins have been suggested to participate in the regulation of stress response genes. This work describes the transcriptomic analysis by deep sequencing of two libraries, comparing miRNA abundance from the roots of transgenic ASR5 knockdown rice seedlings with that of the roots of wild-type non-transformed rice seedlings. Members of 59 miRNA families were detected, and 276 mature miRNAs were identified. Our analysis detected 112 miRNAs that were differentially expressed between the two libraries. A predicted inverse correlation between miR167abc and its target gene (LOC_Os07g29820) was confirmed using RT-qPCR. Protoplast transactivation assays showed that ASR5 is able to recognize binding sites upstream of the MIR167a gene and drive its expression in vivo. Together, our data establish a comparative study of miRNAome profiles, and this is the first study to suggest the involvement of ASR proteins in miRNA gene regulation.
Tanaka, Yasuhiro; Nishida, Kei; Nakamura, Takashi; Chapagain, Saroj Kumar; Inoue, Daisuke; Sei, Kazunari; Mori, Kazuhiro; Sakamoto, Yasushi; Kazama, Futaba
2012-03-01
Although groundwater is a major water supply source in the Kathmandu Valley of Nepal, it is known that the groundwater has significant microbial contamination exceeding the drinking water quality standard recommended by the World Health Organization (WHO), and that this has been implicated in causing a variety of diseases among people living in the valley. However, little is known about the distribution of pathogenic microbes in the groundwater. Here, we analysed the microbial communities of six water samples from deep tube wells by using a culture-independent method based on 16S rRNA gene sequences. The analysis showed that the groundwater has been contaminated with various types of opportunistic microbes in addition to fecal microbes. In particular, clonal sequences related to the opportunistic microbes within the genus Acinetobacter were detected in all samples. As many strains of Acinetobacter are known as multi-drug resistant microbes that are currently spreading in the world, we conducted a molecular-based survey for detection of the gene encoding carbapenem-hydrolysing β-lactamase (bla(oxa-23-like) gene), which is a key enzyme responsible for multi-drug resistance, in the groundwater samples. Nested polymerase chain reaction (PCR) using two specific primer sets for amplifying the bla(oxa-23-like) gene indicated that two of six groundwater samples contain multi-drug resistant Acinetobacter.
Garnaud, Cécile; Botterel, Françoise; Sertour, Natacha; Bougnoux, Marie-Elisabeth; Dannaoui, Eric; Larrat, Sylvie; Hennequin, Christophe; Guinea, Jesus; Cornet, Muriel; Maubon, Danièle
2015-09-01
MDR Candida strains are emerging. Next-generation sequencing (NGS), which enables extensive and deep genome analysis, was used to investigate echinocandin and azole resistance in clinical Candida isolates. Six genes commonly involved in antifungal resistance (ERG11, ERG3, TAC1, CgPDR1, FKS1 and FKS2) were analysed using NGS in 40 Candida isolates (18 Candida albicans, 15 Candida glabrata and 7 Candida parapsilosis). The strategy was validated using strains with known sequences. Then, 8 clinical strains displaying antifungal resistance and 23 sequential isolates collected from 10 patients receiving antifungal therapy were analysed. A total of 391 SNPs were detected, among which 6 coding SNPs were reported for the first time. Novel genetic alterations were detected in both azole and echinocandin resistance genes. A C. glabrata strain, which was resistant to echinocandins but highly susceptible to azoles, harboured an FKS2 S663P mutation plus a novel presumed loss-of-function CgPDR1 mutation. This isolate was from a patient with deep-seated and urinary candidiasis. Another C. glabrata isolate, with an MDR phenotype, carried a new FKS2 S663A mutation and a new putative gain-of-function CgPDR1 mutation (T370I); this isolate showed mutated (80%) and WT (20%) populations and was collected after 75 days of exposure to caspofungin from a patient who underwent complicated abdominal surgery. This study shows that NGS can be used for extensive assessment of genetic mutations involved in antifungal resistance. This type of wide genome approach will become very valuable for detecting mechanisms of resistance in clinical strains subjected to multidrug pressure.
Lasigliè, Denise; Mensa-Vilaro, Anna; Ferrera, Denise; Caorsi, Roberta; Penco, Federica; Santamaria, Giuseppe; Di Duca, Marco; Amico, Giulia; Nakagawa, Kenji; Antonini, Francesca; Tommasini, Alberto; Consolini, Rita; Insalaco, Antonella; Cattalini, Marco; Obici, Laura; Gallizzi, Romina; Santarelli, Francesca; Del Zotto, Genny; Severino, Mariasavina; Rubartelli, Anna; Ravazzolo, Roberto; Martini, Alberto; Ceccherini, Isabella; Nishikomori, Ryuta; Gattorno, Marco; Arostegui, Juan I; Borghini, Silvia
2017-11-01
To evaluate the rate of somatic NLRP3 mosaicism in an Italian cohort of mutation-negative patients with cryopyrin-associated periodic syndrome (CAPS). The study enrolled 14 patients with a clinical phenotype consistent with CAPS in whom Sanger sequencing of the NLRP3 gene yielded negative results. Patients' DNA was subjected to amplicon-based NLRP3 deep sequencing. Low-level somatic NLRP3 mosaicism was detected in 4 patients, 3 affected with chronic infantile neurological cutaneous and articular syndrome and 1 with Muckle-Wells syndrome. The identified nucleotide substitutions encode 4 different amino acid exchanges, 2 of them novel (p.Y563C and p.G564S). In vitro functional studies confirmed the deleterious behavior of the 4 somatic NLRP3 mutations. Among the different neurological manifestations detected, 1 patient displayed mild loss of white matter volume on brain magnetic resonance imaging. The allele frequency of somatic NLRP3 mutations is generally below 15%, which is considered the threshold of detectability using the Sanger method of DNA sequencing. Consequently, routine genetic diagnosis of CAPS should currently be performed by next-generation techniques that ensure high coverage, so that low-level mosaicism, whose actual frequency is still unknown and probably underestimated, can also be identified.
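The point about the 15% threshold can be made concrete with a small sketch of a variant allele fraction check. The 15% figure comes from the abstract; the function names and the read counts are hypothetical and only illustrate how a deep-sequencing read count translates into a fraction that Sanger sequencing would likely miss.

```python
SANGER_DETECTION_THRESHOLD = 0.15  # threshold of detectability cited in the abstract


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def likely_missed_by_sanger(alt_reads: int, total_reads: int,
                            threshold: float = SANGER_DETECTION_THRESHOLD) -> bool:
    """True when the variant fraction falls below the assumed Sanger detection limit."""
    return variant_allele_fraction(alt_reads, total_reads) < threshold


# Hypothetical read counts at one position covered by 1,000 reads.
mosaic = likely_missed_by_sanger(alt_reads=60, total_reads=1000)     # fraction 0.06
germline = likely_missed_by_sanger(alt_reads=480, total_reads=1000)  # fraction 0.48
print(mosaic, germline)
```

A variant present in 6% of reads falls below the threshold, while a heterozygous germline variant near 50% does not, which is why high coverage matters for detecting mosaicism.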
Diversity of Bacteria at Healthy Human Conjunctiva
Dong, Qunfeng; Brulc, Jennifer M.; Iovieno, Alfonso; Bates, Brandon; Garoutte, Aaron; Miller, Darlene; Revanna, Kashi V.; Gao, Xiang; Antonopoulos, Dionysios A.; Slepak, Vladlen Z.
2011-01-01
Purpose. Ocular surface (OS) microbiota contributes to infectious and autoimmune diseases of the eye. Comprehensive analysis of microbial diversity at the OS has been impossible because of the limitations of conventional cultivation techniques. This pilot study aimed to explore true diversity of human OS microbiota using DNA sequencing-based detection and identification of bacteria. Methods. Composition of the bacterial community was characterized using deep sequencing of the 16S rRNA gene amplicon libraries generated from total conjunctival swab DNA. The DNA sequences were classified and the diversity parameters measured using bioinformatics software ESPRIT and MOTHUR and tools available through the Ribosomal Database Project-II (RDP-II). Results. Deep sequencing of conjunctival rDNA from four subjects yielded a total of 115,003 quality DNA reads, corresponding to 221 species-level phylotypes per subject. The combined bacterial community classified into 5 phyla and 59 distinct genera. However, 31% of all DNA reads belonged to unclassified or novel bacteria. The intersubject variability of individual OS microbiomes was very significant. Regardless, 12 genera (Pseudomonas, Propionibacterium, Bradyrhizobium, Corynebacterium, Acinetobacter, Brevundimonas, Staphylococci, Aquabacterium, Sphingomonas, Streptococcus, Streptophyta, and Methylobacterium) were ubiquitous among the analyzed cohort and represented the putative “core” of conjunctival microbiota. The other 47 genera accounted for <4% of the classified portion of this microbiome. Unexpectedly, healthy conjunctiva contained many genera that are commonly identified as ocular surface pathogens. Conclusions. The first DNA sequencing-based survey of the bacterial population at the conjunctiva has revealed an unexpectedly diverse microbial community. All analyzed samples contained ubiquitous (core) genera that included commensal, environmental, and opportunistic pathogenic bacteria. PMID:21571682
Barrett, Nolan H.; McCarthy, Peter J.
2017-01-01
The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886
Ferragut, Fátima; Vega, Celina G; Mauroy, Axel; Conceição-Neto, Nádia; Zeller, Mark; Heylen, Elisabeth; Uriarte, Enrique Louge; Bilbao, Gladys; Bok, Marina; Matthijnssens, Jelle; Thiry, Etienne; Badaracco, Alejandra; Parreño, Viviana
2016-06-01
Bovine noroviruses are enteric pathogens detected in fecal samples of both diarrheic and non-diarrheic calves from several countries worldwide. However, epidemiological information regarding bovine noroviruses is still lacking for many important cattle producing countries from South America. In this study, three bovine norovirus genogroup III sequences were determined by conventional RT-PCR and Sanger sequencing in feces from diarrheic dairy calves from Argentina (B4836, B4848, and B4881, all collected in 2012). Phylogenetic studies based on a partial coding region for the RNA-dependent RNA polymerase (RdRp, 503 nucleotides) of these three samples suggested that two of them (B4836 and B4881) belong to genotype 2 (GIII.2) while the third one (B4848) was more closely related to genotype 1 (GIII.1) strains. By deep sequencing, the capsid region from two of these strains could be determined. This confirmed the circulation of genotype 1 (B4848) together with the presence of another sequence (B4881) sharing its highest genetic relatedness with genotype 1, but sufficiently distant to constitute a new genotype. This latter strain was shown in silico to be a recombinant: phylogenetic divergence was detected between its RNA-dependent RNA polymerase coding sequence (genotype GIII.2) and its capsid protein coding sequence (genotype GIII.1 or a potential norovirus genotype). According to these data, this strain could be the second genotype GIII.2_GIII.1 bovine norovirus recombinant described in the literature worldwide. Further analysis suggested that this strain could even be a potential norovirus GIII genotype, tentatively named GIII.4. The data provide important epidemiological and evolutionary information on bovine noroviruses circulating in South America.
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome
Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend
2007-01-01
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995
Fang, Jiasong; Kato, Chiaki; Runko, Gabriella M.; Nogi, Yuichi; Hori, Tomoyuki; Li, Jiangtao; Morono, Yuki; Inagaki, Fumio
2017-01-01
Phylogenetically diverse microorganisms have been observed in marine subsurface sediments down to ~2.5 km below the seafloor (kmbsf). However, very little is known about the pressure-adapted and/or pressure-loving microorganisms, the so-called piezophiles, in the deep subseafloor biosphere, despite the fact that pressure directly affects microbial physiology, metabolism, and biogeochemical processes of carbon and other elements in situ. In this study, we studied taxonomic compositions of microbial communities in high-pressure incubated sediment, obtained during the Integrated Ocean Drilling Program (IODP) Expedition 337 off the Shimokita Peninsula, Japan. Analysis of 16S rRNA gene-tagged sequences showed that members of spore-forming bacteria within Firmicutes and Actinobacteria were predominantly detected in all enrichment cultures from ~1.5 to 2.4 km-deep sediment samples, followed by members of Proteobacteria, Acidobacteria, and Bacteroidetes according to the sequence frequency. To further study the physiology of the deep subseafloor sedimentary piezophilic bacteria, we isolated and characterized two bacterial strains, 19R1-5 and 29R7-12, from 1.9 and 2.4 km-deep sediment samples, respectively. The isolates were both low G+C content, gram-positive, endospore-forming and facultative anaerobic piezophilic bacteria, closely related to Virgibacillus pantothenticus and Bacillus subtilis within the phylum Firmicutes, respectively. The optimal pressure and temperature conditions for growth were 20 MPa and 42°C for strain 19R1-5, and 10 MPa and 43°C for strain 29R7-12. Bacterial (endo)spores were observed in both the enrichment and pure cultures examined, suggesting that these piezophilic members were derived from microbial communities buried in the ~20 million-year-old coal-bearing sediments after the long-term survival as spores and that the deep biosphere may host more abundant gram-positive spore-forming bacteria and their spores than hitherto recognized. PMID:28220112
Medial tibial pain: a dynamic contrast-enhanced MRI study.
Mattila, K T; Komu, M E; Dahlström, S; Koskinen, S K; Heikkilä, J
1999-09-01
The purpose of this study was to compare the sensitivity of different magnetic resonance imaging (MRI) sequences to depict periosteal edema in patients with medial tibial pain. Additionally, we evaluated the ability of dynamic contrast-enhanced imaging (DCES) to depict possible temporal alterations in muscular perfusion within compartments of the leg. Fifteen patients with medial tibial pain were examined with MRI. T1-, T2-weighted, proton density axial images and dynamic and static phase post-contrast images were compared in ability to depict periosteal edema. STIR was used in seven cases to depict bone marrow edema. Images were analyzed to detect signs of compartment edema. Region-of-interest measurements in compartments were performed during DCES and compared with controls. In detecting periosteal edema, post-contrast T1-weighted images were better than spin echo T2-weighted and proton density images or STIR images, but STIR depicted the bone marrow edema best. DCES best demonstrated the gradually enhancing periostitis. Four subjects with severe periosteal edema had visually detectable pathologic enhancement during DCES in the deep posterior compartment of the leg. Percentage enhancement in the deep posterior compartment of the leg was greater in patients than in controls. The fast enhancement phase in the deep posterior compartment began slightly slower in patients than in controls, but it continued longer. We believe that periosteal edema in bone stress reaction can cause impairment of venous flow in the deep posterior compartment. MRI can depict both these conditions. In patients with medial tibial pain, MR imaging protocol should include axial STIR images (to depict bone pathology) with T1-weighted axial pre and post-contrast images, and dynamic contrast enhanced imaging to show periosteal edema and abnormal contrast enhancement within a compartment.
Zhang, Wenqian; Meehan, Joe; Su, Zhenqiang; Ng, Hui Wen; Shu, Mao; Luo, Heng; Ge, Weigong; Perkins, Roger; Tong, Weida; Hong, Huixiao
2014-01-01
Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina Hiseq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2 followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all of 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean only SNVs were highly enriched in some metabolic pathways. 
We found adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean only SNVs are in genes that are associated with the drug term of adenosine. We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
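The consensus and population-specific filtering described in this abstract amounts to set operations on variant keys. The following is a minimal sketch, assuming each SNV is keyed by a (chromosome, position, reference allele, alternate allele) tuple; the function names and this data layout are hypothetical illustrations, not the authors' pipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical SNV key: (chromosome, position, ref allele, alt allele)
Snv = Tuple[str, int, str, str]


def consensus_snvs(calls_a: Set[Snv], calls_b: Set[Snv]) -> Set[Snv]:
    """Keep only SNVs reported by both calling pipelines."""
    return calls_a & calls_b


def population_specific(
    per_individual: Dict[str, Set[Snv]],
    other_populations: Set[Snv],
) -> Tuple[Set[Snv], Set[Snv]]:
    """Return (SNVs in at least one individual, SNVs in every individual),
    in both cases excluding SNVs already seen in other populations."""
    if not per_individual:
        raise ValueError("need at least one individual")
    call_sets = list(per_individual.values())
    in_any = set().union(*call_sets) - other_populations
    in_all = set.intersection(*call_sets) - other_populations
    return in_any, in_all
```

With this layout, the abstract's SNV-1 and SNV-35 groups correspond to the first and second returned sets when `per_individual` holds the 35 consensus call sets.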
Steele-Stallard, Heather B; Le Quesne Stabej, Polona; Lenassi, Eva; Luxon, Linda M; Claustres, Mireille; Roux, Anne-Francoise; Webster, Andrew R; Bitner-Glindzicz, Maria
2013-08-08
Usher Syndrome is the leading cause of inherited deaf-blindness. It is divided into three subtypes, of which the most common is Usher type 2, and the USH2A gene accounts for 75-80% of cases. Despite recent sequencing strategies, in our cohort a significant proportion of individuals with Usher type 2 have just one heterozygous disease-causing mutation in USH2A, or no convincing disease-causing mutations across nine Usher genes. The purpose of this study was to improve the molecular diagnosis in these families by screening USH2A for duplications, heterozygous deletions and a common pathogenic deep intronic variant USH2A: c.7595-2144A>G. Forty-nine Usher type 2 or atypical Usher families who had missing mutations (mono-allelic USH2A or no mutations following Sanger sequencing of nine Usher genes) were screened for duplications/deletions using the USH2A SALSA MLPA reagent kit (MRC-Holland). Identification of USH2A: c.7595-2144A>G was achieved by Sanger sequencing. Mutations were confirmed by a combination of reverse transcription PCR using RNA extracted from nasal epithelial cells or fibroblasts, and by array comparative genomic hybridisation with sequencing across the genomic breakpoints. Eight mutations were identified in 23 Usher type 2 families (35%) with one previously identified heterozygous disease-causing mutation in USH2A. These consisted of five heterozygous deletions, one duplication, and two heterozygous instances of the pathogenic variant USH2A: c.7595-2144A>G. No variants were found in the 15 Usher type 2 families with no previously identified disease-causing mutations. In 11 atypical families, none of whom had any previously identified convincing disease-causing mutations, the mutation USH2A: c.7595-2144A>G was identified in a heterozygous state in one family. All five deletions and the heterozygous duplication we report here are novel. This is the first time that a duplication in USH2A has been reported as a cause of Usher syndrome. 
We found that 8 of 23 (35%) of 'missing' mutations in Usher type 2 probands with only a single heterozygous USH2A mutation detected with Sanger sequencing could be attributed to deletions, duplications or a pathogenic deep intronic variant. Future mutation detection strategies and genetic counselling will need to take into account the prevalence of these types of mutations in order to provide a more comprehensive diagnostic service.
Identification of Prostate Cancer-Specific microDNAs
2016-02-01
circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. [Figure: relative cell growth (%)]
Sequence-specific bias correction for RNA-seq data using recurrent neural networks.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2017-01-25
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
Phan, My V. T.; Anh, Pham Hong; Cuong, Nguyen Van; Munnink, Bas B. Oude; van der Hoek, Lia; My, Phuc Tran; Tri, Tue Ngo; Bryant, Juliet E.; Baker, Stephen; Thwaites, Guy; Woolhouse, Mark; Kellam, Paul; Rabaa, Maia A.
2016-01-01
Coordinated and synchronous surveillance for zoonotic viruses in both human clinical cases and animal reservoirs provides an opportunity to identify interspecies virus movement. Rotavirus (RV) is an important cause of viral gastroenteritis in humans and animals. In this study, we document the RV diversity within co-located humans and animals sampled from the Mekong delta region of Vietnam using a primer-independent, agnostic, deep sequencing approach. A total of 296 stool samples (146 from diarrhoeal human patients and 150 from pigs living in the same geographical region) were directly sequenced, generating the genomic sequences of sixty human rotaviruses (all group A) and thirty-one porcine rotaviruses (thirteen group A, seven group B, six group C, and five group H). Phylogenetic analyses showed the co-circulation of multiple distinct RV group A (RVA) genotypes/strains, many of which were divergent from the strain components of licensed RVA vaccines, as well as considerable virus diversity in pigs including full genomes of rotaviruses in groups B, C, and H, none of which have been previously reported in Vietnam. Furthermore, the detection of an atypical RVA genotype constellation (G4-P[6]-I1-R1-C1-M1-A8-N1-T7-E1-H1) in a human patient and a pig from the same region provides some evidence for a zoonotic event. PMID:28748110
Miya, M.; Sato, Y.; Fukunaga, T.; Sado, T.; Poulsen, J. Y.; Sato, K.; Minamoto, T.; Yamamoto, S.; Yamanaka, H.; Araki, H.; Kondoh, M.; Iwasaki, W.
2015-01-01
We developed a set of universal PCR primers (MiFish-U/E) for metabarcoding environmental DNA (eDNA) from fishes. Primers were designed using aligned whole mitochondrial genome (mitogenome) sequences from 880 species, supplemented by partial mitogenome sequences from 160 elasmobranchs (sharks and rays). The primers target a hypervariable region of the 12S rRNA gene (163–185 bp), which contains sufficient information to identify fishes to taxonomic family, genus and species except for some closely related congeners. To test versatility of the primers across a diverse range of fishes, we sampled eDNA from four tanks in the Okinawa Churaumi Aquarium with known species compositions, prepared dual-indexed libraries and performed paired-end sequencing of the region using high-throughput next-generation sequencing technologies. Out of the 180 marine fish species contained in the four tanks with reference sequences in a custom database, we detected 168 species (93.3%) distributed across 59 families and 123 genera. These fishes are not only taxonomically diverse, ranging from sharks and rays to higher teleosts, but are also greatly varied in their ecology, including both pelagic and benthic species living in shallow coastal to deep waters. We also sampled natural seawaters around coral reefs near the aquarium and detected 93 fish species using this approach. Of the 93 species, 64 were not detected in the four aquarium tanks, rendering the total number of species detected to 232 (from 70 families and 152 genera). The metabarcoding approach presented here is non-invasive, more efficient, more cost-effective and more sensitive than the traditional survey methods. It has the potential to serve as an alternative (or complementary) tool for biodiversity monitoring that revolutionizes natural resource management and ecological studies of fish communities on larger spatial and temporal scales. PMID:26587265
Protein remote homology detection based on bidirectional long short-term memory.
Li, Shumin; Chen, Junjie; Liu, Bin
2017-10-10
Protein remote homology detection plays a vital role in studies of protein structures and functions. Almost all of the traditional machine learning methods require fixed-length features to represent the protein sequences. However, it is never an easy task to extract the discriminative features with limited knowledge of proteins. On the other hand, deep learning techniques have demonstrated their advantage in automatically learning representations. It is worthwhile to explore the applications of deep learning techniques to protein remote homology detection. In this study, we employ the Bidirectional Long Short-Term Memory (BLSTM) to learn effective features from pseudo proteins, and propose a predictor called ProDec-BLSTM, which includes an input layer, a bidirectional LSTM, a time distributed dense layer and an output layer. This neural network can automatically extract the discriminative features by using the bidirectional LSTM and the time distributed dense layer. Experimental results on a widely-used benchmark dataset show that ProDec-BLSTM outperforms other related methods in terms of both the mean ROC and mean ROC50 scores. This promising result shows that ProDec-BLSTM is a useful tool for protein remote homology detection. Furthermore, the hidden patterns learnt by ProDec-BLSTM can be interpreted and visualized, and therefore, additional useful information can be obtained.
Lefèvre, Emilie; Bardot, Corinne; Noël, Christophe; Carrias, Jean-François; Viscogliosi, Eric; Amblard, Christian; Sime-Ngando, Télesphore
2007-01-01
This study presents an original 18S rRNA PCR survey of the freshwater picoeukaryote community, and was designed to detect unidentified heterotrophic picoflagellates (size range 0.6-5 µm) which are prevalent throughout the year within the heterotrophic flagellate assemblage in Lake Pavin. Four clone libraries were constructed from samples collected in two contrasting zones in the lake. Computerized statistical tools suggested that sequence retrieval was representative of the in situ picoplankton diversity. The two sampling zones exhibited similar diversity patterns but shared only about 5% of the operational taxonomic units (OTUs). Phylogenetic analysis clustered our sequences into three taxonomic groups: Alveolates (30% of OTUs), Fungi (23%) and Cercozoa (19%). Fungi thus substantially contributed to the detected diversity, as was additionally supported by direct microscopic observations of fungal zoospores and sporangia. A large fraction of the sequences belonged to parasites, including Alveolate sequences affiliated to the genus Perkinsus known as zooparasites, and chytrids that include host-specific parasitic fungi of various freshwater phytoplankton species, primarily diatoms. Phylogenetic analysis revealed five novel clades that probably include typical freshwater environmental sequences. Overall, from the unsuspected fungal diversity unveiled, we think that fungal zooflagellates have been misidentified as phagotrophic nanoflagellates in previous studies. This is in agreement with a recent experimental demonstration that zoospore-producing fungi and parasitic activity may play an important role in aquatic food webs.
Pena, Loren D M; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C; Walley, Nicole; Stong, Nicholas; Rapisardo Horn, Sarah; Sullivan, Jennifer A; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C; El-Dairi, Mays; Bellet, Jane; Keels, Martha Ann; Jasien, Joan; Kranz, Peter G; Noel, Richard; Nagaraj, Shashi K; Lark, Robert K; Wechsler, Daniel S G; Del Gaudio, Daniela; Leung, Marco L; Hendon, Laura G; Parker, Collette C; Jones, Kelly L; Goldstein, David B; Shashi, Vandana
2018-04-01
Purpose: To describe examples of missed pathogenic variants on whole-exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods: Guided by phenotypic information, three children with negative WES underwent targeted single-gene testing. Results: Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and a next-generation sequencing (NGS)-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the noncoding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity, and magnetic resonance image changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, probably missed owing to failure of alignment. Conclusion: These cases illustrate potential pitfalls of WES/NGS testing and the importance of phenotype-guided molecular testing in yielding diagnoses.
An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang
2014-01-01
This paper explored our hypothesis that the sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out, and these were subsequently confirmed with experiments. Specifically, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575
Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J
2017-02-02
The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.
NASA Technical Reports Server (NTRS)
Wilson, Gillian; Demarco, Ricardo; Muzzin, Adam; Yee, H.K.C.; Lacy, Mark; Surace, Jason; Gilbank, David; Blindert, Kris; Hoekstra, Henk; Majumdar, Subhabrata;
2008-01-01
The Spitzer Adaptation of the Red-sequence Cluster Survey (SpARCS) is a z'-passband imaging survey, consisting of deep (z' approx. 24 AB) observations made from both hemispheres using the CFHT 3.6m and CTIO 4m telescopes. The survey was designed with the primary aim of detecting galaxy clusters at z > 1. In tandem with pre-existing 3.6 micron observations from the Spitzer Space Telescope SWIRE Legacy Survey, SpARCS detects clusters using an infrared adaptation of the two-filter red-sequence cluster technique. The total effective area of the SpARCS cluster survey is 41.9 sq deg. In this paper, we provide an overview of the 13.6 sq deg Southern CTIO/MOSAICII observations. The 28.3 sq deg Northern CFHT/MegaCam observations are summarized in a companion paper by Muzzin et al. (2008a). In this paper, we also report spectroscopic confirmation of SpARCS J003550-431224, a very rich galaxy cluster at z = 1.335, discovered in the ELAIS-S1 field. To date, this is the highest spectroscopically confirmed redshift for a galaxy cluster discovered using the red-sequence technique. Based on nine confirmed members, SpARCS J003550-431224 has a preliminary velocity dispersion of 1050+/-230 km/s. With its proven capability for efficient cluster detection, SpARCS is a demonstration that we have entered an era of large, homogeneously-selected z > 1 cluster surveys.
2013-01-01
Background: Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement of the biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results: Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp of Illumina data, approximately 97% and 92% of the 37 input Operational Taxonomic Units (OTUs) were detected with and without the reference barcode library, respectively, while only 1 novel OTU was found in the latter case. Additionally, a relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions: The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggest its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement of mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339
miRBase: integrating microRNA annotation and deep-sequencing data.
Kozomara, Ana; Griffiths-Jones, Sam
2011-01-01
miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
Transcriptome sequences resolve deep relationships of the grape family.
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.
Takei, Hiraku; Morishita, Soji; Araki, Marito; Edahiro, Yoko; Sunami, Yoshitaka; Hironaka, Yumi; Noda, Naohiro; Sekiguchi, Yuji; Tsuneda, Satoshi; Ohsaka, Akimichi; Komatsu, Norio
2014-01-01
A gain-of-function mutation in the myeloproliferative leukemia virus (MPL) gene, which encodes the thrombopoietin receptor, has been identified in patients with essential thrombocythemia and primary myelofibrosis, subgroups of classic myeloproliferative neoplasms (MPNs). The presence of MPL gene mutations is a critical diagnostic criterion for these diseases. Here, we developed a rapid, simple, and cost-effective method of detecting two major MPL mutations, MPLW515L/K, in a single PCR assay; we termed this method DARMS (dual amplification refractory mutation system)-PCR. DARMS-PCR is designed to produce three different PCR products corresponding to MPLW515L, MPLW515K, and all MPL alleles. The amplicons are later detected and quantified using a capillary sequencer to determine the relative frequencies of the mutant and wild-type alleles. Applying DARMS-PCR to human specimens, we successfully identified MPL mutations in MPN patients, with the exception of patients bearing mutant allele frequencies below the detection limit (5%) of this method. The MPL mutant allele frequencies determined using DARMS-PCR correlated strongly with the values determined using deep sequencing. Thus, we demonstrated the potential of DARMS-PCR to detect MPL mutations and determine the allele frequencies in a timely and cost-effective manner. PMID:25144224
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium
2018-01-01
Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573
Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo
2017-08-01
In this study, single nucleotide polymorphisms (SNPs), microsatellites (SSRs) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2018-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
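The saliency-map idea in this abstract (first-order derivatives of the prediction with respect to each input nucleotide) can be illustrated on a toy model. The sketch below is an assumption-laden stand-in: it replaces the deep network with a hypothetical position-specific logistic model whose gradient is available in closed form, and it reads the gradient at the observed nucleotide of each position, which is one common convention rather than something stated in the abstract.

```python
import math
from typing import List

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one row of four 0/1 values per position."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def score(x: List[List[float]], weights: List[List[float]], bias: float) -> float:
    """Toy model (hypothetical): sigmoid of a weighted sum over positions and bases."""
    z = bias + sum(w * v for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq: str, weights: List[List[float]], bias: float) -> List[float]:
    """Per-position saliency: |d score / d input| taken at the observed base."""
    x = one_hot(seq)
    s = score(x, weights, bias)
    # For a sigmoid output, d score / d x[i][j] = s * (1 - s) * weights[i][j].
    grad = [[s * (1.0 - s) * w for w in w_row] for w_row in weights]
    # Multiplying by the one-hot row keeps only the gradient of the observed base.
    return [abs(sum(g * v for g, v in zip(g_row, x_row)))
            for g_row, x_row in zip(grad, x)]
```

For a real convolutional or recurrent network the gradient would come from automatic differentiation rather than this closed form; the per-position reduction stays the same.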
Aftershock occurrence rate decay for individual sequences and catalogs
NASA Astrophysics Data System (ADS)
Nyffenegger, Paul A.
One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes.
Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
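For readers unfamiliar with the ML estimate of p mentioned above, the modified Omori law is usually written as a rate n(t) = K / (t + c)^p, and p can be estimated by maximizing the point-process log-likelihood over an observation window. The sketch below is a simplified illustration, not the thesis's procedure: it assumes c is known and fixed, replaces K by its closed-form ML value, and searches p on a caller-supplied grid; all function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def omori_integral(p: float, c: float, start: float, end: float) -> float:
    """Integral of (t + c)**(-p) over [start, end]; p = 1 gives a logarithm."""
    if abs(p - 1.0) < 1e-9:
        return math.log((end + c) / (start + c))
    return ((end + c) ** (1.0 - p) - (start + c) ** (1.0 - p)) / (1.0 - p)


def profile_log_likelihood(p: float, times: Sequence[float], c: float,
                           start: float, end: float) -> float:
    """Log-likelihood of rate K / (t + c)**p with K set to N / integral."""
    n = len(times)
    k_hat = n / omori_integral(p, c, start, end)
    # sum(log rate at each event) minus the expected count (which equals n here)
    return n * math.log(k_hat) - p * sum(math.log(t + c) for t in times) - n


def estimate_p(times: Sequence[float], c: float, start: float, end: float,
               p_grid: Iterable[float]) -> float:
    """Grid-search ML estimate of p for fixed c (assumption of this sketch)."""
    return max(p_grid,
               key=lambda p: profile_log_likelihood(p, times, c, start, end))
```

In practice K, c, and p are usually fitted jointly with a numerical optimizer, and, as the abstract notes, a goodness-of-fit test is still needed because the likelihood alone does not reveal whether the sequence resembles MOL decay.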
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-11
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
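For readers unfamiliar with the Q3 metric quoted above, the following minimal sketch computes it as the fraction of residues whose three-state label (H, E, or C) is predicted correctly. The helper name and the example strings are hypothetical, and the sketch does not cover the SOV score or the eight-state Q8 variant.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical six-residue example: one mismatch at the third position.
print(q3_accuracy("HHHECC", "HHEECC"))
```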
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
NASA Astrophysics Data System (ADS)
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-01
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Deep learning methods for protein torsion angle prediction.
Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin
2017-09-18
Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN), and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of the protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30°, respectively, on an independent dataset. The MAE of the phi angle is comparable to the existing methods, but the MAE of the psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
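The abstract reports MAE values for phi and psi but does not say how angular differences are computed. A minimal sketch, assuming angles in degrees and assuming that errors are wrapped onto the circle (so that 179° and -179° differ by 2°, not 358°), could look like this. Both function names are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (range 0 to 180)."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)


def angular_mae(pred, true):
    """Mean absolute error over paired angle lists, using the wrapped difference."""
    if len(pred) != len(true) or not pred:
        raise ValueError("angle lists must be non-empty and of equal length")
    return sum(angular_abs_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical predictions versus reference psi angles.
print(angular_mae([10.0, 179.0], [20.0, -179.0]))
```

Without the wrap, predictions near ±180° would be penalized as if they were almost a full turn off, which would inflate the MAE.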
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.
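To make "pair frequency or covariance data derived directly from sequence alignments" concrete, the sketch below computes, for two alignment columns i and j, the quantity f_ij(a, b) - f_i(a) f_j(b) for every observed residue pair. It is an illustration only and not the DeepCov implementation: it assumes unweighted sequences, no pseudocounts, and no special gap handling, and the function name is hypothetical.

```python
from collections import Counter


def pair_covariance(alignment, i, j):
    """Covariance of residue indicators between columns i and j of an alignment.

    Returns {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for residues a seen in column i
    and b seen in column j. Sequences are weighted equally (an assumption).
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment must contain at least one sequence")
    fi = Counter(seq[i] for seq in alignment)
    fj = Counter(seq[j] for seq in alignment)
    fij = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): fij[(a, b)] / n - (fi[a] / n) * (fj[b] / n)
        for a in fi
        for b in fj
    }


# Toy alignment: columns 0 and 1 co-vary perfectly (A pairs with C, G pairs with T).
toy = ["AC", "AC", "GT", "GT"]
print(pair_covariance(toy, 0, 1))
```

In a covariance-based predictor, values like these for all column pairs would form the input tensor to the network; how DeepCov arranges and normalizes that tensor is not stated in the abstract.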
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Betz-Stablein, B. D.; Töpfer, A.; Littlejohn, M.; Yuen, L.; Colledge, D.; Sozzi, V.; Angus, P.; Thompson, A.; Revill, P.; Beerenwinkel, N.; Warner, N.
2016-01-01
ABSTRACT Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error prone, leading to the rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome generated via splicing (spHBV variants). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. spHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from samples, including the liver explant, longitudinally collected from a subject with CHB over a 15-year period after liver transplantation. By employing novel bioinformatics methods, this analysis showed that the dynamics of the viral population across a period of changing treatment regimens was complex. The spHBV variants detected in the liver explant remained present posttransplantation, and a highly diverse novel spHBV population as well as variants with multiple deletions in the pre-S genes emerged. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occurred with known drug resistance-associated mutations highlights the relevance of using full-genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full length of the HBV genome. IMPORTANCE Single-molecule sequencing allowed the characterization, in unprecedented detail, of the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimens. 
This analysis also showed the rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open-source bioinformatics tools with the capacity to easily identify potential spliced variants from deep sequencing data are freely available. PMID:27252524
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacchus-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
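The N50 statistic mentioned above has a simple definition: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal sketch, with a hypothetical function name and made-up contig lengths, is:

```python
def n50(lengths):
    """Contig length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths (in bp) for illustration.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, a doubling such as the one reported indicates longer contigs but says nothing by itself about their correctness.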
Chen, Zhouwei; Li, Lufeng; Shan, Zhan; Huang, Hannian; Chen, Huan; Ding, Xianfeng; Guo, Jiangfeng; Liu, Lili
2016-11-01
Kineococcus radiotolerans is a Gram-positive, radio-resistant bacterium isolated from a radioactive environment. The small noncoding RNAs (sRNAs) in bacteria are reported to play roles in the immediate response to stress and/or the recovery from stress. The analysis of K. radiotolerans transcriptome sequencing results can identify these sRNAs in a genome-wide detection, using RNA sequencing (RNA-seq) by the deep sequencing technique. In this study, the raw data of radiation-exposed samples (RS) and control samples (CS) were acquired separately from the sequencing platform. There were 217 common sRNA candidates in the two samples screened in the genome-wide scale by bioinformatics analysis. There were 43 differentially expressed sRNA candidates, including 28 up-regulated and 15 down-regulated ones. The down-regulated sRNAs were selected for the sRNA target prediction, of which 12 sRNAs that may modulate the genes related to the transcription regulation and DNA repair were considered as the candidates involved in the radio-resistance regulation system. Copyright © 2016 Elsevier GmbH. All rights reserved.
HomSI: a homozygous stretch identifier from next-generation sequencing data.
Görmez, Zeliha; Bakir-Gungor, Burcu; Sagiroglu, Mahmut Samil
2014-02-01
In consanguineous families, as a result of inheriting the same genomic segments through both parents, the individuals have stretches of their genomes that are homozygous. This situation leads to the prevalence of recessive diseases among the members of these families. Homozygosity mapping is based on this observation, and in consanguineous families, several recessive disease genes have been discovered with the help of this technique. Researchers typically use single nucleotide polymorphism arrays to determine the homozygous regions and then search for the disease gene by sequencing the genes within these candidate disease loci. Recently, the advent of next-generation sequencing has enabled the concurrent identification of homozygous regions and the detection of mutations relevant for diagnosis, using data from a single sequencing experiment. In this respect, we have developed a novel tool that identifies homozygous regions using deep sequence data. Using *.vcf (variant call format) files as input, our program identifies the majority of homozygous regions found by microarray single nucleotide polymorphism genotype data. HomSI software is freely available at www.igbam.bilgem.tubitak.gov.tr/softwares/HomSI, with an online manual.
The Role Of Rejuvenation In Shaping The High-Mass End Of The Main Sequence
NASA Astrophysics Data System (ADS)
Mancini, Chiara
2017-06-01
We investigate the nature of star-forming galaxies with reduced specific SFRs and high stellar masses, those that seemingly cause the so-called bending of the main sequence. The fact that such objects host large bulges recently led some to suggest that the internal formation of the bulges, via compaction or disk instabilities, was the late event that induced sSFRs of massive galaxies to drop in a slow downfall and thus the main sequence to bend. We have studied in detail a sample of 16 galaxies at 0.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borucki, Monica K.; Lao, Victoria; Hwang, Mona
Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater than 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.
Borucki, Monica K.; Lao, Victoria; Hwang, Mona; ...
2016-01-20
Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater than 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.
Improved detection and relocation of micro-earthquakes applied to the Sea of Marmara
NASA Astrophysics Data System (ADS)
Tary, J. B.; Evangelia, B.; Géli, L.; Lomax, A.
2016-12-01
The Sea of Marmara is located at the western end of the North Anatolian Fault (NAF). This part of the NAF is considered a seismic gap, lying between the Izmit and Duzce earthquakes to the east and the Ganos earthquake to the west. Improved detection and location of seismicity in the Sea of Marmara is important for defining the seismic hazard in this area. On July 25, 2011, a Mw 5 earthquake occurred below the Western High in the western part of the Sea of Marmara. This earthquake and its aftershock sequence were recorded by a network of 10 ocean bottom seismometers (OBSs; Ifremer) as well as seafloor observatories (KOERI). The OBSs were deployed from mid-April 2011 to the end of July 2011. The aftershock sequence is characterized by deep seismicity (10-15 km) around the main shock and shallow seismicity. Some of the shallow seismicity could be located at a similar depth as gas-prone sediment layers below the Western High. The exact causes of these shallow aftershocks are still unclear. To better define this aftershock sequence, we use the matched filter technique with a selection of aftershocks as templates to dig out child events from the continuous data streams. The templates are cross-correlated with the continuous data for stations with absolute time picks. The cross-correlation coefficients are then summed over all stations and components, and we then compute the median absolute deviation (MAD) of the summed time series. Signals are detected when the summed cross-correlation time series exceeds a given number of times the MAD. Using a conservative detection threshold, we obtain a 10-fold increase in the number of events. The newly detected events are then relocated using the double-difference technique. With these newly detected events, we investigate the nucleation phase of the main shock and the aftershock sequence, as well as the possible triggering of the shallow aftershocks by the deeper seismicity.
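The detection steps described above (cross-correlate templates with continuous data, sum over channels, threshold at a multiple of the MAD) can be sketched on synthetic data as follows. This is a simplified illustration, not the authors' processing chain: it assumes all channels are already time-aligned, uses a made-up damped-sine template, and picks an arbitrary threshold of 8 times the MAD. All names and values are hypothetical.

```python
import math
import random
import statistics


def normalized_xcorr(template, trace):
    """Normalized cross-correlation coefficient of a template at every lag of a trace."""
    m = len(template)
    t_mean = sum(template) / m
    t0 = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t0))
    out = []
    for k in range(len(trace) - m + 1):
        window = trace[k:k + m]
        w_mean = sum(window) / m
        w0 = [x - w_mean for x in window]
        denom = t_norm * math.sqrt(sum(x * x for x in w0))
        out.append(sum(a * b for a, b in zip(t0, w0)) / denom if denom > 0 else 0.0)
    return out


def mad(values):
    """Median absolute deviation about the median."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def detect(templates, traces, n_mad=8.0):
    """Stack per-channel correlations and return lags above n_mad times the MAD."""
    ccs = [normalized_xcorr(tp, tr) for tp, tr in zip(templates, traces)]
    stacked = [sum(vals) for vals in zip(*ccs)]
    threshold = n_mad * mad(stacked)
    return [k for k, v in enumerate(stacked) if v > threshold], stacked


# Synthetic demo: six noisy channels with one hidden copy of the template at lag 100.
rng = random.Random(1)
template = [math.exp(-0.1 * i) * math.sin(2 * math.pi * i / 8) for i in range(20)]
traces = []
for _ in range(6):
    trace = [rng.gauss(0.0, 0.05) for _ in range(300)]
    for i, x in enumerate(template):
        trace[100 + i] += x
    traces.append(trace)
picks, stacked = detect([template] * 6, traces)
print("lags above threshold:", picks)
```

Lags adjacent to the true onset can also exceed the threshold because the template correlates with shifted copies of itself, so a real workflow would typically keep only the local maximum of each cluster of picks.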
Unsupervised Sequential Outlier Detection With Deep Architectures.
Lu, Weining; Cheng, Yu; Xiao, Cao; Chang, Shiyu; Huang, Shuai; Liang, Bin; Huang, Thomas
2017-09-01
Unsupervised outlier detection is a vital task with high impact on a wide variety of application domains, such as image analysis and video surveillance. It has also attracted long-standing attention and has been extensively studied in multiple research areas. Detecting and acting on outliers as quickly as possible is imperative in order to protect networks and related stakeholders or to maintain the reliability of critical systems. However, outlier detection is difficult due to its one-class nature and challenges in feature construction. Sequential anomaly detection is even harder, with additional challenges from temporal correlation in the data as well as the presence of noise and high dimensionality. In this paper, we introduce a novel deep structured framework to solve the challenging sequential outlier detection problem. We use autoencoder models to capture the intrinsic difference between outliers and normal instances and integrate the models with recurrent neural networks, which allow the learning to make use of previous context and make the learners more robust to warping along the time axis. Furthermore, we propose to use a layerwise training procedure, which significantly simplifies training and hence helps achieve efficient and scalable training. In addition, we investigate a fine-tuning step that updates the full parameter set by incorporating the temporal correlation in the sequence. We further apply our proposed models to conduct systematic experiments on five real-world benchmark data sets. Experimental results demonstrate the effectiveness of our model compared with other state-of-the-art approaches.
Cocos, Anne; Fiks, Alexander G; Masino, Aaron J
2017-07-01
Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity, thus natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Our best-performing RNN model used pretrained word embeddings created from a large, non-domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Schulze, Philipp; Ludwig, Martin; Kohler, Frank; Belder, Detlev
2005-03-01
Deep UV fluorescence detection at 266-nm excitation wavelength has been realized for sensitive detection in microchip electrophoresis. For this purpose, an epifluorescence setup was developed enabling the coupling of a deep UV laser into a commercial fluorescence microscope. Deep UV laser excitation utilizing a frequency quadrupled pulsed laser operating at 266 nm shows an impressive performance for native fluorescence detection of various compounds in fused-silica microfluidic devices. Aromatic low molecular weight compounds such as serotonin, propranolol, a diol, and tryptophan could be detected at low-micromolar concentrations. Deep UV fluorescence detection was also successfully employed for the detection of unlabeled basic proteins. For this purpose, fused-silica chips dynamically coated with hydroxypropylmethyl cellulose were employed to suppress analyte adsorption. Utilizing fused-silica chips permanently coated with poly(vinyl alcohol), it was also possible to separate and detect chicken egg white proteins. These data show that deep UV fluorescence detection significantly widens the application range of fluorescence detection in chip-based analysis techniques.
DNA Barcoding the Geometrid Fauna of Bavaria (Lepidoptera): Successes, Surprises, and Questions
Hausmann, Axel; Haszprunar, Gerhard; Hebert, Paul D. N.
2011-01-01
Background The State of Bavaria is involved in a research program that will lead to the construction of a DNA barcode library for all animal species within its territorial boundaries. The present study provides a comprehensive DNA barcode library for the Geometridae, one of the most diverse of insect families. Methodology/Principal Findings This study reports DNA barcodes for 400 Bavarian geometrid species, 98 per cent of the known fauna, and approximately one per cent of all Bavarian animal species. Although 98.5% of these species possess diagnostic barcode sequences in Bavaria, records from neighbouring countries suggest that species-level resolution may be compromised in up to 3.5% of cases. All taxa which apparently share barcodes are discussed in detail. One case of modest divergence (1.4%) revealed a species overlooked by the current taxonomic system: Eupithecia goossensiata Mabille, 1869 stat.n. is raised from synonymy with Eupithecia absinthiata (Clerck, 1759) to species rank. Deep intraspecific sequence divergences (>2%) were detected in 20 traditionally recognized species. Conclusions/Significance The study emphasizes the effectiveness of DNA barcoding as a tool for monitoring biodiversity. Open access is provided to a data set that includes records for 1,395 geometrid specimens (331 species) from Bavaria, with 69 additional species from neighbouring regions. Taxa with deep intraspecific sequence divergences are undergoing more detailed analysis to ascertain if they represent cases of cryptic diversity. PMID:21423340
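The 2% threshold for "deep intraspecific sequence divergences" refers to a pairwise distance between barcode sequences. The abstract does not state which distance model was used, so the sketch below substitutes the simplest option, the uncorrected p-distance (proportion of differing sites among aligned, unambiguous positions). The function names and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def is_deep_divergence(seq_a, seq_b, threshold=0.02):
    """Flag a pair whose p-distance exceeds the threshold (2% assumed from the text)."""
    return p_distance(seq_a, seq_b) > threshold


# Toy 10-bp example with a single substitution (10% divergence).
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

Real barcode comparisons use the roughly 650 bp COI fragment and often a corrected distance, so values from this sketch are only indicative.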
Azzouzi, Imane; Moest, Hansjoerg; Wollscheid, Bernd; Schmugge, Markus; Eekels, Julia J M; Speer, Oliver
2015-05-01
During maturation, erythropoietic cells extrude their nuclei but retain their ability to respond to oxidant stress by tightly regulating protein translation. Several studies have reported microRNA-mediated regulation of translation during terminal stages of erythropoiesis, even after enucleation. In the present study, we performed a detailed examination of the endogenous microRNA machinery in human red blood cells using a combination of deep sequencing analysis of microRNAs and proteomic analysis of the microRNA-induced silencing complex. Among the 197 different microRNAs detected, miR-451a was the most abundant, representing more than 60% of all read sequences. In addition, miR-451a and its known target, 14-3-3ζ mRNA, were bound to the microRNA-induced silencing complex, implying their direct interaction in red blood cells. The proteomic characterization of endogenous Argonaute 2-associated microRNA-induced silencing complex revealed 26 cofactor candidates. Among these cofactors, we identified several RNA-binding proteins, as well as motor proteins and vesicular trafficking proteins. Our results demonstrate that red blood cells contain complex microRNA machinery, which might enable immature red blood cells to control protein translation independent of de novo nuclei information. Copyright © 2015 ISEH - International Society for Experimental Hematology. Published by Elsevier Inc. All rights reserved.
Jensen, Sigmund; Lynch, Michael D J; Ray, Jessica L; Neufeld, Josh D; Hovland, Martin
2015-10-01
Deep-sea coral reefs do not receive sunlight and depend on plankton. Little is known about the plankton composition at such reefs, even though they constitute habitats for many invertebrates and fish. We investigated plankton communities from three reefs at 260-350 m depth at hydrocarbon fields off the mid-Norwegian coast using a combination of cultivation and small subunit (SSU) rRNA gene and transcript sequencing. Eight-month incubations of a reef water sample with minimal medium, supplemented with carbon dioxide and gaseous alkanes at in situ-like conditions, enabled isolation of mostly Alphaproteobacteria (Sulfitobacter, Loktanella), Gammaproteobacteria (Colwellia) and Flavobacteria (Polaribacter). The relative abundance of isolates in the original sample ranged from ∼0.01% to 0.80%. Comparisons of bacterial SSU sequences from filtered plankton of reef and non-reef control samples indicated high abundance and metabolic activity of primarily Alphaproteobacteria (SAR11 Ia) and Gammaproteobacteria (ARCTIC96BD-19), but also of Deltaproteobacteria (Nitrospina, SAR324). Eukaryote SSU sequences indicated metabolically active microalgae and animals, including codfish, at the reef sites. The plankton community composition varied between reefs and differed between DNA and RNA assessments. Over 5000 operational taxonomic units were detected, some indicators of reef sites (e.g. Flavobacteria, Cercozoa, Demospongiae) and some more active at reef sites (e.g. Gammaproteobacteria, Ciliophora, Copepoda). © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
DEEP NEAR-IR OBSERVATIONS OF THE GLOBULAR CLUSTER M4: HUNTING FOR BROWN DWARFS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dieball, A.; Bedin, L. R.; Knigge, C.
2016-01-20
We present an analysis of deep Hubble Space Telescope (HST)/Wide Field Camera 3 near-IR (NIR) imaging data of the globular cluster (GC) M4. The best-photometry NIR color–magnitude diagram (CMD) clearly shows the main sequence extending toward the expected end of the hydrogen-burning limit and going beyond this point toward fainter sources. The white dwarf (WD) sequence can be identified. As such, this is the deepest NIR CMD of a GC to date. Archival HST optical data were used for proper-motion cleaning of the CMD and for distinguishing the WDs from brown dwarf (BD) candidates. Detection limits in the NIR are around F110W ≈ 26.5 mag and F160W ≈ 27 mag, and in the optical around F775W ≈ 28 mag. Comparing our observed CMDs with theoretical models, we conclude that we have reached beyond the H-burning limit in our NIR CMD and are probably just above or around this limit in our optical–NIR CMDs. Thus, any faint NIR sources that have no optical counterpart are potential BD candidates, since the optical data are not deep enough to detect them. We visually inspected the positions of NIR sources that are fainter than the H-burning limit in F110W and for which the optical photometry did not return a counterpart. We found in total five sources for which we did not get an optical measurement. For four of these five sources, a faint optical counterpart could be visually identified, and an upper optical magnitude was estimated. Based on these upper optical magnitude limits, we conclude that one source is likely a WD, one source could be either a WD or BD candidate, and the remaining two sources agree with being BD candidates. No optical counterpart could be detected for just one source, which makes this source a good BD candidate. We conclude that we found in total four good BD candidates.
2011-01-01
Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic methods provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length, with 35% of them > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes was detected in parasitized larvae, with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization and on DsIV genes expressed in the host, and also improves our current understanding of this host-parasitoid interaction. PMID:21906285
2012-01-01
Background Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773
NASA Technical Reports Server (NTRS)
Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina
2006-01-01
The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.
Transcriptome Sequences Resolve Deep Relationships of the Grape Family
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307
P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.
Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang
2017-03-14
A growing number of studies use whole-genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several common diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method for studying DNA cytosine methylation. However, most of today's relevant tools suffer from inaccuracy and long run times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt is applied to large-scale datasets, it still suffers from slow speed and low temporal-spatial efficiency. To address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deeply parallelized whole-genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves substantial acceleration on the heterogeneous CPU and Intel Xeon Phi platform, taking full advantage of multi-core (CPU) and many-core (Phi) processors.
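For readers unfamiliar with the Smith-Waterman dynamic programming named in the abstract above, the following is a minimal sketch of the textbook local-alignment recurrence. It is not the Hint-Hunt or P-Hint-Hunt implementation; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration, and no bisulfite-specific handling is included.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    # Row 0 and column 0 stay at zero so every cell can look up and left.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch score.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open a gap in either sequence with a linear penalty.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

The two nested loops visit every cell of the score matrix, so the work grows with the product of the two sequence lengths, which is consistent with the abstract's motivation for parallelizing the computation.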
Won, J K; Keam, B; Koh, J; Cho, H J; Jeon, Y K; Kim, T M; Lee, S H; Lee, D S; Kim, D W; Chung, D H
2015-02-01
Epidermal growth factor receptor (EGFR) mutation and anaplastic lymphoma kinase (ALK) translocation are considered mutually exclusive in non-small-cell lung cancer (NSCLC). However, sporadic cases having concomitant EGFR and ALK alterations have been reported. The present study aimed to assess the prevalence of NSCLCs with concomitant EGFR and ALK alterations using mutation detection methods with different sensitivity and to propose an effective diagnostic and therapeutic strategy. A total of 1458 cases of lung cancer were screened for EGFR and ALK alterations by direct sequencing and fluorescence in situ hybridization (FISH), respectively. For the 91 patients identified as having an ALK translocation, peptide nucleic acid (PNA)-clamping real-time PCR, targeted next-generation sequencing (NGS), and mutant-enriched NGS assays were carried out to detect EGFR mutation. EGFR mutations and ALK translocations were observed in 42.4% (612/1445) and 6.3% (91/1445) of NSCLCs by direct sequencing and FISH, respectively. Concomitant EGFR and ALK alterations were detected in four cases, which accounted for 4.4% (4/91) of ALK-translocated NSCLCs. Additional analyses for EGFR using PNA real-time PCR, ultra-deep sequencing by NGS, and mutant-enriched NGS increased the detection rate of concomitant EGFR and ALK alterations to 8.8% (8/91), 12.1% (11/91), and 15.4% (14/91) of ALK-translocated NSCLCs, respectively. Of the 14 patients, 3 who were treated with gefitinib showed poor response to gefitinib, with stable disease in one and progressive disease in two patients. However, eight patients who received ALK inhibitor (crizotinib or ceritinib) showed good response, with a response rate of 87.5% (7/8 with partial response) and durable progression-free survival. A portion of NSCLC patients have concomitant EGFR and ALK alterations, and the frequency of co-alteration detection increases when sensitive detection methods for EGFR mutation are applied.
ALK inhibitors appear to be effective for patients with co-alterations. © The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
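The percentages quoted in the abstract above follow directly from the reported counts. As a small check, the sketch below recomputes them; the helper name `rate` and the dictionary layout are illustrative assumptions, while the counts are the ones stated in the abstract.

```python
def rate(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)

# Counts of concomitant EGFR/ALK cases among the 91 ALK-translocated NSCLCs,
# keyed by the EGFR detection method named in the abstract.
alk_positive = 91
detected = {
    "direct sequencing": 4,
    "PNA-clamping real-time PCR": 8,
    "targeted NGS": 11,
    "mutant-enriched NGS": 14,
}
co_alteration_rates = {m: rate(n, alk_positive) for m, n in detected.items()}

# Screening prevalences and ALK-inhibitor response rate, same source.
egfr_prevalence = rate(612, 1445)
alk_prevalence = rate(91, 1445)
alk_inhibitor_response = rate(7, 8)
```

The rising values in `co_alteration_rates` make the abstract's point concrete: the same 91 patients yield more co-alteration calls as the EGFR assay becomes more sensitive.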
NASA Technical Reports Server (NTRS)
Herskovits, E. H.; Itoh, R.; Melhem, E. R.
2001-01-01
OBJECTIVE: The objective of our study was to determine the effects of MR sequence (fluid-attenuated inversion-recovery [FLAIR], proton density-weighted, and T2-weighted) and of lesion location on sensitivity and specificity of lesion detection. MATERIALS AND METHODS: We generated FLAIR, proton density-weighted, and T2-weighted brain images with 3-mm lesions using published parameters for acute multiple sclerosis plaques. Each image contained from zero to five lesions that were distributed among cortical-subcortical, periventricular, and deep white matter regions; on either side; and anterior or posterior in position. We presented images of 540 lesions, distributed among 2592 image regions, to six neuroradiologists. We constructed a contingency table for image regions with lesions and another for image regions without lesions (normal). Each table included the following: the reviewer's number (1-6); the MR sequence; the side, position, and region of the lesion; and the reviewer's response (lesion present or absent [normal]). We performed chi-square and log-linear analyses. RESULTS: The FLAIR sequence yielded the highest true-positive rates (p < 0.001) and the highest true-negative rates (p < 0.001). Regions also differed in reviewers' true-positive rates (p < 0.001) and true-negative rates (p = 0.002). The true-positive rate model generated by log-linear analysis contained an additional sequence-location interaction. The true-negative rate model generated by log-linear analysis confirmed these associations, but no higher order interactions were added. CONCLUSION: We developed software with which we can generate brain images of a wide range of pulse sequences and that allows us to specify the location, size, shape, and intrinsic characteristics of simulated lesions. We found that the use of FLAIR sequences increases detection accuracy for cortical-subcortical and periventricular lesions over that associated with proton density- and T2-weighted sequences.
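The true-positive and true-negative rates discussed above correspond to sensitivity and specificity computed from the two contingency tables (regions with lesions and regions without). A minimal sketch follows; the function name and the counts are hypothetical and are not data from the study.

```python
def detection_rates(tp, fn, tn, fp):
    """Return (true-positive rate, true-negative rate) from reader responses.

    tp/fn come from regions that contain a lesion, tn/fp from normal regions.
    """
    true_positive_rate = tp / (tp + fn)   # sensitivity
    true_negative_rate = tn / (tn + fp)   # specificity
    return true_positive_rate, true_negative_rate

# Hypothetical counts for one MR sequence, for illustration only.
tpr, tnr = detection_rates(tp=80, fn=20, tn=270, fp=30)
```

Computing the pair separately for each sequence and region is what allows the comparison across FLAIR, proton density-weighted, and T2-weighted images that the abstract reports.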
Diverse molecular signatures for ribosomally ‘active’ Perkinsea in marine sediments
2014-01-01
Background Perkinsea are a parasitic lineage within the eukaryotic superphylum Alveolata. Recent studies making use of environmental small sub-unit ribosomal RNA gene (SSU rDNA) sequencing methodologies have detected a significant diversity and abundance of Perkinsea-like phylotypes in freshwater environments. In contrast only a few Perkinsea environmental sequences have been retrieved from marine samples and only two groups of Perkinsea have been cultured and morphologically described and these are parasites of marine molluscs or marine protists. These two marine groups form separate and distantly related phylogenetic clusters, composed of closely related lineages on SSU rDNA trees. Here, we test the hypothesis that Perkinsea are a hitherto under-sampled group in marine environments. Using 454 diversity ‘tag’ sequencing we investigate the diversity and distribution of these protists in marine sediments and water column samples taken from the Deep Chlorophyll Maximum (DCM) and sub-surface using both DNA and RNA as the source template and sampling four European offshore locations. Results We detected the presence of 265 sequences branching with known Perkinsea, the majority of them recovered from marine sediments. Moreover, 27% of these sequences were sampled from RNA derived cDNA libraries. Phylogenetic analyses classify a large proportion of these sequences into 38 cluster groups (including 30 novel marine cluster groups), which share less than 97% sequence similarity suggesting this diversity encompasses a range of biologically and ecologically distinct organisms. Conclusions These results demonstrate that the Perkinsea lineage is considerably more diverse than previously detected in marine environments. This wide diversity of Perkinsea-like protists is largely retrieved in marine sediment with a significant proportion detected in RNA derived libraries suggesting this diversity represents ribosomally ‘active’ and intact cells. 
Given the phylogenetic range of hosts infected by known Perkinsea parasites, these data suggest that Perkinsea either play a significant but hitherto unrecognized role as parasites in marine sediments and/or members of this group are present in the marine sediment possibly as part of the ‘seed bank’ microbial community. PMID:24779375
Deep Learning and Its Applications in Biomedicine.
Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi
2018-02-01
Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.
DeepMitosis: Mitosis detection via deep detection, verification and segmentation networks.
Li, Chao; Wang, Xinggang; Liu, Wenyu; Latecki, Longin Jan
2018-04-01
Mitotic count is a critical predictor of tumor aggressiveness in the breast cancer diagnosis. Nowadays mitosis counting is mainly performed by pathologists manually, which is extremely arduous and time-consuming. In this paper, we propose an accurate method for detecting the mitotic cells from histopathological slides using a novel multi-stage deep learning framework. Our method consists of a deep segmentation network for generating mitosis region when only a weak label is given (i.e., only the centroid pixel of mitosis is annotated), an elaborately designed deep detection network for localizing mitosis by using contextual region information, and a deep verification network for improving detection accuracy by removing false positives. We validate the proposed deep learning method on two widely used Mitosis Detection in Breast Cancer Histological Images (MITOSIS) datasets. Experimental results show that we can achieve the highest F-score on the MITOSIS dataset from ICPR 2012 grand challenge merely using the deep detection network. For the ICPR 2014 MITOSIS dataset that only provides the centroid location of mitosis, we employ the segmentation model to estimate the bounding box annotation for training the deep detection network. We also apply the verification model to eliminate some false positives produced from the detection model. By fusing scores of the detection and verification models, we achieve the state-of-the-art results. Moreover, our method is very fast with GPU computing, which makes it feasible for clinical practice. Copyright © 2018 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean
2013-03-01
Sean Gordon of the USDA on Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.
Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing
Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken
2012-01-01
Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646
Deep Recurrent Neural Networks for Human Activity Recognition.
Murad, Abdulmajid; Pyun, Jae-Young
2017-11-06
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.
García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge
2018-04-12
Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.
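The G+C content quoted above is the percentage of guanine and cytosine among the called bases of the assembly. A minimal sketch of that calculation is shown below; the function name is an assumption, and skipping ambiguous bases such as N is one common convention, not necessarily the one used for this genome.

```python
def gc_content(seq):
    """Return G+C content as a percentage of the unambiguous bases in seq."""
    # Count only A, C, G and T; ambiguous symbols such as N are skipped.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100 * gc / len(counted)
```

For a draft genome, the function would be applied to the concatenation of all contigs, for example `gc_content("".join(contigs))`, where `contigs` is a hypothetical list of contig strings.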
Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari
2013-12-01
Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardamone, Carolin N.; Van Dokkum, Pieter G.; Urry, C. Megan
2010-08-15
We present deep optical 18-medium-band photometry from the Subaru telescope over the ∼30′ × 30′ Extended Chandra Deep Field-South, as part of the Multiwavelength Survey by Yale-Chile (MUSYC). This field has a wealth of ground- and space-based ancillary data, and contains the GOODS-South field and the Hubble Ultra Deep Field. We combine the Subaru imaging with existing UBVRIzJHK and Spitzer IRAC images to create a uniform catalog. Detecting sources in the MUSYC 'BVR' image we find ∼40,000 galaxies with R_AB < 25.3, the median 5σ limit of the 18 medium bands. Photometric redshifts are determined using the EAzY code and compared to ∼2000 spectroscopic redshifts in this field. The medium-band filters provide very accurate redshifts for the (bright) subset of galaxies with spectroscopic redshifts, particularly at 0.1 < z < 1.2 and at z ≳ 3.5. For 0.1 < z < 1.2, we find a 1σ scatter in Δz/(1 + z) of 0.007, similar to results obtained with a similar filter set in the COSMOS field. As a demonstration of the data quality, we show that the red sequence and blue cloud can be cleanly identified in rest-frame color-magnitude diagrams at 0.1 < z < 1.2. We find that ∼20% of the red sequence galaxies show evidence of dust emission at longer rest-frame wavelengths. The reduced images, photometric catalog, and photometric redshifts are provided through the public MUSYC Web site.
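The photometric-redshift accuracy quoted above is expressed through the normalized residual Δz/(1 + z), where Δz is the photometric minus the spectroscopic redshift. The sketch below computes those residuals and one possible scatter estimate. The function names are assumptions, and the population standard deviation is used only for illustration: the abstract does not say which estimator the authors used for the 1σ scatter.

```python
import statistics

def normalized_residuals(z_phot, z_spec):
    """Return (z_phot - z_spec) / (1 + z_spec) for each matched galaxy."""
    return [(zp - zs) / (1 + zs) for zp, zs in zip(z_phot, z_spec)]

def scatter(z_phot, z_spec):
    """Population standard deviation of the normalized residuals.

    This estimator is an assumption; robust alternatives are also common.
    """
    return statistics.pstdev(normalized_residuals(z_phot, z_spec))
```

Dividing by (1 + z) keeps the statistic comparable across redshift, since a fixed fractional wavelength error produces a redshift error that grows with (1 + z).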
RNA-Seq analysis to capture the transcriptome landscape of a single cell
Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim
2013-01-01
We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668
Natural soil reservoirs for human pathogenic and fecal indicator bacteria
Boschiroli, Maria L; Falkinham, Joseph; Favre-Bonte, Sabine; Nazaret, Sylvie; Piveteau, Pascal; Sadowsky, Michael J.; Byappanahalli, Muruleedhara; Delaquis, Pascal; Hartmann, Alain
2016-01-01
Soils receive inputs of human pathogenic and indicator bacteria through land application of animal manures or sewage sludge, and inputs by wildlife. Soil is an extremely heterogeneous substrate and contains meso- and macrofauna that may be reservoirs for bacteria of human health concern. The ability to detect and quantify bacteria of human health concern is important in risk assessments and in evaluating the efficacy of agricultural soil management practices that are protective of crop quality and protective of adjacent water resources. The present chapter describes the distribution of selected Gram-positive and Gram-negative bacteria in soils. Methods for detecting and quantifying soilborne bacteria including extraction, enrichment using immunomagnetic capture, culturing, molecular detection and deep sequencing of metagenomic DNA to detect pathogens are overviewed. Methods for strain phenotypic and genotypic characterization are presented, as well as how comparison with clinical isolates can inform the potential for human health risk.
Bessette, Sandrine; Moalic, Yann; Gautey, Sébastien; Lesongeur, Françoise; Godfroy, Anne; Toffin, Laurent
2017-01-01
Sitting at ∼5,000 m water depth on the Congo-Angola margin and ∼760 km offshore of the West African coast, the recent lobe complex of the Congo deep-sea fan receives large amounts of fluvial sediments (3-5% organic carbon). This organic-rich sedimentation area harbors habitats with chemosynthetic communities similar to those of cold seeps. In this study, we investigated relative abundance, diversity and distribution of aerobic methane-oxidizing bacteria (MOB) communities at the oxic-anoxic interface of sedimentary habitats by using fluorescence in situ hybridization and comparative sequence analysis of particulate mono-oxygenase (pmoA) genes. Our findings revealed that sedimentary habitats of the recent lobe complex hosted type I and type II MOB cells, and comparisons of pmoA community compositions showed variations among the different organic-rich habitats. Furthermore, the pmoA lineages were taxonomically more diverse compared to methane seep environments and were related to those found at cold seeps. Surprisingly, MOB phylogenetic lineages typical of terrestrial environments were observed at such water depth. In contrast, MOB cells or pmoA sequences were not detected at the previous lobe complex that is disconnected from the Congo River inputs.
NASA Astrophysics Data System (ADS)
Sokolov, S. Yu.; Moroz, E. A.; Abramova, A. S.; Zarayskaya, Yu. A.; Dobrolubova, K. O.
2017-07-01
On cruises 25 (2007) and 28 (2011) of the R/V Akademik Nikolai Strakhov in the northern part of the Barents Sea, the Geological Institute, Russian Academy of Sciences, conducted comprehensive research on the bottom relief and upper part of the sedimentary cover profile under the auspices of the International Polar Year program. One of the instrument components was the SeaBat 8111 shallow-water multibeam echo sounder, which can map the acoustic field similarly to a side scan sonar and records the response both from the bottom and from the water column. In the operations area, intense sound scattering objects produced by the discharge of deep fluid flows are detected in the water column. The sound scattering objects and pockmarks in the bottom relief are related to anomalies in hydrocarbon gas concentrations in bottom sediments. The sound scattering objects are localized over Triassic sequences outcropping from the bottom. The most intense degassing processes manifest themselves near the contact of the Triassic sequences and Jurassic clay deposits, as well as over deep depressions in a field of Bouguer anomalies related to the basement of the Jurassic-Cretaceous rift system.
Olsson, Linda; Zettermark, Sofia; Biloglav, Andrea; Castor, Anders; Behrendtz, Mikael; Forestier, Erik; Paulsson, Kajsa; Johansson, Bertil
2016-07-01
Cytogenetic analyses of a consecutive series of 67 paediatric (median age 8 years; range 0-17) de novo acute myeloid leukaemia (AML) patients revealed aberrations in 55 (82%) cases. The most common subgroups were KMT2A rearrangement (29%), normal karyotype (15%), RUNX1-RUNX1T1 (10%), deletions of 5q, 7q and/or 17p (9%), myeloid leukaemia associated with Down syndrome (7%), PML-RARA (7%) and CBFB-MYH11 (5%). Single nucleotide polymorphism array (SNP-A) analysis and exon sequencing of 100 genes, performed in 52 and 40 cases, respectively (39 overlapping), revealed ≥1 aberration in 89%; when adding cytogenetic data, this frequency increased to 98%. Uniparental isodisomies (UPIDs) were detected in 13% and copy number aberrations (CNAs) in 63% (median 2/case); three UPIDs and 22 CNAs were recurrent. Twenty-two genes were targeted by focal CNAs, including AEBP2 and PHF6 deletions and genes involved in AML-associated gene fusions. Deep sequencing identified mutations in 65% of cases (median 1/case). In total, 60 mutations were found in 30 genes, primarily those encoding signalling proteins (47%), transcription factors (25%), or epigenetic modifiers (13%). Twelve genes (BCOR, CEBPA, FLT3, GATA1, KIT, KRAS, NOTCH1, NPM1, NRAS, PTPN11, SMC3 and TP53) were recurrently mutated. We conclude that SNP-A and deep sequencing analyses complement the cytogenetic diagnosis of paediatric AML. © 2016 John Wiley & Sons Ltd.
IgM Repertoire Biodiversity is Reduced in HIV-1 Infection and Systemic Lupus Erythematosus.
Yin, Li; Hou, Wei; Liu, Li; Cai, Yunpeng; Wallet, Mark Andrew; Gardner, Brent Paul; Chang, Kaifen; Lowe, Amanda Catherine; Rodriguez, Carina Adriana; Sriaroon, Panida; Farmerie, William George; Sleasman, John William; Goodenow, Maureen Michels
2013-01-01
HIV-1 infection or systemic lupus erythematosus (SLE) disrupt B cell homeostasis, reduce memory B cells, and impair function of IgG and IgM antibodies. To determine how disturbances in B cell populations producing polyclonal antibodies relate to the IgM repertoire, the IgM transcriptome in health and disease was explored at the complementarity determining region 3 (CDRH3) sequence level. 454-deep pyrosequencing in combination with a novel analysis pipeline was applied to define populations of IGHM CDRH3 sequences based on absence or presence of somatic hypermutations (SHM) in peripheral blood B cells. HIV or SLE subjects have reduced biodiversity within their IGHM transcriptome compared to healthy subjects, mainly due to a significant decrease in the number of unique combinations of alleles, although recombination machinery was intact. While major differences between sequences without or with SHM occurred among all groups, IGHD and IGHJ allele use, CDRH3 length distribution, or generation of SHM were similar among study cohorts. Antiretroviral therapy failed to normalize IGHM biodiversity in HIV-infected individuals. All subjects had a low frequency of allelic combinations within the IGHM repertoire similar to known broadly neutralizing HIV-1 antibodies. Polyclonal expansion would decrease overall IgM biodiversity independent of other mechanisms for development of the B cell repertoire. Applying deep sequencing as a strategy to follow development of the IgM repertoire in health and disease provides a novel molecular assessment of multiple points along the B cell differentiation pathway that is highly sensitive for detecting perturbations within the repertoire at the population level.
Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing
Kannan, Kalpana; Wang, Liguo; Wang, Jianghua; Ittmann, Michael M.; Li, Wei; Yen, Laising
2011-01-01
Transcription-induced chimeric RNAs, possessing sequences from different genes, are expected to increase the proteomic diversity through chimeric proteins or altered regulation. Despite their importance, few studies have focused on chimeric RNAs, especially regarding their presence/roles in human cancers. By deep sequencing the transcriptome of 20 human prostate cancer and 10 matched benign prostate tissues, we obtained 1.3 billion sequence reads, which led to the identification of 2,369 chimeric RNA candidates. Chimeric RNAs occurred at significantly higher frequency in cancer than in matched benign samples. Experimental investigation of a selected set of 46 candidates led to the confirmation of 32 chimeric RNAs, of which 27 were highly recurrent and previously undescribed in prostate cancer. Importantly, a subset of these chimeras was present in prostate cancer cell lines, but not detectable in primary human prostate epithelium cells, implying their associations with cancer. These chimeras contain discernable 5′ and 3′ splice sites at the RNA junction, indicating that their formation is mediated by splicing. Their presence is also largely independent of the expression of parental genes, suggesting that other factors are involved in their production and regulation. One chimera, TMEM79-SMG5, is highly differentially expressed in human cancer samples and therefore a potential biomarker. The prevalence of chimeric RNAs may allow the limited number of human genes to encode a substantially larger number of RNAs and proteins, forming an additional layer of cellular complexity. Together, our results suggest that chimeric RNAs are widespread, and increased chimeric RNA events could represent a unique class of molecular alteration in cancer. PMID:21571633
RaptorX-Property: a web server for protein structure property prediction.
Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo
2016-07-08
RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
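The Q3 and Q8 figures quoted in the abstract above are per-residue accuracies: the fraction of residues whose predicted state equals the observed state. The following is a minimal sketch of that calculation, not the server's own evaluation code; the function name `q_accuracy` and the example label strings are hypothetical.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed state.

    Works for any state alphabet, so the same function covers Q3 (3 states)
    and Q8 (8 states). This is an illustrative helper, not RaptorX code.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 3-state labels: H = helix, E = strand, C = coil.
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(f"Q3 = {q_accuracy(predicted, observed):.2f}")
```

Reported benchmark values are typically computed over all residues of all test proteins, which this per-sequence sketch does not attempt to reproduce.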
Zywicki, Marek; Bakowska-Zywicka, Kamilla; Polacek, Norbert
2012-05-01
The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products and annotation of known transcripts based on multiple sources of information. To disclose the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect and confirm by independent experimental methods multiple novel stable RNA molecules differentially processed from well-known ncRNAs, like rRNAs, tRNAs or snoRNAs, in a stress-dependent manner.
Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu
2015-06-01
Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.
Stratification-Based Outlier Detection over the Deep Web
Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming
2016-01-01
For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web. PMID:27313603
Brain tumor classification of microscopy images using deep residual learning
NASA Astrophysics Data System (ADS)
Ishikawa, Yota; Washiya, Kiyotada; Aoki, Kota; Nagahashi, Hiroshi
2016-12-01
The incidence of brain tumors is about 1.4 per 10,000. In general, cytotechnologists are responsible for cytologic diagnosis. However, because the task requires highly specialized skill, there are not enough cytotechnologists who can diagnose brain tumors. Computer-aided diagnosis by computational image analysis may alleviate the shortage of experts and support objective pathological examinations. Our purpose is to support diagnosis from a microscopy image of brain cortex and to identify brain tumors by medical image processing. In this study, we analyze astrocytes, a type of glial cell of the central nervous system. It is not easy for an expert to discriminate brain tumors correctly, since the difference between astrocytes and low-grade astrocytoma (a tumor formed from astrocytes) is very slight. We present a novel method to segment cell regions robustly using BING objectness estimation and to classify brain tumors using deep convolutional neural networks (CNNs) constructed by deep residual learning. BING is a fast object detection method, and we use a pretrained BING model to detect brain cells. We then apply a sequence of post-processing steps, such as Voronoi diagram, binarization, and watershed transform, to obtain fine segmentation. For classification using CNNs, standard data augmentation is applied to the brain cell database. Experimental results showed 98.5% classification accuracy and 98.2% segmentation accuracy.
A deep oxic ecosystem in the subseafloor South Pacific Gyre
NASA Astrophysics Data System (ADS)
D'Hondt, S. L.; Inagaki, F.; Alvarez Zarikian, C. A.; Integrated Ocean Drilling Program Expedition 329 Shipboard Scientific Party
2011-12-01
Scientific ocean drilling has demonstrated the occurrence of rich microbial communities, abundant active cells and diverse anaerobic activities in anoxic subseafloor sediment. Buried organic matter from the surface photosynthetic world sustains anaerobic heterotrophs in anoxic sediment as deeply buried as 1.6 km below the seafloor. However, these studies have been mostly restricted to the organic-rich sediment of continental margins and biologically productive regions. IODP Expedition 329 discovered that subseafloor habitat and life are fundamentally different in the vast expanse of organic-poor sediment that underlies Earth's largest oceanic province, the South Pacific Gyre (SPG). Dissolved O2 and dissolved major nutrients (C, N, P) are present throughout the entire sediment sequence and the upper basaltic basement of the SPG. The drilled sediment is up to 75 m thick. Although heterotrophic O2 reduction (aerobic respiration) persists for millions of years in SPG sediment (which accumulates very slowly), it falls below minimum detection just a few meters to tens of meters beneath the SPG seafloor. Cell concentrations approach minimum detection at similar depths, but are intermittently detectable throughout the entire sediment sequence. In situ radiolysis of water may be a significant source of energy for the microbes that inhabit the deepest (oldest) sediment.
Pena, Loren DM; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C.; Walley, Nicole; Stong, Nicholas; Horn, Sarah Rapisardo; Sullivan, Jennifer A.; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C.; El-Dairi, Mays; Bellet, Jane; Ann Keels, Martha; Jasien, Joan; Kranz, Peter G.; Noel, Richard; Nagaraj, Shashi K.; Lark, Robert K.; Wechsler, Daniel SG; del Gaudio, Daniela; Leung, Marco L.; Hendon, Laura G.; Parker, Collette C.; Jones, Kelly L.; Goldstein, David B.; Shashi, Vandana
2017-01-01
Purpose To describe examples of missed pathogenic variants on whole exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods Guided by phenotypic information, three children with negative WES underwent targeted single gene testing. Results Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and an NGS-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the non-coding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity and MRI changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, likely missed due to failure of alignment. Conclusions These cases illustrate potential pitfalls of WES/NGS testing, and the importance of phenotype-guided molecular testing in yielding diagnoses. PMID:28914269
Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.
Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M
2018-05-01
Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.
Camacho, Sandra Catalina; Schumacher, Cassie A.; Irish, Jonathan C.; Harkins, Timothy T.; Belfer, Rachel; Kalir, Tamara; Reva, Boris; Dottino, Peter; Martignetti, John A.
2016-01-01
Background Endometrial cancer is the most common gynecologic malignancy, and its incidence and associated mortality are increasing. Despite the immediate need to detect these cancers at an earlier stage, there is no effective screening methodology or protocol for endometrial cancer. The comprehensive, genomics-based analysis of endometrial cancer by The Cancer Genome Atlas (TCGA) revealed many of the molecular defects that define this cancer. Based on these cancer genome results, and in a prospective study, we hypothesized that the use of ultra-deep, targeted gene sequencing could detect somatic mutations in uterine lavage fluid obtained from women undergoing hysteroscopy as a means of molecular screening and diagnosis. Methods and Findings Uterine lavage and paired blood samples were collected and analyzed from 107 consecutive patients who were undergoing hysteroscopy and curettage for diagnostic evaluation from this single-institution study. The lavage fluid was separated into cellular and acellular fractions by centrifugation. Cellular and cell-free DNA (cfDNA) were isolated from each lavage. Two targeted next-generation sequencing (NGS) gene panels, one composed of 56 genes and the other of 12 genes, were used for ultra-deep sequencing. To rule out potential NGS-based errors, orthogonal mutation validation was performed using digital PCR and Sanger sequencing. Seven patients were diagnosed with endometrial cancer based on classic histopathologic analysis. Six of these patients had stage IA cancer, and one of these cancers was only detectable as a microscopic focus within a polyp. All seven patients were found to have significant cancer-associated gene mutations in both cell pellet and cfDNA fractions. In the four patients in whom adequate tumor sample was available, all tumor mutations above a specific allele fraction were present in the uterine lavage DNA samples. 
Mutations originally only detected in lavage fluid fractions were later confirmed to be present in tumor but at allele fractions significantly less than 1%. Of the remaining 95 patients diagnosed with benign or non-cancer pathology, 44 had no significant cancer mutations detected. Intriguingly, 51 patients without histopathologic evidence of cancer had relatively high allele fraction (1.0%–30.4%), cancer-associated mutations. Participants with detected driver and potential driver mutations were significantly older (mean age mutated = 57.96, 95% confidence interval [CI]: 3.30–∞, mean age no mutations = 50.35; p-value = 0.002; Benjamini-Hochberg [BH] adjusted p-value = 0.015) and more likely to be post-menopausal (p-value = 0.004; BH-adjusted p-value = 0.015) than those without these mutations. No associations were detected between mutation status and race/ethnicity, body mass index, diabetes, parity, and smoking status. Long-term follow-up was not presently available in this prospective study for those women without histopathologic evidence of cancer. Conclusions Using ultra-deep NGS, we identified somatic mutations in DNA extracted both from cell pellets and a never previously reported cfDNA fraction from the uterine lavage. Using our targeted sequencing approach, endometrial driver mutations were identified in all seven women who received a cancer diagnosis based on classic histopathology of tissue curettage obtained at the time of hysteroscopy. In addition, relatively high allele fraction driver mutations were identified in the lavage fluid of approximately half of the women without a cancer diagnosis. Increasing age and post-menopausal status were associated with the presence of these cancer-associated mutations, suggesting the prevalent existence of a premalignant landscape in women without clinical evidence of cancer. 
Given that a uterine lavage can be easily and quickly performed even outside of the operating room and in a physician’s office-based setting, our findings suggest the future possibility of this approach for screening women for the earliest stages of endometrial cancer. However, our findings suggest that further insight into development of cancer or its interruption are needed before translation to the clinic. PMID:28027320
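The abstract above reports Benjamini-Hochberg (BH) adjusted p-values alongside raw p-values. As a rough illustration of how that adjustment works, here is a minimal sketch of the standard BH step-up procedure; it is not the authors' analysis code, and the input p-values are made up for the example rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then a
    running minimum is taken from the largest rank downwards so that the
    adjusted values are monotone in the raw p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four association tests.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}  BH-adjusted p = {q:.3f}")
```

The adjusted value depends on how many tests are corrected together, so identical raw p-values can yield different adjusted values in different analyses.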
Park, Bo-Yong; Lee, Mi Ji; Lee, Seung-Hak; Cha, Jihoon; Chung, Chin-Sang; Kim, Sung Tae; Park, Hyunjin
2018-01-01
Migraineurs show an increased load of white matter hyperintensities (WMHs) and more rapid deep WMH progression. Previous methods for WMH segmentation have limited efficacy to detect small deep WMHs. We developed a new fully automated detection pipeline, DEWS (DEep White matter hyperintensity Segmentation framework), for small and superficially-located deep WMHs. A total of 148 non-elderly subjects with migraine were included in this study. The pipeline consists of three components: 1) white matter (WM) extraction, 2) WMH detection, and 3) false positive reduction. In WM extraction, we adjusted the WM mask to re-assign misclassified WMHs back to WM using many sequential low-level image processing steps. In WMH detection, the potential WMH clusters were detected using an intensity based threshold and region growing approach. For false positive reduction, the detected WMH clusters were classified into final WMHs and non-WMHs using the random forest (RF) classifier. Size, texture, and multi-scale deep features were used to train the RF classifier. DEWS successfully detected small deep WMHs with a high positive predictive value (PPV) of 0.98 and true positive rate (TPR) of 0.70 in the training and test sets. Similar performance of PPV (0.96) and TPR (0.68) was attained in the validation set. DEWS showed a superior performance in comparison with other methods. Our proposed pipeline is freely available online to help the research community in quantifying deep WMHs in non-elderly adults.
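The positive predictive value (PPV) and true positive rate (TPR) quoted above follow directly from counts of true positives, false positives, and false negatives. A minimal sketch is given below, assuming detections are scored per WMH cluster; the helper name `ppv_tpr` and the counts are hypothetical and are not taken from the DEWS evaluation.

```python
def ppv_tpr(true_pos: int, false_pos: int, false_neg: int):
    """Compute (PPV, TPR) from detection counts.

    PPV = TP / (TP + FP): share of detected clusters that are real WMHs.
    TPR = TP / (TP + FN): share of real WMHs that were detected.
    """
    if true_pos + false_pos == 0:
        raise ValueError("PPV is undefined when nothing was detected")
    if true_pos + false_neg == 0:
        raise ValueError("TPR is undefined when there are no true lesions")
    ppv = true_pos / (true_pos + false_pos)
    tpr = true_pos / (true_pos + false_neg)
    return ppv, tpr


# Hypothetical counts for illustration only.
ppv, tpr = ppv_tpr(true_pos=7, false_pos=1, false_neg=3)
print(f"PPV = {ppv:.3f}, TPR = {tpr:.3f}")
```

A high PPV with a moderate TPR, as in the abstract, means that few detections are spurious while some true lesions are still missed.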
NASA Astrophysics Data System (ADS)
Hernsdorf, A. W.; Amano, Y.; Suzuki, Y.; Ise, K.; Thomas, B. C.; Banfield, J. F.
2015-12-01
Terrestrial sediments are an important global reservoir for methane. Microorganisms in the deep subsurface play a critical role in the methane cycle, yet much remains to be learned about their diversity and metabolisms. To provide more comprehensive insight into the microbiology of the methane cycle in the deep subsurface, we conducted a genome-resolved study of samples collected from the Horonobe Underground Research Laboratory (HURL), Japan. Groundwater samples were obtained from three boreholes from a depth range of between 140 m and 250 m in two consecutive years. Groundwater was filtered and metagenomic DNA extracted and sequenced, and the sequence data assembled. Based on the sequences of phylogenetically informative genes on the assembled fragments, we detected a high degree of overlap in community composition across a vertical transect within one borehole at the two sampling times. However, there was comparatively little similarity observed among communities across boreholes. Spatial and temporal abundance patterns were used in combination with tetranucleotide signatures of assembled genome fragments to bin the data and reconstruct over 200 unique draft genomes, of which 137 are considered to be of high quality (>90% complete). The deepest samples from one borehole were highly dominated by an archaeon identified as ANME-2D; this organism was also present at lower abundance in all other samples from that borehole. Also abundant in these microbial communities were novel members of the Gammaproteobacteria, Saccharibacteria (TM7) and Tenericutes phyla. Notably, a ~2 Mbp draft genome for the ANME-2D archaeon was reconstructed. As expected, the genome encodes all of the genes predicted to be involved in the reverse methanogenesis pathway. In contrast with the previously reported ANME-2D genome, the HURL ANME-2D genome lacks the capacity to reduce nitrate.
However, we identified many multiheme cytochromes with closest similarity to those of the known Fe-reducing/oxidizing archaeon Ferroglobus placidus. Thus, we suggest that ANME-2D may couple methane oxidation to reduction of ferric iron minerals in the sediment and may be generally important as a link between the iron and methane cycles in deep subsurface environments. Such information has important implications for modeling the global carbon cycle.
The deep, hot biosphere: Twenty-five years of retrospection.
Colman, Daniel R; Poudel, Saroj; Stamps, Blake W; Boyd, Eric S; Spear, John R
2017-07-03
Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a "deep, hot biosphere" in the Earth's crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth's subsurface in the form of a deep subsurface microbiome initiative.
Avsec, Žiga; Cheng, Jun; Gagneur, Julien
2018-01-01
Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928
Lau, K C K; Osiowy, C; Giles, E; Lusina, B; van Marle, G; Burak, K W; Coffin, C S
2018-06-01
Recent studies suggest that withdrawal of hepatitis B immune globulin (HBIG) and nucleos(t)ide analogues (NA) prophylaxis may be considered in HBV surface antigen (HBsAg)-negative liver transplant (LT) recipients with a low risk of disease recurrence. However, the frequency of occult HBV infection (OBI) and HBV variants after LT in the current era of potent NA therapy is unknown. Twelve LT recipients on prophylaxis were tested in matched plasma and peripheral blood mononuclear cells (PBMCs) for HBV quasispecies by in-house nested PCR and next-generation sequencing of amplicons. HBV covalently closed circular DNA (cccDNA) was detected in Hirt DNA isolated from PBMCs with cccDNA-specific primers and confirmed by nucleic acid hybridization and Sanger sequencing. HBV mRNA in PBMCs was detected with reverse-transcriptase nested PCR. Of the LT recipients on immunosuppressive therapy (10/12 male; median age 57.5 [IQR: 39.8-66.5]; median follow-up post-LT 60 months; 6 pre-LT hepatocellular carcinoma [HCC]), 9 were HBsAg-negative. HBV DNA was detected in all plasma and PBMC samples tested; cccDNA and/or mRNA was detected in the PBMCs of 10/12 patients. Significant HBV quasispecies diversity (i.e., 143-2212 nonredundant HBV species) was noted in both sites, and single nucleotide polymorphisms associated with cirrhosis and HCC were detected at varying frequencies. In conclusion, OBI and HBV variants associated with severe liver disease persist in LT recipients on prophylaxis. Although HBV control and cccDNA transcriptional silencing may occur despite immunosuppression, complete virological eradication does not occur in LT recipients with a history of HBV-related end-stage liver disease. © 2018 John Wiley & Sons Ltd.
Yu, Tiantian; Li, Meng; Niu, Mingyang; Fan, Xibei; Liang, Wenyue; Wang, Fengping
2018-01-01
In marine sediments, microorganisms are known to play important roles in nitrogen cycling; however, the composition and quantity of microbes taking part in each process of nitrogen cycling are currently unclear. In this study, two different types of marine sediment samples (shallow bay and deep-sea sediments) in the South China Sea (SCS) were selected to investigate the microbial community involved in nitrogen cycling. The abundance and composition of prokaryotes and seven key functional genes involved in five processes of the nitrogen cycle [nitrogen fixation, nitrification, denitrification, dissimilatory nitrate reduction to ammonium (DNRA), and anaerobic ammonia oxidation (anammox)] were presented. The results showed that a higher abundance of denitrifiers was detected in shallow bay sediments, while a higher abundance of microbes involved in ammonia oxidation, anammox, and DNRA was found in the deep-sea sediments. Moreover, phylogenetic differentiation of bacterial amoA, nirS, nosZ, and nrfA sequences between the two types of sediments was also presented, suggesting environmental selection of microbes with the same geochemical functions but varying physiological properties.
Deep hierarchies in the primate visual cortex: what can we learn for computer vision?
Krüger, Norbert; Janssen, Peter; Kalkan, Sinan; Lappe, Markus; Leonardis, Ales; Piater, Justus; Rodríguez-Sánchez, Antonio J; Wiskott, Laurenz
2013-08-01
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge
NASA Astrophysics Data System (ADS)
Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua
2018-01-01
Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep-sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep-sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using Illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR, and fungal communities were shown to differ slightly by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data on fungal diversity and biogeography, and a glimpse into the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.
NASA Astrophysics Data System (ADS)
Daly, R. A.; Mouser, P. J.; Trexler, R.; Wrighton, K. C.
2014-12-01
Despite a growing appreciation for the ecological role of viruses in marine and gut systems, little is known about their role in the terrestrial deep (> 2000 m) subsurface. We used assembly-based metagenomics to examine the viral component in fluids from hydraulically fractured Marcellus shale gas wells. Here we reconstructed microbial and viral genomes from samples collected 7, 82, and 328 days post fracturing. Viruses accounted for 4.14%, 0.92% and 0.59% of the sample reads that mapped to the assembly. We identified 6 complete, circularized viral genomes and an additional 92 viral contigs > 5 kb with a maximum contig size of 73.6 kb. A BLAST comparison to NCBI viral genomes revealed that 85% of viral contigs had significant hits to the viral order Caudovirales, with 43% of sequences belonging to the family Siphoviridae, 38% to Myoviridae, and 12% to Podoviridae. Enrichment of Caudovirales viruses was supported by a large number of predicted proteins characteristic of tailed viruses including terminases (TerL), tape measure, tail formation, and baseplate related proteins. The viral contigs included evidence of lytic and temperate lifestyles, with the 7 day sample having the greatest number of detected lytic viruses. Notably in this sample, the most abundant virus was lytic and its inferred host, a member of the Vibrionaceae, was not detected at later time points. Analyses of CRISPR sequences (a viral and foreign DNA immune system in bacteria and archaea), linked 18 viral contigs to hosts. CRISPR linkages increased through time and all bacterial and archaeal genomes recovered in the final time point had genes for CRISPR-mediated viral defense. The majority of CRISPR sequences linked phage genomes to several Halanaerobium strains, which are the dominant and persisting members of the community inferred to be responsible for carbon and sulfur cycling in these shales. 
Network analysis revealed that several viruses were present in both the 82 and 328 day samples; this viral persistence is consistent with concomitant temporal stability in geochemistry and microbial community composition. Our findings suggest that after a disturbance (hydraulic fracturing), viral predation and host immunity are important controllers of microbial community structure, metabolism, and thus biogeochemical cycling in the deep subsurface.
Phinikaridou, Alkystis; Andia, Marcelo E; Saha, Prakash; Modarai, Bijan; Smith, Alberto; Botnar, René M
2013-05-01
Deep vein thrombosis remains a major health problem necessitating accurate diagnosis. Thrombolysis is associated with significant morbidity and is effective only for the treatment of unorganized thrombus. We tested the feasibility of in vivo magnetization transfer (MT) and diffusion-weighted magnetic resonance imaging to detect thrombus organization in a murine model of deep vein thrombosis. Deep vein thrombosis was induced in the inferior vena cava of male BALB/C mice. Magnetic resonance imaging was performed at days 1, 7, 14, 21, and 28 after thrombus induction using MT, diffusion-weighted, inversion-recovery, and T1-mapping protocols. Delayed enhancement and T1 mapping were repeated 2 hours after injection of a fibrin contrast agent. Finally, excised thrombi were used for histology. We found that MT and diffusion-weighted imaging can detect histological changes associated with thrombus aging. MT rate (MTR) maps and percentage of MT rate (%MTR) allowed visualization and quantification of the thrombus protein content, respectively. The %MTR increased with thrombus organization and was significantly higher at days 14, 21, and 28 after thrombus induction (days 1, 7, 14, 21, 28: %MTR=2483±451, 2079±1210, 7029±2490, 10 295±4356, 32 994±25 449; P_ANOVA<0.05). There was a significant positive correlation between the %MTR and the histological protein content of the thrombus (r=0.70; P<0.05). The apparent diffusion coefficient was lower in erythrocyte-rich and collagen-rich thrombus (0.72±0.10 and 0.69±0.05 [×10⁻³ mm²/s]). Thrombus at days 7 and 14 had the highest apparent diffusion coefficient values (0.95±0.09 and 1.10±0.18 [×10⁻³ mm²/s]). MT and diffusion-weighted magnetic resonance imaging sequences are promising for the staging of thrombus composition and could be useful in guiding medical intervention.
Saha, Prakash; Modarai, Bijan; Smith, Alberto; Botnar, René M.
2014-01-01
Background: Deep vein thrombosis remains a major health problem necessitating accurate diagnosis. Thrombolysis is associated with significant morbidity and is effective only for the treatment of unorganized thrombus. We tested the feasibility of in vivo magnetization transfer (MT) and diffusion-weighted magnetic resonance imaging to detect thrombus organization in a murine model of deep vein thrombosis. Methods and Results: Deep vein thrombosis was induced in the inferior vena cava of male BALB/C mice. Magnetic resonance imaging was performed at days 1, 7, 14, 21, and 28 after thrombus induction using MT, diffusion-weighted, inversion-recovery, and T1-mapping protocols. Delayed enhancement and T1 mapping were repeated 2 hours after injection of a fibrin contrast agent. Finally, excised thrombi were used for histology. We found that MT and diffusion-weighted imaging can detect histological changes associated with thrombus aging. MT rate (MTR) maps and percentage of MT rate (%MTR) allowed visualization and quantification of the thrombus protein content, respectively. The %MTR increased with thrombus organization and was significantly higher at days 14, 21, and 28 after thrombus induction (days 1, 7, 14, 21, 28: %MTR=2483±451, 2079±1210, 7029±2490, 10 295±4356, 32 994±25 449; P_ANOVA<0.05). There was a significant positive correlation between the %MTR and the histological protein content of the thrombus (r=0.70; P<0.05). The apparent diffusion coefficient was lower in erythrocyte-rich and collagen-rich thrombus (0.72±0.10 and 0.69±0.05 [×10⁻³ mm²/s]). Thrombus at days 7 and 14 had the highest apparent diffusion coefficient values (0.95±0.09 and 1.10±0.18 [×10⁻³ mm²/s]). Conclusions: MT and diffusion-weighted magnetic resonance imaging sequences are promising for the staging of thrombus composition and could be useful in guiding medical intervention. PMID:23564561
Onton, Julie A; Kang, Dae Y; Coleman, Todd P
2016-01-01
Brain activity during sleep is a powerful marker of overall health, but sleep lab testing is prohibitively expensive and only indicated for major sleep disorders. This report demonstrates that mobile 2-channel in-home electroencephalogram (EEG) recording devices provided sufficient information to detect and visualize sleep EEG. Displaying whole-night sleep EEG in a spectral display allowed for quick assessment of general sleep stability, cycle lengths, stage lengths, dominant frequencies and other indices of sleep quality. By visualizing spectral data down to 0.1 Hz, a differentiation emerged between slow-wave sleep with dominant frequency between 0.1-1 Hz or 1-3 Hz, but rarely both. Thus, we present here the new designations, Hi and Lo Deep sleep, according to the frequency range with dominant power. Simultaneously recorded electrodermal activity (EDA) was primarily associated with Lo Deep and very rarely with Hi Deep or any other stage. Therefore, Hi and Lo Deep sleep appear to be physiologically distinct states that may serve unique functions during sleep. We developed an algorithm to classify five stages (Awake, Light, Hi Deep, Lo Deep and rapid eye movement (REM)) using a Hidden Markov Model (HMM), model fitting with the expectation-maximization (EM) algorithm, and estimation of the most likely sleep state sequence by the Viterbi algorithm. The resulting automatically generated sleep hypnogram can help clinicians interpret the spectral display and help researchers computationally quantify sleep stages across participants. In conclusion, this study demonstrates the feasibility of in-home sleep EEG collection, a rapid and informative sleep report format, and novel deep sleep designations accounting for spectral and physiological differences.
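The decoding step named in the abstract above (estimating the most likely stage sequence with the Viterbi algorithm) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-state model, the "fast"/"slow" observation symbols and all probabilities are hypothetical, and the EM fitting step is skipped by taking the parameters as given. It is not the authors' five-stage classifier.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path for a list of observations."""
    # best[t][s]: highest log-probability of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: two stages, two spectral observation symbols.
STATES = ["Light", "Deep"]
LOG_START = {"Light": math.log(0.8), "Deep": math.log(0.2)}
LOG_TRANS = logs({"Light": {"Light": 0.9, "Deep": 0.1},
                  "Deep": {"Light": 0.1, "Deep": 0.9}})
LOG_EMIT = logs({"Light": {"fast": 0.9, "slow": 0.1},
                 "Deep": {"fast": 0.1, "slow": 0.9}})

hypnogram = viterbi(["fast", "fast", "slow", "slow", "slow"],
                    STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(hypnogram)  # ['Light', 'Light', 'Deep', 'Deep', 'Deep']
```

Working in log space avoids numerical underflow on whole-night recordings, where a product of many small probabilities would otherwise round to zero.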
Christiansen, Peter; Nielsen, Lars N; Steen, Kim A; Jørgensen, Rasmus N; Karstoft, Henrik
2016-11-11
Convolutional neural network (CNN)-based systems are increasingly used in autonomous vehicles for detecting obstacles. CNN-based object detection and per-pixel classification (semantic segmentation) algorithms are trained for detecting and classifying a predefined set of object types. These algorithms have difficulties in detecting distant and heavily occluded objects and are, by definition, not capable of detecting unknown object types or unusual scenarios. The visual characteristics of an agricultural field are homogeneous, and obstacles, like people, animals and other objects, occur rarely and are of distinct appearance compared to the field. This paper introduces DeepAnomaly, an algorithm combining deep learning and anomaly detection to exploit the homogeneous characteristics of a field to perform anomaly detection. We demonstrate DeepAnomaly as a fast state-of-the-art detector for obstacles that are distant, heavily occluded and unknown. DeepAnomaly is compared to state-of-the-art obstacle detectors including "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (RCNN). In a human detector test case, we demonstrate that DeepAnomaly detects humans at longer ranges (45-90 m) than RCNN. RCNN has a similar performance at a short range (0-30 m). However, DeepAnomaly has far fewer model parameters and (182 ms/25 ms =) a 7.28-times faster processing time per image. Unlike most CNN-based methods, DeepAnomaly combines high accuracy, low computation time and a low memory footprint, which makes it suitable for a real-time system running on an embedded GPU (Graphics Processing Unit).
Christiansen, Peter; Nielsen, Lars N.; Steen, Kim A.; Jørgensen, Rasmus N.; Karstoft, Henrik
2016-01-01
Convolutional neural network (CNN)-based systems are increasingly used in autonomous vehicles for detecting obstacles. CNN-based object detection and per-pixel classification (semantic segmentation) algorithms are trained for detecting and classifying a predefined set of object types. These algorithms have difficulties in detecting distant and heavily occluded objects and are, by definition, not capable of detecting unknown object types or unusual scenarios. The visual characteristics of an agricultural field are homogeneous, and obstacles, like people, animals and other objects, occur rarely and are of distinct appearance compared to the field. This paper introduces DeepAnomaly, an algorithm combining deep learning and anomaly detection to exploit the homogeneous characteristics of a field to perform anomaly detection. We demonstrate DeepAnomaly as a fast state-of-the-art detector for obstacles that are distant, heavily occluded and unknown. DeepAnomaly is compared to state-of-the-art obstacle detectors including “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” (RCNN). In a human detector test case, we demonstrate that DeepAnomaly detects humans at longer ranges (45–90 m) than RCNN. RCNN has a similar performance at a short range (0–30 m). However, DeepAnomaly has far fewer model parameters and (182 ms/25 ms =) a 7.28-times faster processing time per image. Unlike most CNN-based methods, DeepAnomaly combines high accuracy, low computation time and a low memory footprint, which makes it suitable for a real-time system running on an embedded GPU (Graphics Processing Unit). PMID:27845717
Zhou, Yi; Othus, Megan; Walter, Roland B; Estey, Elihu H; Wu, David; Wood, Brent L
2018-04-21
Relapse is the major cause of death in patients with acute myeloid leukemia (AML) after allogeneic hematopoietic cell transplantation (HCT). Measurable residual disease (MRD) detected by multiparameter flow cytometry (MFC) before and after HCT is a strong, independent risk factor for relapse. As next-generation sequencing (NGS) is increasingly applied in AML MRD detection, it remains to be determined if NGS can improve prediction of post-HCT relapse. Herein, we investigated pre-HCT MRD detected by MFC and NGS in 59 adult patients with NPM1-mutated AML in morphologic remission; 45 of the 59 had post-HCT MRD determined by MFC and NGS around day 28. Before HCT, MRD detected by MFC was the most significant risk factor for relapse (hazard ratio [HR], 4.63; P < .001), whereas MRD detected only by NGS was not. After HCT, MRD detected by either MFC or NGS was a significant risk factor for relapse (HR, 4.96; P = .004 and HR, 4.36; P = .002, respectively). Combining pre- and post-HCT MRD provided the best prediction for relapse (HR, 5.25; P < .001), with a sensitivity of 83%. We conclude that NGS testing of mutated NPM1 post-HCT improves the risk assessment for relapse, whereas pre-HCT MFC testing identifies a subset of high-risk patients in whom additional therapy should be tested. Copyright © 2018 The American Society for Blood and Marrow Transplantation. Published by Elsevier Inc. All rights reserved.
Sensitivity of Small RNA-Based Detection of Plant Viruses.
Santala, Johanna; Valkonen, Jari P T
2018-01-01
Plants recognize unrelated viruses by the antiviral defense system called RNA interference (RNAi). RNAi processes double-stranded viral RNA into small RNAs (sRNAs) of 21-24 nucleotides, the reassembly of which into longer strands in silico allows virus identification by comparison with the sequences available in databases. The aim of this study was to compare the virus detection sensitivity of sRNA-based virus diagnosis with the established virus species-specific polymerase chain reaction (PCR) approach. Viruses propagated in tobacco plants included three engineered, infectious clones of Potato virus A (PVA), each carrying a different marker gene, and an infectious clone of Potato virus Y (PVY). Total RNA (containing sRNA) was isolated and subjected to reverse-transcription real-time PCR (RT-RT-PCR) and sRNA deep-sequencing at different concentrations. RNA extracted from various crop plants was included in the reactions to normalize RNA concentrations. Targeted detection of selected viruses showed a similar threshold for the sRNA and reverse-transcription quantitative PCR (RT-qPCR) analyses. The detection limit for PVY and PVA by RT-qPCR in this study was 3 and 1.5 fg of viral RNA, respectively, in 50 ng of total RNA per PCR reaction. When knowledge was available about the viruses likely present in the samples, sRNA-based virus detection was 10 times more sensitive than RT-RT-PCR. The advantage of sRNA analysis is the detection of all tested viruses without the need for virus-specific primers or probes.
OPTICAL–INFRARED PROPERTIES OF FAINT 1.3 mm SOURCES DETECTED WITH ALMA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hatsukade, Bunyo; Yabe, Kiyoto; Ohta, Kouji
2015-09-10
We report optical-infrared (IR) properties of faint 1.3 mm sources (S_1.3mm = 0.2–1.0 mJy) detected with the Atacama Large Millimeter/submillimeter Array (ALMA) in the Subaru/XMM-Newton Deep Survey field. We searched for optical/IR counterparts of eight ALMA-detected sources (≥4.0σ, the sum of the probability of spurious source contamination is ∼1) in a K-band source catalog. Four ALMA sources have K-band counterpart candidates within a 0.″4 radius. Comparison between ALMA-detected and undetected K-band sources in the same observing fields shows that ALMA-detected sources tend to be brighter, more massive, and more actively forming stars. While many of the ALMA-identified submillimeter-bright galaxies (SMGs) in previous studies lie above the sequence of star-forming galaxies in the stellar mass–star formation rate plane, our ALMA sources are located in the sequence, suggesting that the ALMA-detected faint sources are more like “normal” star-forming galaxies rather than “classical” SMGs. We found a region where multiple ALMA sources and K-band sources reside in a narrow photometric redshift range (z ∼ 1.3–1.6) within a radius of 5″ (42 kpc if we assume z = 1.45). This is possibly a pre-merging system and we may be witnessing the early phase of formation of a massive elliptical galaxy.
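The conversion quoted in the abstract above (a 5″ radius corresponding to 42 kpc at z = 1.45) can be checked with a short calculation. The sketch below assumes a flat ΛCDM cosmology with H0 = 70 km/s/Mpc and Ωm = 0.3; the abstract does not state the cosmology used, so these parameters and the function names are illustrative assumptions.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z, h0=70.0, omega_m=0.3, steps=10000):
    """Angular diameter distance in Mpc for a flat LambdaCDM cosmology (assumed)."""
    omega_l = 1.0 - omega_m  # flatness assumption

    def inv_e(zp):
        # 1 / E(z), the inverse dimensionless Hubble parameter
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Trapezoidal integration of 1/E(z) from 0 to z
    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        total += inv_e(i * dz)
    comoving_mpc = (C_KM_S / h0) * total * dz
    return comoving_mpc / (1.0 + z)


def arcsec_to_kpc(theta_arcsec, z, **cosmology):
    """Projected physical size in kpc subtended by an angle at redshift z."""
    theta_rad = math.radians(theta_arcsec / 3600.0)
    return angular_diameter_distance_mpc(z, **cosmology) * 1000.0 * theta_rad


radius_kpc = arcsec_to_kpc(5.0, 1.45)
print(round(radius_kpc, 1))
```

Under these assumed parameters the result is close to 42 kpc, in line with the figure given in the abstract; other reasonable parameter choices shift it by a few percent.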
2010-01-01
Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of the freshly collected B. azoricus transcriptome, using gill tissue as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne, potentially infectious microorganisms. Additionally, a substantial EST data set was produced, from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on 454 pyrosequencing technology. Results A normalized cDNA library from gill tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino acid sequences, of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative reverse transcription-polymerase chain reaction (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments.
Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptation processes during post-capture long-term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131
Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N
2016-01-01
Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.
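The divergence figures in the abstract above rest on pairwise distances between aligned barcode sequences. As a rough illustration, the sketch below computes the uncorrected p-distance (the proportion of differing sites) and applies the 3.0% threshold mentioned in the abstract. The abstract does not say which distance model was used, so the p-distance is a simplifying assumption, and the sequences are invented toy data rather than real COI barcodes.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def exceeds_threshold(seq_a, seq_b, threshold=0.03):
    """True if divergence is above the threshold (3.0% taken from the abstract)."""
    return p_distance(seq_a, seq_b) > threshold


# Hypothetical 20-base toy alignment, for illustration only.
REF = "ACGTACGTACGTACGTACGT"
ONE_CHANGE = "ACGTACGTACGTACGTACGA"  # one substitution in 20 sites
WITH_GAP = "ACGTACGTAC-TACGTACGT"    # one gap, no substitutions

print(p_distance(REF, ONE_CHANGE))   # 0.05
print(p_distance(REF, WITH_GAP))     # 0.0
```

A model-corrected distance would give slightly larger values at high divergence, so a threshold applied to p-distances is only an approximation of the published analysis.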
Vazquez-Guillen, Jose Manuel; Palacios-Saucedo, Gerardo C.; Rivera-Morales, Lydia G.; Garcia-Campos, Jorge; Ortiz-Lopez, Rocio; Noguera-Julian, Marc; Paredes, Roger; Vielma-Ramirez, Herlinda J.; Ramirez, Teresa J.; Chavez-Garcia, Marcelino; Lopez-Guillen, Paulo; Briones-Lara, Evangelina; Sanchez-Sanchez, Luz M.; Vazquez-Martinez, Carlos A.; Rodriguez-Padilla, Cristina
2016-01-01
Although Structured Treatment Interruptions (STI) are currently not considered an alternative strategy for antiretroviral treatment, their true benefits and limitations have not been fully established. Some studies suggest the possibility of improving the quality of life of patients with this strategy; however, the information that has been obtained corresponds mostly to studies conducted in adults, with a lack of knowledge about its impact on children. Furthermore, mutations associated with antiretroviral resistance could be selected due to sub-therapeutic levels of HAART at each interruption period. Genotyping methods to determine the resistance profiles of the infecting viruses have become increasingly important for the management of patients under STI, since low-abundance antiretroviral drug-resistant mutations (DRMs) at levels below the limit of detection of conventional genotyping (<20% of quasispecies) could increase the risk of virologic failure. In this work, we analyzed the protease and reverse transcriptase regions of the pol gene by ultra-deep sequencing in pediatric patients under STI with the aim of determining the presence of high- and low-abundance DRMs in the viral rebounds generated by the STI. High-abundance mutations in protease and high- and low-abundance mutations in reverse transcriptase were detected, but none of these is directly associated with resistance to antiretroviral drugs. The results could suggest that the evaluated STI program is virologically safe, but strict and carefully planned studies, with greater numbers of patients and interruption/restart cycles, are still needed to evaluate the selection of DRMs during STI. PMID:26807922
Detecting voids in a 0.6 m coal seam, 7 m deep, using seismic reflection
Miller, R.D.; Steeples, D.W.
1991-01-01
Surface collapse over abandoned subsurface coal mines is a problem in many parts of the world. High-resolution P-wave reflection seismology was successfully used to evaluate the risk of an active sinkhole to a main north-south railroad line in an undermined area of southeastern Kansas, USA. Water-filled cavities responsible for sinkholes in this area are in a 0.6 m thick coal seam, 7 m deep. Dominant reflection frequencies in excess of 200 Hz enabled reflections from the coal seam to be discerned from the direct wave, refractions, air wave, and ground roll on unprocessed field files. Repetitive void sequences within competent coal on three seismic profiles are consistent with the "room and pillar" mining technique practiced in this area near the turn of the century. The seismic survey showed that the apparent active sinkhole was not the result of reactivated subsidence but probably erosion.
Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan
2018-02-15
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
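One consequence of the dependencies between GO classes mentioned above is that a protein annotated with a class is implicitly annotated with every ancestor of that class. The sketch below shows a simple post-hoc way to make a set of prediction scores respect that rule by propagating the maximum score upward. This is a swapped-in, simpler technique for illustration: the abstract describes building the hierarchy into the model itself, not this post-processing. The function name, the four-class toy hierarchy and the scores are all hypothetical.

```python
def propagate_scores(scores, parents):
    """Make scores hierarchy-consistent: an ancestor scores at least as high
    as any of its descendants.

    scores  -- dict mapping class name to predicted score in [0, 1]
    parents -- dict mapping class name to a list of its direct parents
    """
    result = dict(scores)
    for term, score in scores.items():
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            # Keep the larger of the ancestor's own score and the descendant's
            result[ancestor] = max(result.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, ()))
    return result


# Hypothetical toy hierarchy using class names only (child -> direct parents).
PARENTS = {
    "binding": ["molecular_function"],
    "protein binding": ["binding"],
    "catalytic activity": ["molecular_function"],
}
RAW = {"protein binding": 0.9, "binding": 0.4, "catalytic activity": 0.2}

consistent = propagate_scores(RAW, PARENTS)
print(consistent["binding"], consistent["molecular_function"])  # 0.9 0.9
```

Because scores are only ever raised on ancestors, the output never contradicts the hierarchy, although it cannot repair a child that was scored too high in the first place.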
microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.
Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong
2012-01-01
microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.
He, Bifang; Tjhung, Katrina F; Bennett, Nicholas J; Chou, Ying; Rau, Andrea; Huang, Jian; Derda, Ratmir
2018-01-19
Understanding the composition of a genetically-encoded (GE) library is instrumental to the success of ligand discovery. In this manuscript, we investigate the bias in GE-libraries of linear, macrocyclic and chemically post-translationally modified (cPTM) tetrapeptides displayed on the M13KE platform, which are produced via trinucleotide cassette synthesis (19 codons) and NNK-randomized codons. Paired-end deep sequencing was used to characterize the differential enrichment of synthetic DNA {S}, ligated vector {L} (extension and ligation of synthetic DNA into the vector), naïve libraries {N} (transformation of the ligated vector into the bacteria followed by expression of the library for 4.5 hours to yield a "naïve" library), and libraries chemically modified by aldehyde ligation and cysteine macrocyclization {M}. It detected a significant drop in diversity in {L} → {N}, but only a minor compositional difference in {S} → {L} and {N} → {M}. Libraries expressed at the N-terminus of phage protein pIII censored positively charged amino acids Arg and Lys; libraries expressed between pIII domains N1 and N2 overcame Arg/Lys-censorship but introduced new bias towards Gly and Ser. Interrogation of biases arising from cPTM by aldehyde ligation and cysteine macrocyclization unveiled censorship of sequences with Ser/Phe. Analogous analysis can be used to explore library diversity in new display platforms and optimize cPTM of these libraries.
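Part of the composition of an NNK-randomized library is fixed by the genetic code before any cloning or expression bias arises. The sketch below enumerates the 32 NNK codons (N = any base, K = G or T) and counts how many encode each amino acid under the standard genetic code. It illustrates only this baseline codon-level skew and does not reproduce the sequencing analysis described in the abstract; the function and variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def nnk_amino_acid_counts():
    """Number of NNK codons encoding each amino acid ('*' marks a stop)."""
    return Counter(CODON_TABLE[codon] for codon in nnk_codons())


counts = nnk_amino_acid_counts()
print(len(nnk_codons()))                      # 32
print(counts["R"], counts["M"], counts["*"])  # 3 1 1
```

With equal codon frequencies, Arg, Leu and Ser are each expected three times as often as single-codon residues such as Met or Trp, so an observed depletion or enrichment is best judged against this expectation.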
Dendrites, deep learning, and sequences in the hippocampus.
Bhalla, Upinder S
2017-10-12
The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.
Ordeig, Laura; Garcia-Cehic, Damir; Gregori, Josep; Soria, Maria Eugenia; Nieto-Aponte, Leonardo; Perales, Celia; Llorens, Meritxell; Chen, Qian; Riveiro-Barciela, Mar; Buti, Maria; Esteban, Rafael; Esteban, Juan Ignacio; Rodriguez-Frias, Francisco; Quer, Josep
2018-01-01
Hepatitis C virus (HCV) is a highly divergent virus currently classified into seven major genotypes and 86 subtypes (ICTV, June 2017), which can have differing responses to therapy. Accurate genotyping/subtyping using high-resolution HCV subtyping enables confident subtype identification, identifies mixed infections and allows detection of new subtypes. During routine genotyping/subtyping, one sample from an Equatorial Guinea patient could not be classified into any of the subtypes. The complete genomic sequence was compared to reference sequences by phylogenetic and sliding window analysis. Resistance-associated substitutions (RASs) were assessed by deep sequencing. The unclassified HCV genome did not belong to any of the existing genotype 1 (G1) subtypes. Sliding window analysis along the complete genome ruled out recombination phenomena suggesting that it belongs to a new HCV G1 subtype. Two NS5A RASs (L31V+Y93H) were found to be naturally combined in the genome which could limit treatment possibilities in patients infected with this subtype.
Fischer, Sebastian; Greipel, Leonie; Klockgether, Jens; Dorda, Marie; Wiehlmann, Lutz; Cramer, Nina; Tümmler, Burkhard
2017-05-01
Early antimicrobial chemotherapy can prevent or at least delay chronic cystic fibrosis (CF) airways infections with Pseudomonas aeruginosa. During a 10-year study period P. aeruginosa was detected for the first time in 54 CF patients regularly seen at the CF centre Hannover. Amplicon sequencing of 34 loci of the P. aeruginosa core genome was performed in baseline and post-treatment isolates of the 15 CF patients who had remained P. aeruginosa-positive after the first round of antipseudomonal chemotherapy. Deep sequencing uncovered coexisting alternative nucleotides at a total of 33 of 55,284 examined genome positions, including six non-synonymous polymorphisms in the lasR gene, a key regulator of quorum sensing. After early treatment 42 of 50 novel nucleotide substitutions had emerged in exopolysaccharide biosynthesis, efflux pump and porin genes. Early treatment selects pathoadaptive mutations in P. aeruginosa that are typical for chronic infections of CF lungs. Copyright © 2016 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.
Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan
2018-05-24
High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubule assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. In particular, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.
Deep Learning Method for Denial of Service Attack Detection Based on Restricted Boltzmann Machine.
Imamverdiyev, Yadigar; Abdullayeva, Fargana
2018-06-01
In this article, the application of the deep learning method based on Gaussian-Bernoulli type restricted Boltzmann machine (RBM) to the detection of denial of service (DoS) attacks is considered. To increase the DoS attack detection accuracy, seven additional layers are added between the visible and the hidden layers of the RBM. Accurate results in DoS attack detection are obtained by optimization of the hyperparameters of the proposed deep RBM model. The form of the RBM that allows application of the continuous data is used. In this type of RBM, the probability distribution of the visible layer is replaced by a Gaussian distribution. Comparative analysis of the accuracy of the proposed method with Bernoulli-Bernoulli RBM, Gaussian-Bernoulli RBM, deep belief network type deep learning methods on DoS attack detection is provided. Detection accuracy of the methods is verified on the NSL-KDD data set. Higher accuracy from the proposed multilayer deep Gaussian-Bernoulli type RBM is obtained.
Yanagawa, Katsunori; Nunoura, Takuro; McAllister, Sean M.; Hirai, Miho; Breuker, Anja; Brandt, Leah; House, Christopher H.; Moyer, Craig L.; Birrien, Jean-Louis; Aoike, Kan; Sunamura, Michinari; Urabe, Tetsuro; Mottl, Michael J.; Takai, Ken
2013-01-01
During the Integrated Ocean Drilling Program (IODP) Expedition 331 at the Iheya North hydrothermal system in the Mid-Okinawa Trough by the D/V Chikyu, we conducted microbiological contamination tests of the drilling and coring operations. The contamination from the drilling mud fluids was assessed using both perfluorocarbon tracers (PFT) and fluorescent microsphere beads. PFT infiltration was detected from the periphery of almost all whole round cores (WRCs). By contrast, fluorescent microspheres were not detected in hydrothermally active core samples, possibly due to thermal decomposition of the microspheres under high-temperature conditions. Microbial contamination from drilling mud fluids to the core interior subsamples was further characterized by molecular-based evaluation. The microbial 16S rRNA gene phylotype compositions in the drilling mud fluids were mainly composed of sequences of Beta- and Gammaproteobacteria and Bacteroidetes, and contained no archaeal sequences. The phylotypes that displayed more than 97% similarity to the sequences obtained from the drilling mud fluids were defined as possible contaminants in this study and were detected as minor components of the bacterial phylotype compositions in 13 of 37 core samples. The degree of microbiological contamination was consistent with that determined by the PFT and/or microsphere assessments. This study suggests a constructive approach for evaluating and eliminating microbial contamination during riser-less drilling and coring operations by the D/V Chikyu. PMID:24265628
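The 97% similarity rule in the abstract above can be illustrated with a minimal sketch. It assumes the sequences are already aligned and of equal length, which a real 16S rRNA analysis would obtain from an alignment tool; the function names and the toy sequences in the usage note are hypothetical and are not taken from the study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def flag_possible_contaminants(core_phylotypes, mud_sequences, threshold=97.0):
    """Return names of core phylotypes with more than `threshold` percent
    identity to at least one drilling-mud sequence."""
    flagged = []
    for name, seq in core_phylotypes.items():
        if any(percent_identity(seq, mud) > threshold for mud in mud_sequences):
            flagged.append(name)
    return flagged
```

With a made-up mud sequence of 100 identical bases, a core phylotype differing at one position (99% identity) would be flagged, while one differing at exactly three positions (97%) would not, because the rule is "more than 97%".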
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection.
Ouyang, Wanli; Zeng, Xingyu; Wang, Xiaogang; Qiu, Shi; Luo, Ping; Tian, Yonglong; Li, Hongsheng; Yang, Shuo; Wang, Zhe; Li, Hongyang; Loy, Chen Change; Wang, Kun; Yan, Junjie; Tang, Xiaoou
2016-07-07
In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some key components in the detection pipeline, a set of models with large diversity are obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean averaged precision obtained by RCNN [16], which was the state-of-the-art, from 31% to 50.3% on the ILSVRC2014 detection test set. It also outperforms the winner of ILSVRC2014, GoogLeNet, by 6.1%. Detailed component-wise analysis is also provided through extensive experimental evaluation, which provides a global view for people to understand the deep learning object detection pipeline.
Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.
2016-01-01
ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral factors associated with intraepidemic increasing severity in a unique epidemiological setting. Here, we investigated the intrahost genetic diversity in acute human samples collected at different time points during the DENV-3 epidemic that occurred in Cuba in 2001 to 2002 using a deep-sequencing approach. We concluded that greater variability in significant minor populations occurred as the epidemic progressed, particularly in the nonstructural genes, with higher variability observed in secondary infection cases. Remarkably, for the first time significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in structural proteins. These findings indicate that high-resolution approaches are needed to unravel molecular mechanisms involved in dengue pathogenesis. PMID:26889031
Metatranscriptomic analyses of honey bee colonies
Tozkar, Cansu Ö.; Kence, Meral; Kence, Aykut; Huang, Qiang; Evans, Jay D.
2015-01-01
Honey bees face numerous biotic threats from viruses to bacteria, fungi, protists, and mites. Here we describe a thorough analysis of microbes harbored by worker honey bees collected from field colonies in geographically distinct regions of Turkey. Turkey is one of the world's most important centers of apiculture, harboring five subspecies of Apis mellifera L., approximately 20% of the honey bee subspecies in the world. We use deep Illumina-based RNA sequencing to capture RNA species for the honey bee and a sampling of all non-endogenous species carried by bees. After trimming and mapping these reads to the honey bee genome, approximately 10% of the sequences (9–10 million reads per library) remained. These were then mapped to a curated set of public sequences containing ca. sixty megabase-pairs of sequence representing known microbial species associated with honey bees. Levels of key honey bee pathogens were confirmed using quantitative PCR screens. We contrast microbial matches across different sites in Turkey, showing new country recordings of Lake Sinai virus, two Spiroplasma bacterium species, symbionts (Candidatus Schmidhempelia bombi, Frischella perrara, Snodgrassella alvi, Gilliamella apicola, Lactobacillus spp.), neogregarines, and a trypanosome species. By using metagenomic analysis, this study also reveals deep molecular evidence for the presence of bacterial pathogens (Melissococcus plutonius, Paenibacillus larvae), Varroa destructor-1 virus, Sacbrood virus, and fungi. Despite this effort we did not detect KBV, SBPV, Tobacco ringspot virus, VdMLV (Varroa Macula like virus), Acarapis spp., Tropilaelaps spp. and Apocephalus (phorid fly). We discuss possible impacts of management practices and honey bee subspecies on microbial retinues. The described workflow and curated microbial database will be generally useful for microbial surveys of healthy and declining honey bees. PMID:25852743
Cassidy, J.F.; Balfour, N.; Hickson, C.; Kao, H.; White, Rickie; Caplan-Auerbach, J.; Mazzotti, S.; Rogers, Gary C.; Al-Khoubbi, I.; Bird, A.L.; Esteban, L.; Kelman, M.; Hutchinson, J.; McCormack, D.
2011-01-01
On 9 October 2007, an unusual sequence of earthquakes began in central British Columbia about 20 km west of the Nazko cone, the most recent (circa 7200 yr) volcanic center in the Anahim volcanic belt. Within 25 hr, eight earthquakes of magnitude 2.3-2.9 occurred in a region where no earthquakes had previously been recorded. During the next three weeks, more than 800 microearthquakes were located (and many more detected), most at a depth of 25-31 km and within a radius of about 5 km. After about two months, almost all activity ceased. The clear P- and S-wave arrivals indicated that these were high-frequency (volcanic-tectonic) earthquakes and the b value of 1.9 that we calculated is anomalous for crustal earthquakes but consistent with volcanic-related events. Analysis of receiver functions at a station immediately above the seismicity indicated a Moho near 30 km depth. Precise relocation of the seismicity using a double-difference method suggested a horizontal migration at the rate of about 0.5 km/d, with almost all events within the lowermost crust. Neither harmonic tremor nor long-period events were observed; however, some spasmodic bursts were recorded and determined to be colocated with the earthquake hypocenters. These observations are all very similar to a deep earthquake sequence recorded beneath Lake Tahoe, California, in 2003-2004. Based on these remarkable similarities, we interpret the Nazko sequence as an indication of an injection of magma into the lower crust beneath the Anahim volcanic belt. This magma injection fractures rock, producing high-frequency, volcanic-tectonic earthquakes and spasmodic bursts.
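The abstract above reports a b value of 1.9 but does not say how it was computed. One common choice is the Aki maximum-likelihood estimator for the Gutenberg-Richter relation, b = log10(e) / (mean(M) - Mc), optionally with a half-bin correction for binned magnitudes. The sketch below assumes that estimator; the function name, the completeness magnitude, and the catalog values in the usage note are illustrative assumptions, not data from the Nazko sequence.

```python
import math


def b_value_aki(magnitudes, completeness_magnitude, bin_width=0.0):
    """Maximum-likelihood b value (Aki estimator).

    Only events at or above the completeness magnitude are used.
    A positive bin_width applies the half-bin correction for binned magnitudes.
    """
    selected = [m for m in magnitudes if m >= completeness_magnitude]
    if not selected:
        raise ValueError("no events at or above the completeness magnitude")
    mean_magnitude = sum(selected) / len(selected)
    denominator = mean_magnitude - (completeness_magnitude - bin_width / 2.0)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed the corrected completeness magnitude")
    return math.log10(math.e) / denominator
```

For a made-up catalog `[2.0, 2.2, 2.4]` with a completeness magnitude of 2.0, the mean magnitude is 2.2 and the estimate is log10(e) / 0.2, roughly 2.17. Larger b values mean a higher proportion of small events relative to large ones.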
NASA Astrophysics Data System (ADS)
Chen, Xinyuan; Song, Li; Yang, Xiaokang
2016-09-01
Video denoising can be described as the problem of mapping a sequence of noisy frames of a specific length to a clean one. We propose a deep architecture based on recurrent neural networks (RNNs) for video denoising. The model learns a patch-based end-to-end mapping between the noisy and clean video sequences: it takes the corrupted video sequence as input and outputs the clean one. Our deep network, which we refer to as a deep recurrent neural network (deep RNN or DRNN), stacks RNN layers so that each layer receives the hidden state of the previous layer as input. Experiments show that (i) the recurrent architecture extracts motion information along the temporal domain and benefits video denoising, (ii) the deep architecture has enough capacity to express the mapping from corrupted input videos to clean output videos, and (iii) the model generalizes, learning different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-09-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-01-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan
2018-06-01
Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.
Research on Daily Objects Detection Based on Deep Neural Network
NASA Astrophysics Data System (ADS)
Ding, Sheng; Zhao, Kun
2018-03-01
With the rapid development of deep learning, great breakthroughs have been made in the field of object detection. In this article, a deep learning algorithm is applied to the detection of daily objects, and some progress has been made in this direction. Compared with traditional object detection methods, the daily object detection method based on deep learning is faster and more accurate. The main research work of this article is as follows: (1) we collect a small data set of daily objects; (2) we build different object detection models in the TensorFlow framework and train them on this data set; (3) we improve the training process and performance of the models by fine-tuning the model parameters.
RESPONSE OF GRANULATION TO SMALL-SCALE BRIGHT FEATURES IN THE QUIET SUN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andic, A.; Chae, J.; Goode, P. R.
2011-04-10
We detected 2.8 bright points (BPs) per Mm² in the quiet Sun with the New Solar Telescope at Big Bear Solar Observatory, using the TiO 705.68 nm spectral line at an angular resolution of ~0.1″ to obtain a 30 minute data sequence. Some BPs formed knots that were stable in time and influenced the properties of the granulation pattern around them. The observed granulation pattern within ~3″ of knots presents smaller granules than those observed in a normal granulation pattern, i.e., around the knots a suppressed convection is detected. Observed BPs covered ~5% of the solar surface and were not homogeneously distributed. BPs had an average size of 0.22″, they were detectable for 4.28 minutes on average, and had an averaged contrast of 0.1% in the deep red TiO spectral line.
Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R
2011-01-01
Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.
Subburaj, Saminathan; Chung, Sung Jin; Lee, Choongil; Ryu, Seuk-Min; Kim, Duk Hyoung; Kim, Jin-Soo; Bae, Sangsu; Lee, Geung-Joo
2016-07-01
Site-directed mutagenesis of nitrate reductase genes using direct delivery of purified Cas9 protein preassembled with guide RNA produces mutations efficiently in Petunia × hybrida protoplast system. The clustered, regularly interspaced, short palindromic repeat (CRISPR)-CRISPR associated endonuclease 9 (CRISPR/Cas9) system has been recently announced as a powerful molecular breeding tool for site-directed mutagenesis in higher plants. Here, we report a site-directed mutagenesis method targeting Petunia nitrate reductase (NR) gene locus. This method could create mutations efficiently using direct delivery of purified Cas9 protein and single guide RNA (sgRNA) into protoplast cells. After transient introduction of RNA-guided endonuclease (RGEN) ribonucleoproteins (RNPs) with different sgRNAs targeting NR genes, mutagenesis at the targeted loci was detected by T7E1 assay and confirmed by targeted deep sequencing. T7E1 assay showed that RGEN RNPs induced site-specific mutations at frequencies ranging from 2.4 to 21 % at four different sites (NR1, 2, 4 and 6) in the PhNR gene locus with average mutation efficiency of 14.9 ± 2.2 %. Targeted deep DNA sequencing revealed mutation rates of 5.3-17.8 % with average mutation rate of 11.5 ± 2 % at the same NR gene target sites in DNA fragments of analyzed protoplast transfectants. Further analysis from targeted deep sequencing showed that the average ratio of deletion to insertion produced collectively by the four NR-RGEN target sites (NR1, 2, 4, and 6) was about 63:37. Our results demonstrated that direct delivery of RGEN RNPs into protoplast cells of Petunia can be exploited as an efficient tool for site-directed mutagenesis of genes or genome editing in plant systems.
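The mutation rates and the deletion-to-insertion ratio quoted above are simple proportions of sequencing reads. The sketch below shows that arithmetic under the assumption that reads at a target site have already been classified as wild type, deletion, or insertion; the function names and all read counts in the usage note are hypothetical and are not data from the study.

```python
def indel_frequency(mutant_reads: int, total_reads: int) -> float:
    """Percentage of reads at a target site that carry an insertion or deletion."""
    if total_reads <= 0 or not 0 <= mutant_reads <= total_reads:
        raise ValueError("need 0 <= mutant_reads <= total_reads and total_reads > 0")
    return 100.0 * mutant_reads / total_reads


def deletion_insertion_split(deletion_reads: int, insertion_reads: int):
    """Return (deletion %, insertion %) among all mutated reads."""
    mutated = deletion_reads + insertion_reads
    if mutated <= 0:
        raise ValueError("no mutated reads")
    return (100.0 * deletion_reads / mutated, 100.0 * insertion_reads / mutated)
```

For example, 115 mutated reads out of 1000 give an indel frequency of 11.5%, and 63 deletion reads against 37 insertion reads give a 63:37 split.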
Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A
2016-05-01
The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Sciences pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GenBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
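The base composition quoted above (which implies a G+C content of 14.0% + 21.7% = 35.7%) is the kind of figure that can be recomputed directly from an assembled sequence. The following is a minimal sketch, not the authors' pipeline: the function name `base_composition` and the toy sequence are hypothetical, and ambiguous bases such as N are simply ignored by assumption.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, T, C and G among unambiguous bases."""
    counts = Counter(seq.upper())
    # Only A/T/C/G count toward the total; N and other codes are skipped.
    total = sum(counts[base] for base in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


# Toy sequence for illustration only; it is not the P. laminaris mitogenome.
toy_sequence = "ATTTGACCATTGTAGTATTG"
composition = base_composition(toy_sequence)
gc_content = composition["G"] + composition["C"]
print(composition, gc_content)
```

Applied to the real 18,413 bp sequence, the same function would be expected to reproduce the published percentages up to rounding.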
Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong
2017-12-15
Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. Contact: xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau
NASA Astrophysics Data System (ADS)
Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori
2017-12-01
We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not) collected from two different sites at the Sao Paulo Plateau off Brazil, using Ion Torrent PGM sequencing targeting the ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs), based on clustering at 97% sequence similarity, assigned to 9 classes and 31 genera) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high abundance of macro-organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt-degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity to pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge of fungi and their role in deep-sea environments is still limited, this study increases our understanding of fungal diversity and community structure in deep-sea environments.
Delving into the Deep Biosphere
NASA Astrophysics Data System (ADS)
Grim, S. L.; Sogin, M. L.; Boetius, A.; Briggs, B. R.; Brazelton, W. J.; D'Hondt, S. L.; Edwards, K. J.; Fisk, M. R.; Gaidos, E.; Gralnick, J.; Hinrichs, K.; Lazar, C.; Lavalleur, H.; Lever, M. A.; Marteinsson, V.; Moser, D. P.; Orcutt, B.; Pedersen, K.; Popa, R.; Ramette, A.; Schrenk, M. O.; Sylvan, J. B.; Smith, A. R.; Teske, A.; Walsh, E. A.; Colwell, F. S.
2013-12-01
The Census of Deep Life organized an international survey of microbial community diversity in terrestrial and marine deep subsurface environments. Habitats included subsurface continental fractured rock aquifers, volcanic and metamorphic subseafloor sedimentary units from the open ocean, subsurface oxic and anoxic sediments and underlying basaltic oceanic crust, and their overlying water columns. Our survey employed high-throughput pyrosequencing of the hypervariable V4-V6 16S rRNA gene of bacteria and archaea. We detected 1292 bacterial genera representing 40 phyla, and 99 archaeal genera from 30 phyla. Of these, a core group of thirteen bacterial genera occurred in every environment. A genus of the South African Goldmine Group (Euryarchaeota) was always present whenever archaea were detected. Members of the rare biosphere in one system often represented highly abundant taxa in other environments. Dispersal could account for this observation but mechanisms of transport remain elusive. Ralstonia (Betaproteobacteria) represented highly abundant taxa in marine communities and terrestrial rock, but generally low abundance organisms in groundwater. Some of these taxa could represent sample contamination, and their extensive distribution in several systems requires further assessment. An unknown Sphingobacteriales (Bacteroidetes) genus, Stenotrophomonas (Gammaproteobacteria), Acidovorax and Aquabacterium (both Betaproteobacteria), a Chlorobiales genus, and a TM7 genus were in the core group as well but more prevalent in terrestrial environments. Similarly, Bacillus (Firmicutes), a new cyanobacterial genus, Bradyrhizobium and Sphingomonas (both Alphaproteobacteria), a novel Acidobacteriaceae genus, and Variovorax (Betaproteobacteria) frequently occurred in marine systems but represented low abundance taxa in other environments. Communities tended to cluster by biome and material, and many genera were unique to systems. 
For example, certain Rhizobiales (Alphaproteobacteria) only occurred in groundwater, and select Firmicutes and actinobacterial taxa were specific to rock environments. We continue to investigate the ecological and physiological context of these organisms. By combining deep sequencing of microbial communities and geochemical and physical evaluations of their environments, we bring to light the diversity and scope of the deep biosphere and insight into the factors that determine the nature of these communities.
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Bessette, Sandrine; Moalic, Yann; Gautey, Sébastien; Lesongeur, Françoise; Godfroy, Anne; Toffin, Laurent
2017-01-01
Sitting at ∼5,000 m water depth on the Congo-Angola margin and ∼760 km offshore of the West African coast, the recent lobe complex of the Congo deep-sea fan receives large amounts of fluvial sediments (3–5% organic carbon). This organic-rich sedimentation area harbors habitats with chemosynthetic communities similar to those of cold seeps. In this study, we investigated relative abundance, diversity and distribution of aerobic methane-oxidizing bacteria (MOB) communities at the oxic–anoxic interface of sedimentary habitats by using fluorescence in situ hybridization and comparative sequence analysis of particulate methane mono-oxygenase (pmoA) genes. Our findings revealed that sedimentary habitats of the recent lobe complex hosted type I and type II MOB cells and comparisons of pmoA community compositions showed variations among the different organic-rich habitats. Furthermore, the pmoA lineages were taxonomically more diverse compared to methane seep environments and were related to those found at cold seeps. Surprisingly, MOB phylogenetic lineages typical of terrestrial environments were observed at such water depth. In contrast, MOB cells or pmoA sequences were not detected at the previous lobe complex that is disconnected from the Congo River inputs. PMID:28487684
Applegate, Tanya L; Gaudieri, Silvana; Plauzolles, Anne; Chopra, Abha; Grebely, Jason; Lucas, Michaela; Hellard, Margaret; Luciani, Fabio; Dore, Gregory J; Matthews, Gail V
2015-01-01
Direct-acting antivirals (DAAs) are predicted to transform hepatitis C therapy, yet little is known about the prevalence of naturally occurring resistance mutations in recently acquired HCV. This study aimed to determine the prevalence and frequency of drug resistance mutations in the viral quasispecies among HIV-positive and -negative individuals with recent HCV. The NS3 protease, NS5A and NS5B polymerase genes were amplified from 50 genotype 1a participants of the Australian Trial in Acute Hepatitis C. Amino acid variations at sites known to be associated with possible drug resistance were analysed by ultra-deep pyrosequencing. A total of 12% of individuals harboured dominant resistance mutations, while 36% demonstrated non-dominant resistant variants below that detectable by bulk sequencing (that is, <20%) but above a threshold of 1%. Resistance variants (<1%) were observed at most sites associated with DAA resistance from all classes, with the exception of sofosbuvir. Dominant resistant mutations were uncommonly observed in the setting of recent HCV. However, low-level mutations to all DAA classes were observed by deep sequencing at the majority of sites and in most individuals. The significance of these variants and impact on future treatment options remains to be determined. Clinicaltrials.gov NCT00192569.
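The frequency bands used in this abstract (dominant variants detectable by bulk sequencing, non-dominant variants between 1% and 20%, and variants below 1%) can be expressed as a small classifier. This is a minimal sketch under stated assumptions: the function name `classify_variant` and the band labels are hypothetical, and the handling of values exactly at 1% and 20% is an assumption, since the abstract does not specify the boundaries.

```python
def classify_variant(frequency_pct: float) -> str:
    """Assign a within-host variant frequency (in percent) to a band."""
    if not 0.0 <= frequency_pct <= 100.0:
        raise ValueError("frequency must be between 0 and 100 percent")
    if frequency_pct >= 20.0:
        # Assumed bulk-sequencing detection limit of 20%, per the abstract.
        return "dominant"
    if frequency_pct >= 1.0:
        return "non-dominant"
    return "below 1% threshold"


# Hypothetical frequencies for illustration only.
for freq in (45.0, 3.2, 0.4):
    print(freq, classify_variant(freq))
```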
Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong
2018-03-01
Functional disruptions of susceptibility genes by large germline structural variant (SV) deletions are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early-onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of the BRCA1 (approximately 80 kb, in two patients) and TP53 (∼1.6 kb, in two patients) genes, and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C (one patient) genes. We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes, and the identification of such SV deletions may improve clinical testing.
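The "detected by at least three of these tools" criterion amounts to a voting filter over per-tool call sets. The sketch below illustrates that idea only; it is not the study's pipeline. It assumes each deletion has already been reduced to a shared identifier (real SV merging would compare breakpoints, for example by reciprocal overlap), and the function name `consensus_calls` and all call labels are hypothetical.

```python
from collections import defaultdict


def consensus_calls(
    calls_by_tool: dict[str, set[str]], min_tools: int = 3
) -> dict[str, list[str]]:
    """Keep calls reported by at least `min_tools` callers."""
    support: dict[str, list[str]] = defaultdict(list)
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support[call].append(tool)
    # Map each retained call to the sorted list of tools that reported it.
    return {
        call: sorted(tools)
        for call, tools in support.items()
        if len(tools) >= min_tools
    }


# Hypothetical call sets for illustration; identifiers are made up.
toy_calls = {
    "Genome STRiP": {"del_A"},
    "Delly": {"del_A", "del_B"},
    "Manta": {"del_A", "del_B"},
    "BreakDancer": {"del_C"},
    "Pindel": set(),
}
kept = consensus_calls(toy_calls)
print(kept)
```

Requiring agreement between independent callers trades some sensitivity for fewer false positives, which fits a workflow where every retained call is then confirmed by qPCR.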
2008-09-20
surface gravity. With our L-band spectra in order 25 (3.0450–3.0865 μm; Fig. 1e), we detect strong HCN absorption (10% deep) and weaker C2H2 absorption in...Doppmann et al. 2005). The absorption lines of Na I and Mg I are particularly gravity and temperature sensitive, but in the opposite sense from each...other. For example, at cool effective temperatures (3200–4500 K) and subdwarf surface gravities (log g between 3.5 and 4.5) Na and Mg lines both grow stronger as
NASA Astrophysics Data System (ADS)
Takahashi, Y.; Hata, T.; Nishida, H.
2017-12-01
In conventional coring of deep marine sediments, the sampled cores are exposed to atmospheric pressure, which results in dissociation of gas hydrates and might change microbial diversity. In this study, we analyzed the microbial composition of a methane hydrate-bearing sediment core sampled and preserved with the Hybrid-PCS (Pressure Coring System). We sliced the core into three layers: (i) the outside layer, which was most affected by drilling fluids, (ii) the middle layer, and (iii) the inner layer, which was expected to be best preserved in its original state. From each layer, we directly extracted DNA and amplified the V3-V4 region of the 16S rRNA gene. We determined at least 5000 partial 16S rDNA nucleotide sequences from each layer by MiSeq (Illumina). In all layers, facultative anaerobes, which can grow with or without oxygen because they can obtain energy aerobically or anaerobically, were detected as the majority. However, genera that are often detected in anaerobic environments were more abundant in the inner layer than in the outside layer, indicating that the conditions of drilling and preservation affect the microbial composition of deep marine sediment cores. This study was conducted as a part of the activity of the Research Consortium for Methane Hydrate Resources in Japan [MH21 consortium], and supported by JOGMEC (Japan Oil, Gas and Metals National Corporation). The sample was provided by AIST (National Institute of Advanced Industrial Science and Technology).
Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto
2015-01-01
Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. 
These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112
Global diversity and biogeography of deep-sea pelagic prokaryotes.
Salazar, Guillem; Cornejo-Castillo, Francisco M; Benítez-Barrios, Verónica; Fraile-Nuez, Eugenio; Álvarez-Salgado, X Antón; Duarte, Carlos M; Gasol, Josep M; Acinas, Silvia G
2016-03-01
The deep-sea is the largest biome of the biosphere, and contains more than half of the whole ocean's microbes. Uncovering their general patterns of diversity and community structure at a global scale remains a great challenge, as only fragmentary information of deep-sea microbial diversity exists based on regional-scale studies. Here we report the first globally comprehensive survey of the prokaryotic communities inhabiting the bathypelagic ocean using high-throughput sequencing of the 16S rRNA gene. This work identifies the dominant prokaryotes in the pelagic deep ocean and reveals that 50% of the operational taxonomic units (OTUs) belong to previously unknown prokaryotic taxa, most of which are rare and appear in just a few samples. We show that whereas the local richness of communities is comparable to that observed in previous regional studies, the global pool of prokaryotic taxa detected is modest (~3600 OTUs), as a high proportion of OTUs are shared among samples. The water masses appear to act as clear drivers of the geographical distribution of both particle-attached and free-living prokaryotes. In addition, we show that the deep-oceanic basins in which the bathypelagic realm is divided contain different particle-attached (but not free-living) microbial communities. The combination of the aging of the water masses and a lack of complete dispersal are identified as the main drivers for this biogeographical pattern. All together, we identify the potential of the deep ocean as a reservoir of still unknown biological diversity with a higher degree of spatial complexity than hitherto considered.
Global diversity and biogeography of deep-sea pelagic prokaryotes
Salazar, Guillem; Cornejo-Castillo, Francisco M; Benítez-Barrios, Verónica; Fraile-Nuez, Eugenio; Álvarez-Salgado, X Antón; Duarte, Carlos M; Gasol, Josep M; Acinas, Silvia G
2016-01-01
The deep-sea is the largest biome of the biosphere, and contains more than half of the whole ocean's microbes. Uncovering their general patterns of diversity and community structure at a global scale remains a great challenge, as only fragmentary information of deep-sea microbial diversity exists based on regional-scale studies. Here we report the first globally comprehensive survey of the prokaryotic communities inhabiting the bathypelagic ocean using high-throughput sequencing of the 16S rRNA gene. This work identifies the dominant prokaryotes in the pelagic deep ocean and reveals that 50% of the operational taxonomic units (OTUs) belong to previously unknown prokaryotic taxa, most of which are rare and appear in just a few samples. We show that whereas the local richness of communities is comparable to that observed in previous regional studies, the global pool of prokaryotic taxa detected is modest (~3600 OTUs), as a high proportion of OTUs are shared among samples. The water masses appear to act as clear drivers of the geographical distribution of both particle-attached and free-living prokaryotes. In addition, we show that the deep-oceanic basins in which the bathypelagic realm is divided contain different particle-attached (but not free-living) microbial communities. The combination of the aging of the water masses and a lack of complete dispersal are identified as the main drivers for this biogeographical pattern. All together, we identify the potential of the deep ocean as a reservoir of still unknown biological diversity with a higher degree of spatial complexity than hitherto considered. PMID:26251871
The deep, hot biosphere: Twenty-five years of retrospection
Colman, Daniel R.; Poudel, Saroj; Stamps, Blake W.; Boyd, Eric S.; Spear, John R.
2017-01-01
Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a “deep, hot biosphere” in the Earth’s crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth’s subsurface in the form of a deep subsurface microbiome initiative. PMID:28674200
Takai, Ken; Horikoshi, Koki
1999-01-01
Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021
The star forming universe after z=1
NASA Astrophysics Data System (ADS)
Harker, Justin J.
This dissertation explores three projects in the field of galaxy formation and evolution: the formation of the red sequence via quenching, the detection, characterization, and frequency of starbursts in the DEEP2 sample, and the behavior of a main sequence of star forming galaxies whose behavior is determined by baryonic mass, referred to as staged star formation. The first section, in Chapter 2, presents a breakdown of several population synthesis models designed to probe the history of the red sequence. Known from measurements at low redshift to be composed of objects with a large range of ages, the red sequence is not well-modeled as being the result of a single monolithic event in the distant past. By combining information on restframe color, Balmer absorption line strengths, and the number density of L* galaxies as a function of redshift, we find evidence that the red sequence is built up over time. The second section, in Chapters 3 and 4, presents a novel method for determining simultaneously the absorption line and emission line contributions to the total measured equivalent width of Balmer lines. Relying on the predictable behavior of both absorption lines, which are to first order equivalent to one another, and emission lines, which follow a predictable decrement toward shorter wavelengths, a single measurement of total line strength for Hβ and Hδ yields uncoupled emission and absorption line components. Using the measurement of Hδ in absorption against Dn(4000) and Hβ in emission, we isolate a population of potential starbursts in the DEEP2 sample. The final section, in Chapter 5, explores the regularity of star formation as a function of redshift, using the staged star formation prescription of Noeske et al. (2007a). We compute a set of τ-models using the prescription, and compare them to the data in a number of parameters in addition to mass and star formation. While the staged star formation model is a good match in a number of parameters, we find several irregularities.
Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun
2016-10-01
Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (<20× read depth) or high sequence homology (pseudogenes) are complemented by amplicon-based sequencing with specific primers to ensure 100% coverage of all targeted regions. Analysis of 20 patient samples with low T-cell receptor excision circle numbers on newborn screening or a positive family history or clinical suspicion of SCID or other severe PIDD identified deleterious mutations in 14 of them. Identified pathogenic variants included both single nucleotide variants and exonic copy number variants, such as hemizygous nonsense, frameshift, and missense changes in IL2RG; compound heterozygous changes in ATM, RAG1, and CIITA; homozygous changes in DCLRE1C and IL7R; and a heterozygous nonsense mutation in CHD7. High-throughput deep sequencing analysis with complete clinical validation greatly increases the diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. 
Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
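The coverage rule described in the abstract above (exons below 20× read depth are complemented by amplicon-based sequencing) can be sketched as a simple filter. This is an illustration under stated assumptions rather than the laboratory's actual software: the function name `exons_needing_followup`, the use of mean depth per exon, and the exon labels are all hypothetical.

```python
def exons_needing_followup(
    mean_depth_by_exon: dict[str, float], min_depth: float = 20.0
) -> list[str]:
    """List exons whose depth falls below the follow-up threshold."""
    # Exons under the threshold would be re-sequenced with specific primers.
    return sorted(
        exon for exon, depth in mean_depth_by_exon.items() if depth < min_depth
    )


# Hypothetical per-exon depths for illustration only.
toy_depths = {
    "GENE1_exon1": 1043.0,
    "GENE1_exon2": 12.5,
    "GENE2_exon1": 20.0,
    "GENE2_exon2": 3.0,
}
flagged = exons_needing_followup(toy_depths)
print(flagged)
```

An exon sitting exactly at 20× is treated as sufficient here, matching the "<20×" wording; a production assay might instead use minimum per-base depth, which is a stricter criterion than the mean.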
Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai
2018-01-01
PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which were limited in number and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. 
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
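The low-frequency cutoff quoted in this abstract (minor allele frequency < 5%) can be illustrated with a short sketch. The genotype encoding (0, 1 or 2 copies of the alternate allele per diploid individual) and the function names are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic site.

    genotypes: alternate-allele counts per diploid individual (0, 1 or 2).
    This encoding is an assumption for the sketch.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(genotypes, threshold=0.05):
    """True if the site is polymorphic and its MAF is below the threshold."""
    maf = minor_allele_frequency(genotypes)
    return 0.0 < maf < threshold


# Hypothetical site: 90 individuals, 3 heterozygous carriers,
# i.e. 3 alternate alleles among 180 chromosomes.
site = [1, 1, 1] + [0] * 87
```

With 90 diploid samples, a variant seen on fewer than 9 of the 180 chromosomes falls under the 5% cutoff.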
Novel intra-genic large deletions of CTNNB1 gene identified in WT desmoid-type fibromatosis.
Colombo, Chiara; Urbini, Milena; Astolfi, Annalisa; Collini, Paola; Indio, Valentina; Belfiore, Antonino; Paielli, Nicholas; Perrone, Federica; Tarantino, Giuseppe; Palassini, Elena; Fiore, Marco; Pession, Andrea; Stacchiotti, Silvia; Pantaleo, Maria Abbondanza; Gronchi, Alessandro
2018-06-14
A wait-and-see approach for desmoid tumors (DT) has become part of the routine treatment strategy. However, predictive factors for the risk of progressive disease are still lacking. A translational project was run in order to identify genomic signatures in patients enrolled within an Italian prospective observational study. Among 12 DT patients (ten CTNNB1-mutated and two WT) enrolled from our Institution, only two patients (17%) showed progressive disease. Tumor biopsies were collected for whole exome sequencing. Overall, DT exhibited a low somatic sequence mutation rate and no additional recurrent mutation was found. In the two WT cases, two novel alterations were detected: a complex deletion of APC and a pathogenic mutation of LAMTOR2. Focusing on the WT DT subtype, deep sequencing of CTNNB1, APC and LAMTOR2 was conducted on a retrospective series of 11 WT DT using a targeted approach. No other mutation of LAMTOR2 was detected, while APC was mutated in two cases. Low-frequency (mean reads of 16%) CTNNB1 mutations were discovered in five samples (45%) and two novel intra-genic deletions in CTNNB1 were detected in two cases. Both deletions and low-frequency mutations of CTNNB1 were highly expressed. In conclusion, a minority of DT is WT for either CTNNB1, APC or any other gene involved in the WNT pathway. In this subgroup, novel and hard-to-detect molecular alterations in APC and CTNNB1 were discovered, helping to explain a portion of the allegedly WT DT cases. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
UV/Optical Detections of Candidate Tidal Disruption Events by GALEX and CFHTLS
NASA Astrophysics Data System (ADS)
Gezari, S.; Basa, S.; Martin, D. C.; Bazin, G.; Forster, K.; Milliard, B.; Halpern, J. P.; Friedman, P. G.; Morrissey, P.; Neff, S. G.; Schiminovich, D.; Seibert, M.; Small, T.; Wyder, T. K.
2008-04-01
We present two luminous UV/optical flares from the nuclei of apparently inactive early-type galaxies at z = 0.37 and 0.33 that have the radiative properties of a flare from the tidal disruption of a star. In this paper we report the second candidate tidal disruption event discovery in the UV by the GALEX Deep Imaging Survey and present simultaneous optical light curves from the CFHTLS Deep Imaging Survey for both UV flares. The first few months of the UV/optical light curves are well fitted with the canonical t^(-5/3) power-law decay predicted for emission from the fallback of debris from a tidally disrupted star. Chandra ACIS X-ray observations during the flares detect soft X-ray sources with T_bb = (2-5) × 10^5 K or Γ > 3 and place limits on hard X-ray emission from an underlying AGN down to L_X(2-10 keV) ≲ 10^41 ergs s^-1. Blackbody fits to the UV/optical spectral energy distributions of the flares indicate peak flare luminosities of ≳ 10^44-10^45 ergs s^-1. The temperature, luminosity, and light curves of both flares are in excellent agreement with emission from a tidally disrupted main-sequence star onto a central black hole of several times 10^7 M⊙. The observed detection rate of our search over ~2.9 deg^2 of GALEX Deep Imaging Survey data spanning from 2003 to 2007 is consistent with tidal disruption rates calculated from dynamical models, and we use these models to make predictions for the detection rates of the next generation of optical synoptic surveys. Some of the data presented herein were obtained at the W. M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California, and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W. M. Keck Foundation.
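The fallback decay mentioned in this abstract, flux proportional to (t - t_D)^(-5/3), can be sketched as a one-parameter fit. The function names, the choice to hold the disruption time t_D fixed, and the log-space least-squares fit are assumptions for illustration, not the authors' fitting procedure.

```python
import math

FALLBACK_INDEX = -5.0 / 3.0  # canonical power-law index for debris fallback


def fallback_flux(t, t_disruption, amplitude):
    """Flux at time t for a (t - t_disruption)^(-5/3) decay."""
    dt = t - t_disruption
    if dt <= 0:
        raise ValueError("t must be later than t_disruption")
    return amplitude * dt ** FALLBACK_INDEX


def fit_amplitude(times, fluxes, t_disruption):
    """Least-squares amplitude in log space with the index fixed at -5/3.

    log f = log A + index * log(t - t_D), so the best-fit log A is the
    mean of (log f - index * log(t - t_D)) over the observations.
    """
    logs = [
        math.log(f) - FALLBACK_INDEX * math.log(t - t_disruption)
        for t, f in zip(times, fluxes)
    ]
    return math.exp(sum(logs) / len(logs))


# Synthetic, noise-free light curve used only to exercise the functions.
times = [10.0, 20.0, 40.0]
fluxes = [fallback_flux(t, 0.0, 3.0) for t in times]
```

In practice the disruption time is usually a free parameter as well; fixing it here keeps the sketch to a closed-form fit.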
2010-01-01
Background: Molecular characterization of collagen-VI related myopathies currently relies on standard sequencing, which yields a detection rate approximating 75-79% in Ullrich congenital muscular dystrophy (UCMD) and 60-65% in Bethlem myopathy (BM) patients as PCR-based techniques tend to miss gross genomic rearrangements as well as copy number variations (CNVs) in both the coding sequence and intronic regions. Methods: We have designed a custom oligonucleotide CGH array in order to investigate the presence of CNVs in the coding and non-coding regions of COL6A1, A2, A3, A5 and A6 genes and a group of genes functionally related to collagen VI. A cohort of 12 patients with UCMD/BM negative at sequencing analysis and 2 subjects carrying a single COL6 mutation whose clinical phenotype was not explicable by inheritance were selected and the occurrence of allelic and genetic heterogeneity explored. Results: A deletion within intron 1A of the COL6A2 gene, occurring in compound heterozygosity with a small deletion in exon 28, previously detected by routine sequencing, was identified in a BM patient. RNA studies showed monoallelic transcription of the COL6A2 gene, thus elucidating the functional effect of the intronic deletion. No pathogenic mutations were identified in the remaining analyzed patients, either within COL6A genes, or in genes functionally related to collagen VI. Conclusions: Our custom CGH array may represent a useful complementary diagnostic tool, especially in recessive forms of the disease, when only one mutant allele is detected by standard sequencing. The intronic deletion we identified represents the first example of a pure intronic mutation in COL6A genes. PMID:20302629
Optimization of conditions to sequence long cDNAs from viruses
USDA-ARS's Scientific Manuscript database
Fourth generation sequencing with the MinION nanopore sequencer provides an opportunity to obtain deep coverage and long reads for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers
USDA-ARS's Scientific Manuscript database
The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Paparini, Andrea; Gofton, Alexander; Yang, Rongchang; White, Nicole; Bunce, Michael; Ryan, Una M
2015-01-01
Cryptosporidium is an important enteric pathogen that infects a wide range of humans and animals. Rapid and reliable detection and characterisation methods are essential for understanding the transmission dynamics of the parasite. Sanger sequencing, and high-throughput sequencing (HTS) on an Ion Torrent platform, were compared with each other for their sensitivity and accuracy in detecting and characterising 25 Cryptosporidium-positive human and animal faecal samples. Ion Torrent reads (n = 123,857) were obtained at both 18S rRNA and actin loci for 21 of the 25 samples. Of these, one isolate at the actin locus (Cattle 05) and three at the 18S rRNA locus (HTS 10, HTS 11 and HTS 12) suffered PCR drop-out (i.e. PCR failures) when using fusion-tagged PCR. Sanger sequences were obtained for both loci for 23 of the 25 samples and showed good agreement with Ion Torrent-based genotyping. Two samples, both from pythons (SK 02 and SK 05), produced mixed 18S and actin chromatograms by Sanger sequencing but were clearly identified by Ion Torrent sequencing as C. muris. One isolate (SK 03) was typed as C. muris by Sanger sequencing but was identified as a mixed C. muris and C. tyzzeri infection by HTS. 18S rRNA Type B sequences were identified in 4/6 C. parvum isolates when deep sequenced but were undetected in Sanger sequencing. Sanger was cheaper than Ion Torrent when sequencing a small number of samples, but when larger numbers of samples are considered (n = 60), the costs were comparable. Fusion-tagged amplicon based approaches are a powerful way of approaching mixtures, the only drawback being the loss of PCR efficiency on low-template samples when using primers coupled to MID tags and adaptors. Taken together these data show that HTS has excellent potential for revealing the "true" composition of species/types in a Cryptosporidium infection, but that HTS workflows need to be carefully developed to ensure sensitivity, accuracy and contamination are controlled.
Copyright © 2015 Elsevier Inc. All rights reserved.
MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.
Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M
2010-09-01
MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm^3 versus 19.9 cm^3, P=0.74) and prevalence of ≥1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.
Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.
James, Katherine; Cockell, Simon J; Zenkin, Nikolay
2017-05-01
The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.
Insertion sequences enrichment in extreme Red sea brine pool vent.
Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania
2017-03-01
Mobile genetic elements are major agents of genome diversification and evolution. Few studies have addressed their characteristics, including abundance, and their role in extreme habitats. The Red Sea brine pools are among the rare natural habitats exposed to multiple extreme conditions, including high temperature, salinity and concentration of heavy metals. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments suggest that insertion sequences predominantly contribute to polyextremophile genome plasticity.
MutScan: fast detection and visualization of target mutations by scanning FASTQ data.
Chen, Shifu; Huang, Tanxiao; Wen, Tiexiang; Li, Hong; Xu, Mingyan; Gu, Jia
2018-01-22
Some types of clinical genetic tests, such as cancer testing using circulating tumor DNA (ctDNA), require sensitive detection of known target mutations. However, conventional next-generation sequencing (NGS) data analysis pipelines typically involve different steps of filtering, which may cause missed detection of key mutations with low frequencies. Variant validation is also indicated for key mutations detected by bioinformatics pipelines. Typically, this process can be executed using alignment visualization tools such as IGV or GenomeBrowse. However, these tools are too heavy and therefore unsuitable for validating mutations in ultra-deep sequencing data. We developed MutScan to address problems of sensitive detection and efficient validation for target mutations. MutScan involves highly optimized string-searching algorithms, which can scan input FASTQ files to grab all reads that support target mutations. The collected supporting reads for each target mutation will be piled up and visualized using web technologies such as HTML and JavaScript. Algorithms such as rolling hash and bloom filter are applied to accelerate scanning and make MutScan applicable to detect or visualize target mutations very quickly. MutScan is a tool for the detection and visualization of target mutations by scanning raw FASTQ data directly. Compared to conventional pipelines, this offers very high performance, executing about 20 times faster, and offering maximal sensitivity since it can grab mutations with even a single supporting read. MutScan visualizes detected mutations by generating interactive pile-ups using web technologies. These can serve to validate target mutations, thus avoiding false positives. Furthermore, MutScan can visualize all mutation records in a VCF file as HTML pages for cloud-friendly VCF validation. MutScan is an open source tool available at GitHub: https://github.com/OpenGene/MutScan.
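The abstract names rolling hashes as one way to speed up scanning of FASTQ reads. A minimal sketch of that general idea follows, using a Rabin-Karp substring search over the sequence lines of FASTQ records. It is not MutScan's code: the function names, the 2-bit base encoding and the modulus are all assumptions, and the bloom filter is omitted.

```python
BASE = 4
MOD = (1 << 61) - 1
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # other symbols (e.g. N) hash as 0


def kmer_hash(seq):
    """Polynomial hash of a sequence."""
    h = 0
    for ch in seq:
        h = (h * BASE + CODE.get(ch, 0)) % MOD
    return h


def contains(read, target):
    """Rabin-Karp search: does read contain target as an exact substring?"""
    k = len(target)
    if k == 0 or len(read) < k:
        return False
    target_hash = kmer_hash(target)
    window_hash = kmer_hash(read[:k])
    high = pow(BASE, k - 1, MOD)  # weight of the base leaving the window
    for i in range(len(read) - k + 1):
        if i > 0:
            # Slide the window: drop read[i-1], append read[i+k-1].
            out = CODE.get(read[i - 1], 0)
            new = CODE.get(read[i + k - 1], 0)
            window_hash = ((window_hash - out * high) * BASE + new) % MOD
        # Confirm with a string comparison to rule out hash collisions.
        if window_hash == target_hash and read[i:i + k] == target:
            return True
    return False


def supporting_reads(fastq_lines, target):
    """Return sequences (2nd line of each 4-line record) containing target."""
    hits = []
    for idx, line in enumerate(fastq_lines):
        if idx % 4 == 1:
            seq = line.strip().upper()
            if contains(seq, target):
                hits.append(seq)
    return hits


# Hypothetical two-read FASTQ; only r1 carries the target sequence GTTG.
fastq = ["@r1", "ACGTTGCA", "+", "IIIIIIII",
         "@r2", "ACGTAGCA", "+", "IIIIIIII"]
```

The rolling update lets each window hash be computed in constant time from the previous one, so a read is scanned in time linear in its length.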
Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A
2017-02-01
Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.
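The relation between the number of RNAi targets and target conservation can be illustrated with a simple binomial sketch. It assumes each target is conserved independently with the same probability, which is a simplification introduced here and not a claim from the abstract; the function name is likewise hypothetical.

```python
from math import comb


def prob_at_least(n_targets, p_conserved, min_hits=1):
    """Probability that at least min_hits of n_targets remain conserved.

    Assumes independent targets with equal conservation probability
    (an illustrative simplification).
    """
    return sum(
        comb(n_targets, k)
        * p_conserved ** k
        * (1.0 - p_conserved) ** (n_targets - k)
        for k in range(min_hits, n_targets + 1)
    )


# Chance that at least one target still matches, for p = 0.7 per target.
coverage = {n: prob_at_least(n, 0.7) for n in (1, 2, 3)}
```

Under this assumption the chance that every target has mutated falls geometrically with the number of targets, as (1 - p)^n.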
Tenggardjaja, Kimberly A.; Bowen, Brian W.; Bernardi, Giacomo
2014-01-01
Understanding vertical and horizontal connectivity is a major priority in research on mesophotic coral ecosystems (30–150 m). However, horizontal connectivity has been the focus of few studies, and data on vertical connectivity are limited to sessile benthic mesophotic organisms. Here we present patterns of vertical and horizontal connectivity in the Hawaiian Islands-Johnston Atoll endemic threespot damselfish, Chromis verater, based on 319 shallow specimens and 153 deep specimens. The mtDNA markers cytochrome b and control region were sequenced to analyze genetic structure: 1) between shallow (<30 m) and mesophotic (30–150 m) populations and 2) across the species' geographic range. Additionally, the nuclear markers rhodopsin and internal transcribed spacer 2 of ribosomal DNA were sequenced to assess connectivity between shallow and mesophotic populations. There was no significant genetic differentiation by depth, indicating high levels of vertical connectivity between shallow and deep aggregates of C. verater. Consequently, shallow and deep samples were combined by location for analyses of horizontal connectivity. We detected low but significant population structure across the Hawaiian Archipelago (overall cytochrome b: ΦST = 0.009, P = 0.020; control region: ΦST = 0.012, P = 0.009) and a larger break between the archipelago and Johnston Atoll (cytochrome b: ΦST = 0.068, P<0.001; control region: ΦST = 0.116, P<0.001). The population structure within the archipelago was driven by samples from the island of Hawaii at the southeast end of the chain and Lisianski in the middle of the archipelago. The lack of vertical genetic structure supports the refugia hypothesis that deep reefs may constitute a population reservoir for species depleted in shallow reef habitats. These findings represent the first connectivity study on a mobile organism that spans shallow and mesophotic depths and provide a reference point for future connectivity studies on mesophotic fishes. 
PMID:25517964
DeepLoc: prediction of protein subcellular localization using deep learning.
Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole
2017-11-01
The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
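The attention mechanism mentioned in this abstract weights sequence positions before pooling them into a single vector. A minimal sketch of softmax attention pooling is given below. It is not the DeepLoc architecture: the per-position scores are passed in directly rather than learned, and all names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Weighted average of per-position vectors.

    hidden_states: one vector per residue (e.g. recurrent-network outputs).
    scores: one unnormalised attention score per residue.
    Returns (context_vector, attention_weights).
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Two positions with equal scores contribute equally to the pooled vector.
context, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The weights sum to one, so they can be read as the relative importance of each position for the prediction.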
Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng
2017-09-01
Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
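The "top L/5" and "top L/10" figures in this abstract are precisions over the highest-scoring predicted contacts, where L is the sequence length. A sketch of that metric follows; the data layout (tuples of residue indices and a score) and the function name are assumptions, and any long-range separation filter is left out.

```python
def top_k_precision(predictions, true_contacts, seq_len, divisor=5):
    """Fraction of the top L/divisor predicted contacts that are correct.

    predictions: list of (i, j, score) tuples for residue pairs.
    true_contacts: set of (i, j) pairs with i < j.
    """
    if not predictions:
        raise ValueError("no predictions to evaluate")
    n_top = max(1, seq_len // divisor)
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:n_top]
    hits = sum(
        1 for i, j, _ in ranked if (min(i, j), max(i, j)) in true_contacts
    )
    return hits / len(ranked)


# Hypothetical 10-residue example: top L/5 = top 2 predictions.
preds = [(1, 30, 0.9), (2, 40, 0.8), (3, 50, 0.1)]
truth = {(1, 30), (3, 50)}
```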
Dissecting enzyme function with microfluidic-based deep mutational scanning.
Romero, Philip A; Tran, Tuan M; Abate, Adam R
2015-06-09
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
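Deep mutational scanning data are commonly summarised as a per-variant enrichment score comparing sequencing counts before and after selection. The sketch below assumes a log2 frequency ratio with a pseudocount, which is one common convention and not necessarily the scoring used in this study; the variant names and counts are hypothetical.

```python
import math


def enrichment_scores(pre_counts, post_counts, pseudocount=0.5):
    """log2(post-selection frequency / pre-selection frequency) per variant.

    pre_counts, post_counts: dicts mapping variant name to read count.
    The pseudocount avoids division by zero for variants lost in selection.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    scores = {}
    for variant, pre in pre_counts.items():
        post = post_counts.get(variant, 0)
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        scores[variant] = math.log2(post_freq / pre_freq)
    return scores


# Hypothetical counts: the mutant is depleted by the activity screen.
pre = {"WT": 100, "A10G": 100}
post = {"WT": 150, "A10G": 50}
scores = enrichment_scores(pre, post)
```

Positive scores mark variants enriched by the screen and negative scores mark variants depleted by it.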
Detection of deep venous thrombophlebitis by gallium 67 scintigraphy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, J.H.
1981-07-01
Deep venous thrombophlebitis may escape clinical detection. Three cases are reported in which whole-body gallium 67 scintigraphy was used to detect unsuspected deep venous thrombophlebitis related to indwelling catheters in three children who were being evaluated for fevers of unknown origin. Two of these children had septicemia from Candida organisms secondary to these venous lines. Gallium 67 scintigraphy may be useful in the detection of complications of indwelling venous catheters.
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China was elucidated using small RNA deep sequencing. The identified STV_CN12 shares 99% sequence identity with other isolates from Mexico, France, Spain, and the U.S. This is the first report ...
Application of Subspace Detection to the 6 November 2011 M5.6 Prague, Oklahoma Aftershock Sequence
NASA Astrophysics Data System (ADS)
McMahon, N. D.; Benz, H.; Johnson, C. E.; Aster, R. C.; McNamara, D. E.
2015-12-01
Subspace detection is a powerful tool for the identification of small seismic events. Subspace detectors improve upon single-event matched filtering techniques by using multiple orthogonal waveform templates whose linear combinations characterize a range of observed signals from previously identified earthquakes. Subspace detectors running on multiple stations can significantly increase the number of locatable events, lowering the catalog's magnitude of completeness and thus providing extraordinary detail on the kinematics of the aftershock process. The 6 November 2011 M5.6 earthquake near Prague, Oklahoma is the largest earthquake instrumentally recorded in Oklahoma history and the largest earthquake resulting from deep wastewater injection. A M4.8 foreshock on 5 November 2011 and the M5.6 mainshock triggered tens of thousands of detectable aftershocks along a 20 km splay of the Wilzetta Fault Zone known as the Meeker-Prague fault. In response to this unprecedented earthquake, 21 temporary seismic stations were deployed surrounding the seismic activity. We utilized a catalog of 767 previously located aftershocks to construct subspace detectors for the 21 temporary and 10 closest permanent seismic stations. Subspace detection identified more than 500,000 new arrival-time observations, which associated into more than 20,000 locatable earthquakes. The associated earthquakes were relocated using the Bayesloc multiple-event locator, resulting in ~7,000 earthquakes with hypocentral uncertainties of less than 500 m. The relocated seismicity provides unique insight into the spatio-temporal evolution of the aftershock sequence along the Wilzetta Fault Zone and its associated structures. We find that the crystalline basement and overlying sedimentary Arbuckle formation accommodate the majority of aftershocks.
While we observe aftershocks along the entire 20 km length of the Meeker-Prague fault, the vast majority of earthquakes were confined to a 9 km wide by 9 km deep surface striking N54°E and dipping 83° to the northwest near the junction of the splay with the main Wilzetta fault structure. Relocated seismicity shows off-fault stress-related interaction to distances of 10 km or more from the mainshock, including clustered seismicity to the northwest and southeast of the mainshock.
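The subspace detector described in this abstract projects a data window onto an orthonormal basis built from template waveforms and measures how much of the window's energy the basis captures. The sketch below uses Gram-Schmidt orthogonalisation to stay dependency-free; published subspace detectors typically derive the basis from a singular value decomposition of aligned templates, so this is a swapped-in simplification. All names and the threshold value are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(templates, tol=1e-10):
    """Gram-Schmidt: orthonormal vectors spanning the template waveforms."""
    basis = []
    for t in templates:
        residual = list(t)
        for b in basis:
            c = dot(residual, b)
            residual = [r - c * x for r, x in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # skip templates already in the span
            basis.append([r / norm for r in residual])
    return basis


def detection_statistic(window, basis):
    """Fraction of window energy captured by the subspace (0 to 1)."""
    energy = dot(window, window)
    if energy == 0.0:
        return 0.0
    captured = sum(dot(window, b) ** 2 for b in basis)
    return captured / energy


def scan(trace, basis, threshold=0.8):
    """Sample offsets where the statistic meets the (assumed) threshold."""
    n = len(basis[0])
    return [
        i for i in range(len(trace) - n + 1)
        if detection_statistic(trace[i:i + n], basis) >= threshold
    ]


# Toy templates and trace, for illustration only.
basis = orthonormal_basis([[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
trace = [0.0, 0.0, 1.0, 2.0, 0.0, 0.0]
```

Because any linear combination of the templates scores 1.0, the detector can flag events whose waveforms differ from every individual template, which is the advantage over single-template matched filtering noted in the abstract.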
Schoening, Timm; Bergmann, Melanie; Ontrup, Jörg; Taylor, James; Dannheim, Jennifer; Gutt, Julian; Purser, Autun; Nattkemper, Tim W.
2012-01-01
Megafauna play an important role in benthic ecosystem function and are sensitive indicators of environmental change. Non-invasive monitoring of benthic communities can be accomplished by seafloor imaging. However, manual quantification of megafauna in images is labor-intensive, and this organism size class is therefore often neglected in ecosystem studies. Automated image analysis has been proposed as a possible approach to such analysis, but the heterogeneity of megafaunal communities poses a non-trivial challenge for such automated techniques. Here, the potential of a generalized object detection architecture, referred to as iSIS (intelligent Screening of underwater Image Sequences), for the quantification of a heterogeneous group of megafauna taxa is investigated. The iSIS system is tuned for a particular image sequence (i.e. a transect) using a small subset of the images, in which megafauna taxa positions were previously marked by an expert. To investigate the potential of iSIS and compare its results with those obtained from human experts, a group of eight different taxa from one camera transect of seafloor images taken at the Arctic deep-sea observatory HAUSGARTEN is used. The results show that inter- and intra-observer agreements of human experts exhibit considerable variation between the species, with a similar degree of variation apparent in the automatically derived results obtained by iSIS. Whilst some taxa (e.g. Bathycrinus stalks, Kolga hyalina, small white sea anemone) were well detected by iSIS (i.e. overall Sensitivity: 87%, overall Positive Predictive Value: 67%), some taxa such as the small sea cucumber Elpidia heckeri remain challenging, for both human observers and iSIS. PMID:22719868
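The two detection metrics quoted in the abstract above (sensitivity and positive predictive value) can be illustrated with a minimal Python sketch. The function names and the counts are hypothetical and are not taken from the study; the sketch only assumes that detections have already been matched against expert annotations to yield true-positive, false-positive and false-negative counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of expert-marked objects that the detector also found."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of detector hits that correspond to an expert-marked object."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical counts for one taxon (illustrative only, not study data).
tp, fp, fn = 8, 4, 2
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.2f}")
```

A detector can score high on sensitivity while still producing many false alarms, which is why the two values are reported together.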
Patterns of DNA barcode variation in Canadian marine molluscs.
Layton, Kara K S; Martel, André L; Hebert, Paul D N
2014-01-01
Molluscs are the most diverse marine phylum and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4% and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%), and showed a significant positive correlation with nearest neighbour distances. DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation across a broad taxonomic group on a large geographic scale.
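The notion of a barcode gap used in the abstract above (largest within-species divergence smaller than the distance to the nearest neighbouring species) can be sketched in Python as follows. This is an illustration under stated assumptions: it uses the uncorrected p-distance as a simplified stand-in for whatever distance model the study applied, the sequences are assumed to be pre-aligned, and the species names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def barcode_gap(seqs_by_species: dict) -> dict:
    """Map each species to (max intraspecific distance, nearest-neighbour distance)."""
    result = {}
    for species, seqs in seqs_by_species.items():
        max_intra = max(
            (p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0
        )
        nearest = min(
            (
                p_distance(a, b)
                for other, other_seqs in seqs_by_species.items()
                if other != species
                for a in seqs
                for b in other_seqs
            ),
            default=float("inf"),
        )
        result[species] = (max_intra, nearest)
    return result


# Hypothetical toy alignment (10 sites instead of 648).
toy = {
    "sp_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "sp_B": ["ACGTTCGAAC", "ACGTTCGAAC"],
}
for species, (intra, nearest) in barcode_gap(toy).items():
    print(species, intra, nearest, "gap" if nearest > intra else "no gap")
```

A species whose maximum intraspecific distance exceeds a chosen cut-off (the abstract uses 2%) would be flagged as a candidate for overlooked diversity.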
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, whose label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
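The pairwise-ranking view of AUC mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration: the empirical AUC is computed as the fraction of (positive, negative) pairs ranked correctly, and the differentiable surrogate shown here is a plain pairwise squared loss (a degree-2 polynomial), chosen as a simple stand-in and not the polynomial approximation used by the DeepCNF authors. All function names and the `margin` parameter are hypothetical.

```python
def empirical_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def pairwise_squared_loss(scores, labels, margin=1.0):
    """Smooth surrogate: mean of (margin - (pos - neg))^2 over all pairs.

    Minimizing this pushes positive scores above negative ones; unlike the
    0/1 indicator inside the AUC, it is differentiable in the scores.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = sum((margin - (p - n)) ** 2 for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Toy scores: three of the four positive/negative pairs are ordered correctly.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(empirical_auc(scores, labels))
print(pairwise_squared_loss(scores, labels))
```

Because the AUC depends only on the ordering of positive versus negative scores, it is insensitive to how many examples each class contributes, which is the property that makes it attractive for imbalanced labels.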
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling
Wang, Sheng; Sun, Siqi
2017-01-01
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, whose label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. However, because single-cell experiments begin with such small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
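For orientation, the simplest possible coverage model can be sketched in Python. This is not the non-parametric empirical Bayes method described in the abstract above and is not part of preseq; it is the idealized uniform-sampling (plain Poisson) baseline, shown only to make clear what such a method has to improve on. Under this baseline, every base is sampled at the same rate, so the expected covered fraction at mean depth c is 1 - exp(-c). The function names and the example numbers are hypothetical.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads overlapping a base under uniform sampling."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once under a plain Poisson model."""
    return 1.0 - math.exp(-depth)


# Hypothetical shallow and deep experiments on a 3 Gb genome with 100 bp reads.
for n_reads in (3_000_000, 30_000_000, 300_000_000):
    depth = mean_depth(n_reads, 100, 3_000_000_000)
    print(f"{n_reads:>11,d} reads  depth {depth:5.2f}  "
          f"covered {expected_fraction_covered(depth):.3f}")
```

Because amplification bias and loss of material make single-cell libraries far from uniform, this baseline would be expected to overestimate coverage, which is the motivation the abstract gives for a model that learns the sampling rates from the shallow data instead.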
Spatial constraints govern competition of mutant clones in human epidermis.
Lynch, M D; Lynch, C N S; Craythorne, E; Liakath-Ali, K; Mallipeddi, R; Barker, J N; Watt, F M
2017-10-24
Deep sequencing can detect somatic DNA mutations in tissues permitting inference of clonal relationships. This has been applied to human epidermis, where sun exposure leads to the accumulation of mutations and an increased risk of skin cancer. However, previous studies have yielded conflicting conclusions about the relative importance of positive selection and neutral drift in clonal evolution. Here, we sequenced larger areas of skin than previously, focusing on cancer-prone skin spanning five decades of life. The mutant clones identified were too large to be accounted for solely by neutral drift. Rather, using mathematical modelling and computational lattice-based simulations, we show that observed clone size distributions can be explained by a combination of neutral drift and stochastic nucleation of mutations at the boundary of expanding mutant clones that have a competitive advantage. These findings demonstrate that spatial context and cell competition cooperate to determine the fate of a mutant stem cell.
Jansen, Anne M L; Crobach, Stijn; Geurts-Giele, Willemina R R; van den Akker, Brendy E W M; Garcia, Marina Ventayol; Ruano, Dina; Nielsen, Maartje; Tops, Carli M J; Wijnen, Juul T; Hes, Frederik J; van Wezel, Tom; Dinjens, Winand N M; Morreau, Hans
2017-02-01
We investigated the presence and patterns of mosaicism in the APC gene in patients with colon neoplasms not associated with any other genetic variants; we performed deep sequence analysis of APC in at least 2 adenomas or carcinomas per patient. We identified mosaic variants in APC in adenomas from 9 of the 18 patients with 21 to approximately 100 adenomas. Mosaic variants of APC were variably detected in leukocyte DNA and/or non-neoplastic intestinal mucosa of these patients. In a comprehensive sequence analysis of 1 patient, we found no evidence for mosaicism in APC in non-neoplastic intestinal mucosa. One patient was found to carry a mosaic c.4666dupA APC variant in only 10 of 16 adenomas, indicating the importance of screening 2 or more adenomas for genetic variants. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.
2015-01-01
A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools. PMID:26412485
Hawlitschek, Oliver; Porch, Nick; Hendrich, Lars; Balke, Michael
2011-02-09
DNA sequencing techniques used to estimate biodiversity, such as DNA barcoding, may reveal cryptic species. However, disagreements between barcoding and morphological data have already led to controversy. Species delimitation should therefore not be based on mtDNA alone. Here, we explore the use of nDNA and bioclimatic modelling in a new species of aquatic beetle revealed by mtDNA sequence data. The aquatic beetle fauna of Australia is characterised by high degrees of endemism, including local radiations such as the genus Antiporus. Antiporus femoralis was previously considered to exist in two disjunct, but morphologically indistinguishable populations in south-western and south-eastern Australia. We constructed a phylogeny of Antiporus and detected a deep split between these populations. Diagnostic characters from the highly variable nuclear protein encoding arginine kinase gene confirmed the presence of two isolated populations. We then used ecological niche modelling to examine the climatic niche characteristics of the two populations. All results support the status of the two populations as distinct species. We describe the south-western species as Antiporus occidentalis sp.n. In addition to nDNA sequence data and extended use of mitochondrial sequences, ecological niche modelling has great potential for delineating morphologically cryptic species.
Hawlitschek, Oliver; Porch, Nick; Hendrich, Lars; Balke, Michael
2011-01-01
Background: DNA sequencing techniques used to estimate biodiversity, such as DNA barcoding, may reveal cryptic species. However, disagreements between barcoding and morphological data have already led to controversy. Species delimitation should therefore not be based on mtDNA alone. Here, we explore the use of nDNA and bioclimatic modelling in a new species of aquatic beetle revealed by mtDNA sequence data. Methodology/Principal Findings: The aquatic beetle fauna of Australia is characterised by high degrees of endemism, including local radiations such as the genus Antiporus. Antiporus femoralis was previously considered to exist in two disjunct, but morphologically indistinguishable populations in south-western and south-eastern Australia. We constructed a phylogeny of Antiporus and detected a deep split between these populations. Diagnostic characters from the highly variable nuclear protein encoding arginine kinase gene confirmed the presence of two isolated populations. We then used ecological niche modelling to examine the climatic niche characteristics of the two populations. All results support the status of the two populations as distinct species. We describe the south-western species as Antiporus occidentalis sp.n. Conclusion/Significance: In addition to nDNA sequence data and extended use of mitochondrial sequences, ecological niche modelling has great potential for delineating morphologically cryptic species. PMID:21347370
Long-term detection of Parkinsonian tremor activity from subthalamic nucleus local field potentials.
Houston, Brady; Blumenfeld, Zack; Quinn, Emma; Bronte-Stewart, Helen; Chizeck, Howard
2015-01-01
Current deep brain stimulation paradigms deliver continuous stimulation to deep brain structures to ameliorate the symptoms of Parkinson's disease. This continuous stimulation has undesirable side effects and decreases the lifespan of the unit's battery, necessitating earlier replacement. A closed-loop deep brain stimulator that uses brain signals to determine when to deliver stimulation based on the occurrence of symptoms could potentially address these drawbacks of current technology. Attempts to detect Parkinsonian tremor using brain signals recorded during the implantation procedure have been successful. However, the ability of these methods to accurately detect tremor over extended periods of time is unknown. Here we use local field potentials recorded during a deep brain stimulation clinical follow-up visit 1 month after initial programming to build a tremor detection algorithm and use this algorithm to detect tremor in subsequent visits up to 8 months later. Using this method, we detected the occurrence of tremor with accuracies between 68-93%. These results demonstrate the potential of tremor detection methods for efficacious closed-loop deep brain stimulation over extended periods of time.
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.
Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie
2015-06-17
High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
USDA-ARS's Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost of large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...
Yildirim, Özal
2018-05-01
Long short-term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural network (RNN) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences, called DBLSTM-WS, was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.
Zhang, Buzhong; Li, Linqing; Lü, Qiang
2018-05-25
Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
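The evaluation steps named in the abstract above (thresholding continuous relative solvent accessibility into binary states, and scoring predictions by mean absolute error and Pearson's correlation) can be sketched in Python. The 0.25 threshold, the state labels and the toy values are assumptions made for illustration; the abstract states only that predefined thresholds were used.

```python
import math


def binarize_rsa(values, threshold=0.25):
    """Map relative solvent accessibility (0..1) to 'buried' / 'exposed'.

    The 0.25 default is a hypothetical threshold, not taken from the paper.
    """
    return ["exposed" if v >= threshold else "buried" for v in values]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue values for a five-residue fragment.
observed = [0.05, 0.40, 0.62, 0.10, 0.33]
predicted = [0.10, 0.35, 0.70, 0.20, 0.25]
print(binarize_rsa(predicted))
print(mean_absolute_error(predicted, observed))
print(pearson_r(predicted, observed))
```

Mean absolute error and correlation assess the continuous prediction, whereas the binary states depend on where the threshold is placed, so the two kinds of result are not directly interchangeable.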
Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.
Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E
2012-01-01
HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 coreceptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels, suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
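Of the three inference methods listed in the abstract above, the 11/24/25 rule is simple enough to sketch in Python. The sketch assumes the commonly described form of the rule, in which a positively charged residue (arginine or lysine) at position 11, 24 or 25 of the 35-residue V3 loop is taken to indicate a non-R5 virus. It also assumes the input is already an ungapped 35-residue V3 amino-acid sequence, so no alignment or indel handling is attempted. The function name and the example sequences are illustrative only.

```python
BASIC_RESIDUES = {"R", "K"}  # arginine and lysine carry a positive charge


def predict_tropism(v3_loop: str, positions=(11, 24, 25)) -> str:
    """Classify a V3 loop as 'R5' or 'non-R5' from charged residues.

    Positions are 1-based, matching the way the rule is usually stated.
    Assumes an ungapped 35-residue V3 sequence.
    """
    v3_loop = v3_loop.upper()
    if len(v3_loop) != 35:
        raise ValueError("expected an ungapped 35-residue V3 loop")
    if any(v3_loop[p - 1] in BASIC_RESIDUES for p in positions):
        return "non-R5"
    return "R5"


# Illustrative sequences: a V3-like loop and the same loop with S11R.
reference_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
s11r_variant = "CTRPNNNTRKRIHIGPGRAFYTTGEIIGDIRQAHC"
print(predict_tropism(reference_like))
print(predict_tropism(s11r_variant))
```

In a deep sequencing setting, a rule like this would be applied to every read or variant, and the proportion of non-R5 calls compared against a chosen cut-off, which is one reason minor differences between algorithms can change the final call.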
Masetti, Riccardo; Castelli, Ilaria; Astolfi, Annalisa; Bertuccio, Salvatore Nicola; Indio, Valentina; Togni, Marco; Belotti, Tamara; Serravalle, Salvatore; Tarantino, Giuseppe; Zecca, Marco; Pigazzi, Martina; Basso, Giuseppe; Pession, Andrea; Locatelli, Franco
2016-08-30
Despite significant improvement in treatment of childhood acute myeloid leukemia (AML), 30% of patients experience disease recurrence, which is still the major cause of treatment failure and death in these patients. To investigate molecular mechanisms underlying relapse, we performed whole-exome sequencing of diagnosis-relapse pairs and matched remission samples from 4 pediatric AML patients without recurrent cytogenetic alterations. Candidate driver mutations were selected for targeted deep sequencing at high coverage, suitable to detect small subclones (0.12%). BiCEBPα mutation was found to be stable and highly penetrant, representing a separate biological and clinical entity, unlike WT1 mutations, which were extremely unstable. Among the mutational patterns underlying relapse, we detected the acquisition of proliferative advantage by signaling activation (PTPN11 and FLT3-TKD mutations) and the increased resistance to apoptosis (hyperactivation of TYK2). We also found a previously undescribed feature of AML, consisting of a hypermutator phenotype caused by SETD2 inactivation. The consequent accumulation of new mutations promotes the adaptability of the leukemia, contributing to clonal selection. We report a novel ASXL3 mutation characterizing a very small subclone (<1%) present at diagnosis and undergoing expansion (60%) at relapse. Taken together, these findings provide molecular clues for designing optimal therapeutic strategies, in terms of target selection, adequate schedule design and reliable response-monitoring techniques.
Vassiliki, Kokkinou; George, Koutsodontis; Polixeni, Stamatiou; Christoforos, Giatzakis; Minas, Aslanides Ioannis; Stavrenia, Koukoula; Ioannis, Datseris
2018-01-01
Aim: To evaluate the frequency and pattern of disease-associated mutations of the ABCA4 gene among Greek patients with presumed Stargardt disease (STGD1). Materials and Methods: A total of 59 patients were analyzed for ABCA4 mutations using the ABCR400 microarray and PCR-based sequencing of all coding exons and flanking intronic regions. MLPA analysis, as well as sequencing of two regions in introns 30 and 36 reported earlier to harbor deep intronic disease-associated variants, was used in 4 selected cases. Results: An overall detection rate of at least one mutant allele was achieved in 52 of the 59 patients (88.1%). Direct sequencing significantly improved the complete characterization rate, that is, identification of two mutations, compared to the microarray analysis (93.1% versus 50%). In total, 40 distinct potentially disease-causing variants of the ABCA4 gene were detected, including six previously unreported potentially pathogenic variants. Among the disease-causing variants in this cohort, the most frequent was c.5714+5G>A, representing 16.1%, while p.Gly1961Glu and p.Leu541Pro represented 15.2% and 8.5%, respectively. Conclusions: By using a combination of methods, we completely molecularly diagnosed 48 of the 59 patients studied. In addition, we identified six previously unreported, potentially pathogenic ABCA4 mutations. PMID:29854428
Genome-wide discovery of novel and conserved microRNAs in white shrimp (Litopenaeus vannamei).
Xi, Qian-Yun; Xiong, Yuan-Yan; Wang, Yuan-Mei; Cheng, Xiao; Qi, Qi-En; Shu, Gang; Wang, Song-Bo; Wang, Li-Na; Gao, Ping; Zhu, Xiao-Tong; Jiang, Qing-Yan; Zhang, Yong-Liang; Liu, Li
2015-01-01
In recent years, a large number of conserved and species-specific microRNAs (miRNAs) have been identified in species that are economically important but lack a full genome sequence. In this study, Solexa deep sequencing and cross-species miRNA microarray were used to detect miRNAs in white shrimp. We identified 239 conserved miRNAs, 14 miRNA* sequences and 20 novel miRNAs by bioinformatics analysis from 7,561,406 high-quality reads representing 325,370 distinct sequences. All 20 novel miRNAs were specific to white shrimp, with no homologs in other species. Using the conserved miRNAs from the miRBase database as a query set to search for homologs among shrimp expressed sequence tags (ESTs), 32 conserved, computationally predicted miRNAs were discovered in shrimp. In addition, microarray analysis of shrimp fed with Panax ginseng polysaccharide complex identified 151 conserved miRNAs, of which 18 were significantly up-regulated and 49 were significantly down-regulated. qRT-PCR analysis was also performed for nine miRNAs in three shrimp tissues (muscle, gill and hepatopancreas). The results showed that expression of these miRNAs is tissue specific. Combining the results of the three methods, we detected 20 novel and 394 conserved miRNAs. Verification with quantitative reverse transcription PCR (qRT-PCR) and Northern blot supported a high level of confidence in the data. The study provides the first comprehensive specific miRNA profile of white shrimp, which includes useful information for future investigations into the function of miRNAs in regulation of shrimp development and immunology.
Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S
2018-05-01
Oligoclonal expansion of CD8+CD28− lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8+CD28− cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8+CD28−CD57+ cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8+CD57+ cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8+ cells from aplastic anemia patients with CD8+CD57+ cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8+CD57+ cells, which also showed decreased diversity compared to total CD4+ and CD8+ cell pools. From analysis of complementarity-determining region 3 sequences in the CD8+ cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8+ T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064). Copyright © 2018 Ferrata Storti Foundation.
Branton, William G.; Ellestad, Kristofor K.; Maingat, Ferdinand; Wheatley, B. Matt; Rud, Erling; Warren, René L.; Holt, Robert A.; Surette, Michael G.; Power, Christopher
2013-01-01
The brain is assumed to be a sterile organ in the absence of disease, although the impact of immune disruption on brain microbial diversity or quantity is uncertain. To investigate microbial diversity and quantity in the brain, the profile of infectious agents was examined in pathologically normal and abnormal brains from persons with HIV/AIDS [HIV] (n = 12), other disease controls [ODC] (n = 14) and in cerebral surgical resections for epilepsy [SURG] (n = 6). Deep sequencing of cerebral white matter-derived RNA from the HIV (n = 4), ODC (n = 4) and SURG (n = 2) groups revealed bacterially encoded 16S rRNA sequences in all brain specimens, with α-proteobacteria representing over 70% of bacterial sequences while the other 30% of bacterial classes varied widely. Bacterial rRNA was detected in white matter glial cells by in situ hybridization and peptidoglycan immunoreactivity was also localized principally in glia in human brains. Analyses of amplified bacterial 16S rRNA sequences disclosed that Proteobacteria was the principal bacterial phylum in all human brain samples, with similar bacterial rRNA quantities in HIV and ODC groups despite increased host neuroimmune responses in the HIV group. Exogenous viruses including bacteriophage and human herpes viruses-4, -5 and -6 were detected variably in autopsied brains from both clinical groups. Brains from SIV- and SHIV-infected macaques displayed a profile of bacterial phyla also dominated by Proteobacteria, but bacterial sequences were not detected in experimentally FIV-infected cat or RAG1−/− mouse brains. Intracerebral implantation of human brain homogenates into RAG1−/− mice revealed a preponderance of α-proteobacteria 16S rRNA sequences in the brains of recipient mice at 7 weeks post-implantation, which was abrogated by prior heat-treatment of the brain homogenate. Thus, α-proteobacteria represented the major bacterial component of the primate brain’s microbiome regardless of underlying immune status, which could be transferred into naïve hosts leading to microbial persistence in the brain. PMID:23355888
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS's Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc’h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine
2017-01-01
Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate the functional properties of variants in vitro, it is necessary to define the quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies. PMID:28362878
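The idea of calling an amplicon clonal or non-clonal from position-specific nucleotide variation can be sketched in Python as follows. This is only an illustration under stated assumptions: reads are taken to be already aligned and of equal length, the function names are hypothetical, and the cutoff `min_fraction` is a free parameter because the abstract does not give the NGS threshold the authors used (the 15% figure refers to what Sanger electropherograms failed to show).

```python
from collections import Counter


def minor_variant_fraction(column):
    """Fraction of bases at one alignment position that differ from the majority base."""
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - max(counts.values()) / total


def classify_amplicon(aligned_reads, min_fraction):
    """Return 'non-clonal' if any position shows variation at or above min_fraction."""
    for column in zip(*aligned_reads):
        if minor_variant_fraction(column) >= min_fraction:
            return "non-clonal"
    return "clonal"
```

With a low cutoff, an amplicon carrying a 10% minority variant at one position is flagged as non-clonal; with a cutoff of 0.15 the same amplicon passes as clonal, which mirrors how a less sensitive read-out can miss mixed templates.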
Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin
2016-04-01
Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands.
© 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Connectivity in the deep: Phylogeography of the velvet belly lanternshark
NASA Astrophysics Data System (ADS)
Gubili, Chrysoula; Macleod, Kirsty; Perry, William; Hanel, Pia; Batzakas, Ioannis; Farrell, Edward D.; Lynghammar, Arve; Mancusi, Cecilia; Mariani, Stefano; Menezes, Gui M.; Neat, Francis; Scarcella, Giuseppe; Griffiths, Andrew M.
2016-09-01
The velvet belly lanternshark, Etmopterus spinax, is a deep-sea bioluminescent squaloid shark, found predominantly in the Northeast Atlantic and Mediterranean Sea. It has been exposed to relatively high levels of mortality associated with by-catch in some regions. Its late maturity and low fecundity potentially render it vulnerable to over-exploitation, although little is known about processes of connectivity between key habitats/regions. This study utilised DNA sequencing of partial regions of the mitochondrial control region and nuclear ribosomal internal transcribed spacer 2 to investigate population structure and phylogeography of this species across the Northeast Atlantic and Mediterranean Basin. Despite the inclusion of samples from the range edges or remote locations, no evidence of significant population structure was detected. An important exception was identified using the control region sequence, with much greater (and statistically significant) levels of genetic differentiation between the Mediterranean and Atlantic. This suggests that the Strait of Gibraltar may represent an important bathymetric barrier, separating regions with very low levels of female dispersal. Bayesian estimation of divergence time also places the separation between the Mediterranean and Atlantic lineages within the last 100,000 years, presumably connected with perturbations during the last Glacial Period. These results demonstrate population subdivision at a much smaller geographic distance than has generally been identified in previous work on deep-sea sharks. This highlights a very significant role for shallow bathymetry in promoting genetic differentiation in deepwater taxa. It acts as an important exception to a general paradigm of marine species being connected by high levels of gene-flow, representing single stocks over large scales. It may also have significant implications for the fisheries management of this species.
Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.
2018-01-01
Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter, while the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. 
Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369
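The removal step described here, dropping reads assigned to previously identified contaminant genera and then recomputing relative abundances, can be sketched in Python. This is a toy illustration, not the CoDL workflow: the function names are hypothetical, and the contaminant list and read counts in any real use would come from the study's controls rather than from this example.

```python
def remove_contaminants(genus_counts, contaminant_genera):
    """Drop reads assigned to listed contaminant genera.

    Returns the filtered count table and the fraction of reads removed.
    """
    kept = {g: n for g, n in genus_counts.items() if g not in contaminant_genera}
    removed = sum(n for g, n in genus_counts.items() if g in contaminant_genera)
    total = sum(genus_counts.values())
    fraction_removed = removed / total if total else 0.0
    return kept, fraction_removed


def relative_abundance(counts):
    """Convert raw read counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}
```

Because relative abundances are renormalized after filtering, every remaining taxon's share rises once contaminant reads are removed, which is why contamination skews the apparent abundance of all taxa in a low-biomass sample.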
NASA Astrophysics Data System (ADS)
Decarli, Roberto; Walter, Fabian; Aravena, Manuel; Carilli, Chris; Bouwens, Rychard; da Cunha, Elisabete; Daddi, Emanuele; Elbaz, David; Riechers, Dominik; Smail, Ian; Swinbank, Mark; Weiss, Axel; Bacon, Roland; Bauer, Franz; Bell, Eric F.; Bertoldi, Frank; Chapman, Scott; Colina, Luis; Cortes, Paulo C.; Cox, Pierre; Gónzalez-López, Jorge; Inami, Hanae; Ivison, Rob; Hodge, Jacqueline; Karim, Alex; Magnelli, Benjamin; Ota, Kazuaki; Popping, Gergö; Rix, Hans-Walter; Sargent, Mark; van der Wel, Arjen; van der Werf, Paul
2016-12-01
We study the molecular gas properties of high-z galaxies observed in the ALMA Spectroscopic Survey (ASPECS) that targets an ~1 arcmin$^2$ region in the Hubble Ultra Deep Field (UDF), a blind survey of CO emission (tracing molecular gas) in the 3 and 1 mm bands. Of a total of 1302 galaxies in the field, 56 have spectroscopic redshifts and correspondingly well-defined physical properties. Among these, 11 have infrared luminosities $L_{\rm IR} > 10^{11}\,L_\odot$, i.e., a detection in CO emission was expected. Out of these, 7 are detected at various significance in CO, and 4 are undetected in CO emission. In the CO-detected sources, we find CO excitation conditions that are lower than those typically found in starburst/sub-mm galaxy/QSO environments. We use the CO luminosities (including limits for non-detections) to derive molecular gas masses. We discuss our findings in the context of previous molecular gas observations at high redshift (star formation law, gas depletion times, gas fractions): the CO-detected galaxies in the UDF tend to reside on the low-$L_{\rm IR}$ envelope of the scatter in the $L_{\rm IR}$-$L'_{\rm CO}$ relation, but exceptions exist. For the CO-detected sources, we find an average depletion time of ~1 Gyr, with significant scatter. The average molecular-to-stellar mass ratio ($M_{\rm H_2}/M_*$) is consistent with earlier measurements of main-sequence galaxies at these redshifts, and again shows large variations among sources. In some cases, we also measure dust continuum emission. On average, the dust-based estimates of the molecular gas are a factor ~2-5× smaller than those based on CO. When we account for detections as well as non-detections, we find large diversity in the molecular gas properties of the high-redshift galaxies covered by ASPECS.
Webster, Nicole S; Taylor, Michael W; Behnam, Faris; Lücker, Sebastian; Rattei, Thomas; Whalan, Stephen; Horn, Matthias; Wagner, Michael
2010-08-01
Marine sponges contain complex bacterial communities of considerable ecological and biotechnological importance, with many of these organisms postulated to be specific to sponge hosts. Testing this hypothesis in light of the recent discovery of the rare microbial biosphere, we investigated three Australian sponges by massively parallel 16S rRNA gene tag pyrosequencing. Here we show bacterial diversity that is unparalleled in an invertebrate host, with more than 250,000 sponge-derived sequence tags being assigned to 23 bacterial phyla and revealing up to 2996 operational taxonomic units (95% sequence similarity) per sponge species. Of the 33 previously described 'sponge-specific' clusters that were detected in this study, 48% were found exclusively in adults and larvae - implying vertical transmission of these groups. The remaining taxa, including 'Poribacteria', were also found at very low abundance among the 135,000 tags retrieved from surrounding seawater. Thus, members of the rare seawater biosphere may serve as seed organisms for widely occurring symbiont populations in sponges and their host association might have evolved much more recently than previously thought. © 2009 Society for Applied Microbiology and Blackwell Publishing Ltd. PMID:21966903
Siniscalchi, Luciene Alves Batista; Leite, Laura Rabelo; Oliveira, Guilherme; Chernicharo, Carlos Augusto Lemos; de Araújo, Juliana Calabria
2017-07-01
Methane is produced in anaerobic environments, such as reactors used to treat wastewaters, and can be consumed by methanotrophs. The composition and structure of a microbial community enriched from anaerobic sewage sludge under methane-oxidation conditions coupled to denitrification were investigated. Denaturing gradient gel electrophoresis (DGGE) analysis retrieved sequences of Methylocaldum and Chloroflexi. Deep sequencing analysis revealed a complex community that changed over time and was affected by methane concentration. Methylocaldum (8.2%), Methylosinus (2.3%), Methylomonas (0.02%), Methylacidiphilales (0.45%), Nitrospirales (0.18%), and Methanosarcinales (0.3%) were detected. Despite the denitrifying conditions provided, Nitrospirales and Methanosarcinales, known to perform the anaerobic methane oxidation coupled to denitrification (DAMO) process, were in very low abundance. Results demonstrated that aerobic and anaerobic methanotrophs coexisted in the reactor together with heterotrophic microorganisms, suggesting that a diverse microbial community was important to sustain methanotrophic activity. The methanogenic sludge was a good inoculum to enrich methanotrophs, and cultivation conditions play a selective role in determining community composition.
Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models.
AlDahoul, Nouar; Md Sabri, Aznul Qalid; Mansoor, Ali Mohammed
2018-01-01
Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on handcrafted features, which are problem-dependent and optimal only for specific tasks. Moreover, they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object sizes. On the other hand, the proposed feature learning approaches are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods which combine optical flow and three different deep models (i.e., supervised convolutional neural network (S-CNN), pretrained CNN feature extractor, and hierarchical extreme learning machine (H-ELM)) for human detection in videos captured using a nonstatic camera on an aerial platform with varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset. The models are compared in terms of training accuracy, testing accuracy, and learning speed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. Pretrained CNN produces an average accuracy of 98.09%. S-CNN produces an average accuracy of 95.6% with soft-max and 91.7% with Support Vector Machines (SVM). H-ELM has an average accuracy of 95.9%. Using a normal Central Processing Unit (CPU), H-ELM's training takes 445 seconds. Learning in S-CNN takes 770 seconds with a high performance Graphical Processing Unit (GPU).
Long-term decay and possible reactivation of induced seismicity at the Basel EGS site
NASA Astrophysics Data System (ADS)
Kraft, Toni; Herrmann, Marcus; Karvounis, Dimitrios; Tormann, Thessa; Deichmann, Nicolas; Wiemer, Stefan
2016-04-01
In December 2006, an extensive fluid injection was carried out below the city of Basel, Switzerland, to stimulate a reservoir for an Enhanced Geothermal System (EGS). After six days of gradual increase of flow rate (and thus seismicity), a strongly felt ML 3.4 earthquake led to the immediate termination of the project. The well was opened subsequently and seismicity declined rapidly. The Basel EGS project might be an unsuccessful attempt in terms of energy supply, but it is a chance to advance the physical understanding of EGSs. The well-monitored and well-studied induced sequence allowed many new insights in terms of reservoir creation. A special observation in the nine years of monitoring is the revival of seismic activity six years into a prolonged seismic decay. This renewed activity increase might relate to a gradual pressure increase due to the ultimate shut-in (closure) of the borehole about one year before. Until now, the long-term behaviour had not been analysed in detail because a consistent catalogue did not exist. In the current study, we took advantage of the high waveform similarity within a seismic sequence and applied a multi-trace template-matching (i.e. cross-correlation) procedure to detect seismic events about one order of magnitude below the detection threshold. We detected about 100,000 events within the six-day long stimulation alone; previously, only 13,000 microearthquakes were detected. We only scanned the recordings of the deepest borehole station (2.7 km). This station is very close to the 5 km deep reservoir and has the highest signal-to-noise ratio among all (borehole) stations. Our newly obtained catalogue spans more than nine years and features a uniform (and low) detection threshold and a uniform magnitude determination. The improved resolution of the long-term behaviour and the later seismicity increase will help to better understand the mechanisms involved. More induced or natural sequences can be investigated with our procedure.
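Template matching by cross-correlation, as referred to in this abstract, can be sketched in Python with NumPy. The sketch below is a simplified single-trace version under stated assumptions (the study describes a multi-trace procedure); the function names, the detection threshold, and the minimum separation between picks are hypothetical parameters, not values from the study.

```python
import numpy as np


def normalized_cross_correlation(trace, template):
    """Slide the template along the trace and return the correlation coefficient per offset."""
    trace = np.asarray(trace, dtype=float)
    template = np.asarray(template, dtype=float)
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    # One row per candidate window, each as long as the template.
    windows = np.lib.stride_tricks.sliding_window_view(trace, template.size)
    w = windows - windows.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(w, axis=1) * t_norm
    cc = np.zeros(windows.shape[0])
    valid = denom > 0
    cc[valid] = (w[valid] @ t) / denom[valid]
    return cc


def detect_events(cc, threshold, min_separation):
    """Pick offsets with cc >= threshold, strongest first, at least min_separation samples apart."""
    candidates = np.flatnonzero(cc >= threshold)
    strongest_first = candidates[np.argsort(cc[candidates])[::-1]]
    picks = []
    for idx in strongest_first:
        if all(abs(int(idx) - p) >= min_separation for p in picks):
            picks.append(int(idx))
    return sorted(picks)
```

Because the correlation coefficient is normalized by the energy of each window, a small event with the same waveform shape as the template scores nearly as high as a large one, which is what allows detections well below an amplitude-based threshold.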
Roseovarius indicus sp. nov., isolated from deep-sea water of the Indian Ocean.
Lai, Qiliang; Zhong, Huanzi; Wang, Jianning; Yuan, Jun; Sun, Fengqin; Wang, Liping; Zheng, Tianling; Shao, Zongze
2011-09-01
A taxonomic study was carried out on a novel bacterial strain, designated B108(T), which was isolated from a polycyclic aromatic hydrocarbon (PAH)-degrading consortium, enriched from deep-sea water of the Indian Ocean. The isolate was Gram-reaction-negative, rod-shaped and non-motile. Growth of strain B108(T) was observed in 1-15 % (w/v) NaCl and at 10-39 °C and it was unable to degrade Tween 80 or gelatin. 16S rRNA gene sequence comparisons showed that strain B108(T) was most closely related to Roseovarius halotolerans HJ50(T) (97.1 % sequence similarity), followed by Roseovarius pacificus 81-2(T) (96.6 %) and Roseovarius aestuarii SMK-122(T) (95.2 %); other species shared <95.0 % sequence similarity. DNA-DNA hybridization tests showed that strain B108(T) had a low DNA-DNA relatedness to R. halotolerans HJ50(T) and R. pacificus 81-2(T) (48±4 % and 44±5 %, respectively). The predominant fatty acids were C₁₆:₀, C₁₆:₀ 2-OH, summed feature 8 (C₁₈:₁ω7c/ω6c) and C₁₉:₀ω8c cyclo, which accounted for 84.2 % of the total cellular fatty acids. The G+C content of the chromosomal DNA was 63.6 mol%. The major respiratory quinone was ubiquinone 10 (Q10). Phosphatidylcholine, phosphatidylglycerol, diphosphatidylglycerol, phosphatidylethanolamine and some unidentified compounds were detected. These characteristics were in good agreement with those of members of the genus Roseovarius. The pufLM gene was also detected. According to its morphology, physiology, fatty acid composition and phylogenetic position based on 16S rRNA sequence data, the novel strain most appropriately belongs to the genus Roseovarius but can be readily distinguished from known species of this genus. Therefore, strain B108(T) represents a novel species, of the genus Roseovarius, for which the name Roseovarius indicus sp. nov. is proposed. The type strain is B108(T) ( = 2PR52-14(T) = CCTCC AB 208233(T) = LMG 24622(T) = MCCC 1A01227(T)).
Deforestation and Industrial Forest Patterns in Colombia: a Case Study
NASA Astrophysics Data System (ADS)
Huo, L. Z.; Boschetti, L.; Sparks, A. M.; Clerici, N.
2017-12-01
The recent peace agreement between the government and the Revolutionary Armed Forces of Colombia (FARC) offers new opportunities for peaceful and sustainable development, but at the same time requires a timely effort to protect biological resources and ecosystem services (Clerici et al., 2016). In this context, we use the 2001-2017 Landsat data record to prototype a methodology to establish a baseline of deforestation, afforestation and industrial forest practices (i.e. establishment and harvest of forest plantations), and to monitor future changes. Two study areas, which have seen considerable deforestation in recent years, were selected: one in the South of the country, at the edge of the Amazon Forest (WRS path 008 row 059) and one in the center, in mixed forest (WRS path 008 row 055). The time series of all the available cloud-free Landsat 5, Landsat 7 and Landsat 8 data was classified into a sequence of binary forest/non-forest maps using a deep learning model, of a kind successfully used in the natural language processing field, trained to detect forest transitions. Recurrent Neural Networks (RNNs) are a class of artificial neural network that extends the conventional neural network with loops in the connections (Graves et al., 2013). Unlike a feed-forward neural network, an RNN is able to process sequential inputs by having a recurrent hidden state whose activation at each step depends on that of the previous steps. In this manner, the RNN provides a good framework to dynamically model time series data, and has been successfully applied to natural language processing at Google (Sutskever et al., 2014). The sequence of forest cover state maps was subsequently post-processed to differentiate between deforestation (i.e. transition from forest to non-forest land use) and industrial forest harvest (i.e. timber harvest followed by regrowth), by integrating the detection of temporal and spatial patterns.
References: Clerici, N., et al. (2016). Colombia: Dealing in conservation. Science, 354(6309), 190. Graves, A., et al. (2013). Speech recognition with deep recurrent neural networks. In Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 6645-6649. Sutskever, I., et al. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 3104-3112.
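The recurrent hidden state described in this abstract, whose activation at each step depends on the previous steps, can be illustrated with a minimal NumPy forward pass for one pixel's time series. This is a sketch under stated assumptions, not the model used in the study: the plain tanh recurrence, the layer sizes, the function names, and the random untrained weights are all hypothetical, and a real classifier would be trained on labeled forest/non-forest data.

```python
import numpy as np


def init_params(n_features, n_hidden, rng):
    """Random, untrained weights for a single-layer RNN (illustration only)."""
    W_xh = rng.normal(scale=0.5, size=(n_hidden, n_features))  # input to hidden
    W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))    # hidden to hidden (the loop)
    b_h = np.zeros(n_hidden)
    w_hy = rng.normal(scale=0.5, size=n_hidden)                 # hidden to output
    b_y = 0.0
    return W_xh, W_hh, b_h, w_hy, b_y


def rnn_forest_probabilities(x_seq, params):
    """Return one forest probability per time step for a (T, n_features) series."""
    W_xh, W_hh, b_h, w_hy, b_y = params
    h = np.zeros(W_hh.shape[0])
    probs = []
    for x_t in x_seq:
        # The new hidden state mixes the current observation with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        logit = w_hy @ h + b_y
        probs.append(1.0 / (1.0 + np.exp(-logit)))  # sigmoid
    return np.array(probs)
```

Thresholding the returned probabilities (for example at 0.5) would give a binary forest/non-forest label per acquisition date; because the hidden state carries information forward, a change in an early observation also alters later outputs.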
Metagenomics uncovers a new group of low GC and ultra-small marine Actinobacteria
Ghai, Rohit; Mizuno, Carolina Megumi; Picazo, Antonio; Camacho, Antonio; Rodriguez-Valera, Francisco
2013-01-01
We describe a deep-branching lineage of marine Actinobacteria with very low GC content (33%) and the smallest free living cells described yet (cell volume ca. 0.013 μm³), even smaller than the cosmopolitan marine photoheterotroph, ‘Candidatus Pelagibacter ubique’. These microbes are highly related to 16S rRNA sequences retrieved by PCR from the Pacific and Atlantic oceans 20 years ago. Metagenomic fosmids allowed a virtual genome reconstruction that also indicated very small genomes below 1 Mb. A new kind of rhodopsin was detected indicating a photoheterotrophic lifestyle. They are estimated to be ~4% of the total numbers of cells found at the site studied (the Mediterranean deep chlorophyll maximum) and similar numbers were estimated in all tropical and temperate photic zone metagenomes available. Their geographic distribution mirrors that of picocyanobacteria and there appears to be an association between these microbial groups. A new sub-class, ‘Candidatus Actinomarinidae’ is proposed to designate these microbes. PMID:23959135
Graphical classification of DNA sequences of HLA alleles by deep learning.
Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi
2018-04-01
Alleles of the human leukocyte antigen (HLA)-A gene are classified and displayed graphically using a deep learning method (a stacked autoencoder). Nucleotide sequences 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a two-dimensional representation and plotted. The two-dimensional plots show that the alleles form clusters and can therefore be classified. The two-dimensional plot of HLA-A DNA sequences gives a clear overview for characterizing the various alleles.
A novel rhabdovirus associated with acute hemorrhagic fever in central Africa.
Grard, Gilda; Fair, Joseph N; Lee, Deanna; Slikas, Elizabeth; Steffen, Imke; Muyembe, Jean-Jacques; Sittler, Taylor; Veeraraghavan, Narayanan; Ruby, J Graham; Wang, Chunlin; Makuwa, Maria; Mulembakani, Prime; Tesh, Robert B; Mazet, Jonna; Rimoin, Anne W; Taylor, Travis; Schneider, Bradley S; Simmons, Graham; Delwart, Eric; Wolfe, Nathan D; Chiu, Charles Y; Leroy, Eric M
2012-09-01
Deep sequencing was used to discover a novel rhabdovirus (Bas-Congo virus, or BASV) associated with a 2009 outbreak of 3 human cases of acute hemorrhagic fever in Mangala village, Democratic Republic of Congo (DRC), Africa. The cases, presenting over a 3-week period, were characterized by abrupt disease onset, high fever, mucosal hemorrhage, and, in two patients, death within 3 days. BASV was detected in an acute serum sample from the lone survivor at a concentration of 1.09 × 10^6 RNA copies/mL, and 98.2% of the genome was subsequently de novo assembled from ≈140 million sequence reads. Phylogenetic analysis revealed that BASV is highly divergent and shares less than 34% amino acid identity with any other rhabdovirus. High convalescent neutralizing antibody titers of >1:1000 were detected in the survivor and an asymptomatic nurse directly caring for him, both of whom were health care workers, suggesting the potential for human-to-human transmission of BASV. The natural animal reservoir host or arthropod vector and precise mode of transmission for the virus remain unclear. BASV is an emerging human pathogen associated with acute hemorrhagic fever in Africa.
A Novel Rhabdovirus Associated with Acute Hemorrhagic Fever in Central Africa
Slikas, Elizabeth; Steffen, Imke; Muyembe, Jean-Jacques; Sittler, Taylor; Veeraraghavan, Narayanan; Ruby, J. Graham; Wang, Chunlin; Makuwa, Maria; Mulembakani, Prime; Tesh, Robert B.; Mazet, Jonna; Rimoin, Anne W.; Taylor, Travis; Schneider, Bradley S.; Simmons, Graham; Delwart, Eric; Wolfe, Nathan D.; Chiu, Charles Y.; Leroy, Eric M.
2012-01-01
Deep sequencing was used to discover a novel rhabdovirus (Bas-Congo virus, or BASV) associated with a 2009 outbreak of 3 human cases of acute hemorrhagic fever in Mangala village, Democratic Republic of Congo (DRC), Africa. The cases, presenting over a 3-week period, were characterized by abrupt disease onset, high fever, mucosal hemorrhage, and, in two patients, death within 3 days. BASV was detected in an acute serum sample from the lone survivor at a concentration of 1.09 × 10^6 RNA copies/mL, and 98.2% of the genome was subsequently de novo assembled from ∼140 million sequence reads. Phylogenetic analysis revealed that BASV is highly divergent and shares less than 34% amino acid identity with any other rhabdovirus. High convalescent neutralizing antibody titers of >1:1000 were detected in the survivor and an asymptomatic nurse directly caring for him, both of whom were health care workers, suggesting the potential for human-to-human transmission of BASV. The natural animal reservoir host or arthropod vector and precise mode of transmission for the virus remain unclear. BASV is an emerging human pathogen associated with acute hemorrhagic fever in Africa. PMID:23028323
Genetic Determinants of Drug Resistance in Mycobacterium tuberculosis and Their Diagnostic Value.
Farhat, Maha R; Sultana, Razvan; Iartchouk, Oleg; Bozeman, Sam; Galagan, James; Sisk, Peter; Stolte, Christian; Nebenzahl-Guimaraes, Hanna; Jacobson, Karen; Sloutsky, Alexander; Kaur, Devinder; Posey, James; Kreiswirth, Barry N; Kurepina, Natalia; Rigouts, Leen; Streicher, Elizabeth M; Victor, Tommie C; Warren, Robin M; van Soolingen, Dick; Murray, Megan
2016-09-01
The development of molecular diagnostics that detect both the presence of Mycobacterium tuberculosis in clinical samples and drug resistance-conferring mutations promises to revolutionize patient care and interrupt transmission by ensuring early diagnosis. However, these tools require the identification of genetic determinants of resistance to the full range of antituberculosis drugs. To determine the optimal molecular approach needed, we sought to create a comprehensive catalog of resistance mutations and assess their sensitivity and specificity in diagnosing drug resistance. We developed and validated molecular inversion probes for DNA capture and deep sequencing of 28 drug-resistance loci in M. tuberculosis. We used the probes for targeted sequencing of a geographically diverse set of 1,397 clinical M. tuberculosis isolates with known drug resistance phenotypes. We identified a minimal set of mutations to predict resistance to first- and second-line antituberculosis drugs and validated our predictions in an independent dataset. We constructed and piloted a web-based database that provides public access to the sequence data and prediction tool. The predicted resistance to rifampicin and isoniazid exceeded 90% sensitivity and specificity but was lower for other drugs. The number of mutations needed to diagnose resistance is large, and for the 13 drugs studied it was 238 across 18 genetic loci. These data suggest that a comprehensive M. tuberculosis drug resistance diagnostic will need to allow for a high dimension of mutation detection. They also support the hypothesis that currently unknown genetic determinants, potentially discoverable by whole-genome sequencing, encode resistance to second-line tuberculosis drugs.
Genetic Determinants of Drug Resistance in Mycobacterium tuberculosis and Their Diagnostic Value
Sultana, Razvan; Iartchouk, Oleg; Bozeman, Sam; Galagan, James; Sisk, Peter; Stolte, Christian; Nebenzahl-Guimaraes, Hanna; Jacobson, Karen; Sloutsky, Alexander; Kaur, Devinder; Posey, James; Kreiswirth, Barry N.; Kurepina, Natalia; Rigouts, Leen; Streicher, Elizabeth M.; Victor, Tommie C.; Warren, Robin M.; van Soolingen, Dick; Murray, Megan
2016-01-01
Rationale: The development of molecular diagnostics that detect both the presence of Mycobacterium tuberculosis in clinical samples and drug resistance–conferring mutations promises to revolutionize patient care and interrupt transmission by ensuring early diagnosis. However, these tools require the identification of genetic determinants of resistance to the full range of antituberculosis drugs. Objectives: To determine the optimal molecular approach needed, we sought to create a comprehensive catalog of resistance mutations and assess their sensitivity and specificity in diagnosing drug resistance. Methods: We developed and validated molecular inversion probes for DNA capture and deep sequencing of 28 drug-resistance loci in M. tuberculosis. We used the probes for targeted sequencing of a geographically diverse set of 1,397 clinical M. tuberculosis isolates with known drug resistance phenotypes. We identified a minimal set of mutations to predict resistance to first- and second-line antituberculosis drugs and validated our predictions in an independent dataset. We constructed and piloted a web-based database that provides public access to the sequence data and prediction tool. Measurements and Main Results: The predicted resistance to rifampicin and isoniazid exceeded 90% sensitivity and specificity but was lower for other drugs. The number of mutations needed to diagnose resistance is large, and for the 13 drugs studied it was 238 across 18 genetic loci. Conclusions: These data suggest that a comprehensive M. tuberculosis drug resistance diagnostic will need to allow for a high dimension of mutation detection. They also support the hypothesis that currently unknown genetic determinants, potentially discoverable by whole-genome sequencing, encode resistance to second-line tuberculosis drugs. PMID:26910495
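The sensitivity and specificity figures quoted above compare a mutation-based prediction with the phenotypic drug susceptibility result for each isolate. A minimal sketch of that calculation follows; the function name and the eight example isolates are hypothetical and are not taken from the study's data.

```python
def sensitivity_specificity(predicted_resistant, phenotypic_resistant):
    """Return (sensitivity, specificity) for paired boolean calls.

    predicted_resistant:  genotype-based prediction for each isolate
    phenotypic_resistant: reference phenotype for the same isolates
    """
    tp = fn = tn = fp = 0
    for predicted, truth in zip(predicted_resistant, phenotypic_resistant):
        if truth and predicted:
            tp += 1  # resistant isolate correctly predicted resistant
        elif truth and not predicted:
            fn += 1  # resistant isolate missed by the prediction
        elif not truth and predicted:
            fp += 1  # susceptible isolate wrongly predicted resistant
        else:
            tn += 1  # susceptible isolate correctly predicted susceptible
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical calls for eight isolates and one drug.
predicted = [True, True, False, True, False, False, True, False]
phenotype = [True, True, True, False, False, False, True, False]
result = sensitivity_specificity(predicted, phenotype)
```

With these made-up calls, three of four resistant isolates and three of four susceptible isolates are classified correctly, so both values are 0.75.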
3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.
Goldfarb, Katherine C; Cech, Thomas R
2013-09-21
Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.
Wu, Jinghua; Jia, Shan; Wang, Changxi; Zhang, Wei; Liu, Sixi; Zeng, Xiaojing; Mai, Huirong; Yuan, Xiuli; Du, Yuanping; Wang, Xiaodong; Hong, Xueyu; Li, Xuemei; Wen, Feiqiu; Xu, Xun; Pan, Jianhua; Li, Changgang; Liu, Xiao
2016-01-01
Acute B lymphoblastic leukemia (B-ALL) is one of the most common types of childhood cancer worldwide and chemotherapy is the main treatment approach. Despite good response rates to chemotherapy regimens, many patients eventually relapse, and minimal residual disease (MRD) is the leading risk factor for relapse. The evolution of leukemic clones during disease development and treatment may have clinical significance. In this study, we performed immunoglobulin heavy chain (IGH) repertoire high-throughput sequencing (HTS) on the diagnostic and post-treatment samples of 51 pediatric B-ALL patients. We identified leukemic IGH clones in 92.2% of the diagnostic samples, and nearly half of the patients were polyclonal. About one-third of the leukemic clones have a correct open reading frame in the complementarity determining region 3 (CDR3) of IGH, which demonstrates that the leukemic B cells were in the early developmental stage. We also demonstrated the higher sensitivity of HTS in MRD detection and investigated the clinical value of using peripheral blood in MRD detection and in monitoring clonal IGH evolution. In addition, we found that leukemic clones were extensively undergoing continuous clonal IGH evolution by variable gene replacement. Dynamic frequency changes and newly emerged evolved IGH clones were identified under the pressure of chemotherapy. In summary, we confirmed the high sensitivity and universal applicability of HTS in MRD detection. We also reported the ubiquitous evolved IGH clones in B-ALL samples and their response to chemotherapy during treatment.
Impact of sequencing depth in ChIP-seq experiments
Jung, Youngsook L.; Luquette, Lovelace J.; Ho, Joshua W.K.; Ferrari, Francesco; Tolstorukov, Michael; Minoda, Aki; Issner, Robbyn; Epstein, Charles B.; Karpen, Gary H.; Kuroda, Mitzi I.; Park, Peter J.
2014-01-01
In a chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiment, an important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. We present an extensive evaluation of the impact of sequencing depth on identification of enriched regions for key histone modifications (H3K4me3, H3K36me3, H3K27me3 and H3K9me2/me3) using deep-sequenced datasets in human and fly. We propose to define sufficient sequencing depth as the number of reads at which detected enrichment regions increase <1% for an additional million reads. Although the required depth depends on the nature of the mark and the state of the cell in each experiment, we observe that sufficient depth is often reached at <20 million reads for fly. For human, there are no clear saturation points for the examined datasets, but our analysis suggests 40–50 million reads as a practical minimum for most marks. We also devise a mathematical model to estimate the sufficient depth and total genomic coverage of a mark. Lastly, we find that the five algorithms tested do not agree well for broad enrichment profiles, especially at lower depths. Our findings suggest that sufficient sequencing depth and an appropriate peak-calling algorithm are essential for ensuring robustness of conclusions derived from ChIP-seq data. PMID:24598259
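The saturation criterion stated in this abstract (detected enrichment regions increasing by less than 1% per additional million reads) can be sketched as a check over a subsampling curve. This is a discrete approximation under stated assumptions: the function name and the example counts are hypothetical, and the abstract does not describe how the authors evaluated the criterion in practice.

```python
def sufficient_depth(depths_millions, region_counts, threshold=0.01):
    """Return the first sampled depth (in millions of reads) after which the
    number of detected regions grows by less than `threshold` (as a fraction)
    per additional million reads. Return None if no sampled depth qualifies.
    """
    points = list(zip(depths_millions, region_counts))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= 0 or d1 <= d0:
            continue  # skip unusable intervals
        gain_per_million = (r1 - r0) / r0 / (d1 - d0)
        if gain_per_million < threshold:
            return d0
    return None


# Hypothetical subsampling curve: enriched regions detected at each depth.
depths = [5, 10, 15, 20, 25]
regions = [10000, 14000, 15500, 16000, 16100]
depth = sufficient_depth(depths, regions)
```

For this made-up curve the gain drops below 1% per million reads between 15 and 20 million reads, so the function reports 15. A curve that never flattens, as the abstract reports for the human datasets, yields None.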
Pick- and waveform-based techniques for real-time detection of induced seismicity
NASA Astrophysics Data System (ADS)
Grigoli, Francesco; Scarabello, Luca; Böse, Maren; Weber, Bernd; Wiemer, Stefan; Clinton, John F.
2018-05-01
The monitoring of induced seismicity is a common operation in many industrial activities, such as conventional and non-conventional hydrocarbon production or mining and geothermal energy exploitation, to cite a few. During such operations, we generally collect very large and strongly noise-contaminated data sets that require robust and automated analysis procedures. Induced seismicity data sets are often characterized by sequences of multiple events with short interevent times or overlapping events; in these cases, pick-based location methods may struggle to correctly assign picks to phases and events, and errors can lead to missed detections and/or reduced location resolution and incorrect magnitudes, which can have significant consequences if real-time seismicity information is used in risk assessment frameworks. To overcome these issues, different waveform-based methods for the detection and location of microseismicity have been proposed. The main advantage of waveform-based methods is that they appear to perform better and can simultaneously detect and locate seismic events, providing high-quality locations in a single step, while the main disadvantage is that they are computationally expensive. Although these methods have been applied to different induced seismicity data sets, an extensive comparison with sophisticated pick-based detection methods is still missing. In this work, we introduce our improved waveform-based detector and compare its performance with two pick-based detectors implemented within the SeisComP3 software suite. We test the performance of these three approaches with both synthetic and real data sets related to the induced seismicity sequence at the deep geothermal project in the vicinity of the city of St. Gallen, Switzerland.
Viral activities and life cycles in deep subseafloor sediments.
Engelhardt, Tim; Orsi, William D; Jørgensen, Bo Barker
2015-12-01
Viruses are highly abundant in marine subsurface sediments and can even exceed the number of prokaryotes. However, their activity and quantitative impact on microbial populations are still poorly understood. Here, we use gene expression data from published continental margin subseafloor metatranscriptomes to qualitatively assess viral diversity and activity in sediments up to 159 metres below seafloor (mbsf). Mining of the metatranscriptomic data revealed 4651 representative viral homologues (RVHs), representing 2.2% of all metatranscriptome sequence reads, which have close translated homology (average 77%, range 60-97% amino acid identity) to viral proteins. Archaea-infecting RVHs are exclusively detected in the upper 30 mbsf, whereas RVHs for filamentous inoviruses predominate in the deepest sediment layers. RVHs indicative of lysogenic phage-host interactions and lytic activity, notably cell lysis, are detected at all analysed depths and suggest a dynamic virus-host association in the marine deep biosphere studied here. Ongoing lytic viral activity is further indicated by the expression of clustered, regularly interspaced, short palindromic repeat-associated cascade genes involved in cellular defence against viral attacks. The data indicate the activity of viruses in subsurface sediment of the Peruvian margin and suggest that viruses indeed cause cell mortality and may play an important role in the turnover of subseafloor microbial biomass. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.
Li, Dandan; Li, Chunjin; Xu, Ying; Xu, Duo; Li, Hongjiao; Gao, Liwei; Chen, Shuxiong; Fu, Lulu; Xu, Xin; Liu, Yongzheng; Zhang, Xueying; Zhang, Jingshun; Ming, Hao; Zheng, Lianwen
2016-04-01
Polycystic ovary syndrome (PCOS) is a complex and heterogeneous endocrine disorder. To understand the pathogenesis of PCOS, we established rat models of PCOS induced by letrozole and employed deep sequencing to screen for differential expression of microRNAs (miRNAs) between PCOS rats and control rats. We examined vaginal smears and detected ovarian pathological alterations and hormone level changes in PCOS rats. Deep sequencing showed that a total of 129 miRNAs were differentially expressed in the ovaries of the letrozole-induced rat model compared with the control, including 49 upregulated and 80 downregulated miRNAs. Furthermore, the differential expression of miR-201-5p, miR-34b-5p, miR-141-3p, and miR-200a-3p was confirmed by real-time polymerase chain reaction. Bioinformatic analysis revealed that these four miRNAs were predicted to target a large set of genes with different functions. Pathway analysis supported that the miRNAs regulate oocyte meiosis, mitogen-activated protein kinase (MAPK) signaling, phosphoinositide 3-kinase/Akt (PI3K-Akt) signaling, Rap1 signaling, and Notch signaling. These data indicate that miRNAs are differentially expressed in the rat PCOS model and that the differentially expressed miRNAs are involved in the etiology and pathophysiology of PCOS. Our findings will help identify miRNAs as novel diagnostic markers and therapeutic targets for PCOS.
Engel, Juan C.; Ruby, J. Graham; Ganem, Donald; Andino, Raul; DeRisi, Joseph L.
2011-01-01
Honey bees (Apis mellifera) play a critical role in global food production as pollinators of numerous crops. Recently, honey bee populations in the United States, Canada, and Europe have suffered an unexplained increase in annual losses due to a phenomenon known as Colony Collapse Disorder (CCD). Epidemiological analysis of CCD is confounded by a relative dearth of bee pathogen field studies. To identify what constitutes an abnormal pathophysiological condition in a honey bee colony, it is critical to have characterized the spectrum of exogenous infectious agents in healthy hives over time. We conducted a prospective study of a large scale migratory bee keeping operation using high-frequency sampling paired with comprehensive molecular detection methods, including a custom microarray, qPCR, and ultra deep sequencing. We established seasonal incidence and abundance of known viruses, Nosema sp., Crithidia mellificae, and bacteria. Ultra deep sequence analysis further identified four novel RNA viruses, two of which were the most abundant observed components of the honey bee microbiome (∼10^11 viruses per honey bee). Our results demonstrate episodic viral incidence and distinct pathogen patterns between summer and winter time-points. Peak infection of common honey bee viruses and Nosema occurred in the summer, whereas levels of the trypanosomatid Crithidia mellificae and Lake Sinai virus 2, a novel virus, peaked in January. PMID:21687739
Ando, Haruko; Horikoshi, Kazuo; Suzuki, Hajime; Isagi, Yuji
2018-01-01
The foraging ecology of pelagic seabirds is difficult to characterize because of their large foraging areas. In the face of this difficulty, DNA metabarcoding may be a useful approach to analyze diet compositions and foraging behaviors. Using this approach, we investigated the diet composition and its seasonal variation of a common seabird species on the Ogasawara Islands, Japan: the wedge-tailed shearwater Ardenna pacifica. We collected fecal samples during the prebreeding (N = 73) and rearing (N = 96) periods. The diet composition of wedge-tailed shearwater was analyzed by Ion Torrent sequencing using two universal polymerase chain reaction primers for the 12S and 16S mitochondrial DNA regions that targeted vertebrates and mollusks, respectively. The results of a BLAST search of obtained sequences detected 31 and 1 vertebrate and mollusk taxa, respectively. The results of the diet composition analysis showed that wedge-tailed shearwaters frequently consumed deep-sea fishes throughout the sampling season, indicating the importance of these fishes as a stable food resource. However, there was a marked seasonal shift in diet, which may reflect seasonal changes in food resource availability and wedge-tailed shearwater foraging behavior. The collected data regarding the shearwater diet may be useful for in situ conservation efforts. Future research that combines DNA metabarcoding with other tools, such as data logging, may provide further insight into the foraging ecology of pelagic seabirds. PMID:29630670
Runckel, Charles; Flenniken, Michelle L; Engel, Juan C; Ruby, J Graham; Ganem, Donald; Andino, Raul; DeRisi, Joseph L
2011-01-01
Honey bees (Apis mellifera) play a critical role in global food production as pollinators of numerous crops. Recently, honey bee populations in the United States, Canada, and Europe have suffered an unexplained increase in annual losses due to a phenomenon known as Colony Collapse Disorder (CCD). Epidemiological analysis of CCD is confounded by a relative dearth of bee pathogen field studies. To identify what constitutes an abnormal pathophysiological condition in a honey bee colony, it is critical to have characterized the spectrum of exogenous infectious agents in healthy hives over time. We conducted a prospective study of a large scale migratory bee keeping operation using high-frequency sampling paired with comprehensive molecular detection methods, including a custom microarray, qPCR, and ultra deep sequencing. We established seasonal incidence and abundance of known viruses, Nosema sp., Crithidia mellificae, and bacteria. Ultra deep sequence analysis further identified four novel RNA viruses, two of which were the most abundant observed components of the honey bee microbiome (∼10^11 viruses per honey bee). Our results demonstrate episodic viral incidence and distinct pathogen patterns between summer and winter time-points. Peak infection of common honey bee viruses and Nosema occurred in the summer, whereas levels of the trypanosomatid Crithidia mellificae and Lake Sinai virus 2, a novel virus, peaked in January.
Complete genome sequence of a novel genotype of squash mosaic virus
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...
USDA-ARS's Scientific Manuscript database
The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...
Fujita, Junta; Drumm, David T; Iguchi, Akira; Ueda, Yuji; Yamashita, Yuho; Ito, Masaki; Tominaga, Osamu; Kai, Yoshiaki; Ueno, Masahiro; Yamashita, Yoh
2017-10-01
The deep-sea crangonid shrimp, Argis lar, is a highly abundant species in the northern Pacific Ocean. We investigated its phylogeographic and demographic structure across the species' extensive range, using mitochondrial DNA sequence variation to evaluate the impact of deep-sea paleoenvironmental dynamics in the Sea of Japan on population histories. The haplotype network detected three distinct lineages with allopatric isolation, which roughly corresponded to the Sea of Japan (Lineage A), the northwestern Pacific off the Japanese Archipelago (Lineage B), and the Bering Sea/Gulf of Alaska (Lineage C). Lineage A showed relatively low haplotype and nucleotide diversity, a significantly negative value of Tajima's D, and a star-shaped network, suggesting that anoxic bottom water in the Sea of Japan during the last glacial period may have reduced the Sea of Japan population. Furthermore, unexpectedly, the distributions of Lineages A and B were closely related to the pathways of the two ocean currents, especially along the Sanriku Coast. This result indicated that A. lar could disperse across shallow straits via the ocean currents, despite its deep-sea adult habitat. Bayesian inference of divergence time revealed that A. lar separated into three lineages approximately 1 million years before present (BP) in the Pleistocene, was then influenced by deep-sea paleoenvironmental change in the Sea of Japan during the last glacial period, and has more recently undergone larval dispersal with the ocean currents since ca. 6,000 years BP.
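Tajima's D, cited above as evidence of a population reduction in Lineage A, compares the mean number of pairwise differences with the number of segregating sites. The sketch below applies the standard formula (Tajima, 1989) to a toy alignment; the function name and the sequences are hypothetical, and the study itself would have used dedicated population-genetics software with a significance test.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Compute Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if any(len(s) != len(sequences[0]) for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns with more than one allele.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy star-like sample: every variant is carried by a single sequence.
star_like = ["TAAA", "ATAA", "AATA", "AAAT"]
d_star = tajimas_d(star_like)

# Toy sample with two deeply split groups at intermediate frequency.
two_groups = ["AA", "AA", "TT", "TT"]
d_split = tajimas_d(two_groups)
```

An excess of rare variants, as in the star-like sample, gives a negative D, which is the pattern the abstract reports for Lineage A; variants at intermediate frequency give a positive D.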
Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.
2010-01-01
Background: Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology: Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions: There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity.
Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613
Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network
NASA Astrophysics Data System (ADS)
Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke
2018-06-01
Traditional intelligent fault diagnosis methods for rolling bearings depend heavily on manual feature extraction and feature selection. To address this limitation, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, the DRNN is constructed by stacking recurrent hidden layers to automatically extract features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
NASA Astrophysics Data System (ADS)
Ward, N.; Page, S.; Heidelberg, J.; Eisen, J. A.; Fraser, C. M.
2002-12-01
The composition of microbial communities associated with deep-sea hydrothermal vent animals is of interest because of the key role of bacterial symbionts in driving the chemosynthetic food chain of the vent system, and also because bacterial biofilms attached to animal exterior surfaces may play a part in settlement of larval forms. Sequence analysis of 16S ribosomal RNA (rRNA) genes from such communities provides a snapshot of community structure, as this gene is present in all Bacteria and Archaea, and a useful phylogenetic marker for both cultivated microbial species, and uncultivated species such as many of those found in the deep-sea environment. Specimens of giant tube worms (Riftia pachyptila), mussels (Bathymodiolus thermophilus), and clams (Calyptogena magnifica) were collected during the 2002 R/V Atlantis research cruises to the East Pacific Rise (9N) and Galápagos Rift. Microbial biofilms attached to the exterior surfaces of individual animals were sampled, as were tissues known to harbor chemosynthetic bacterial endosymbionts. Genomic DNA was extracted from the samples using a commercially available kit, and 16S rRNA genes amplified from the mixed bacterial communities using the polymerase chain reaction (PCR) and oligonucleotide primers targeting conserved terminal regions of the 16S rRNA gene. The PCR products obtained were cloned into a plasmid vector and the recombinant plasmids transformed into cells of Escherichia coli. Individual cloned 16S rRNA genes were sequenced at the 5' end of the gene (the most phylogenetically informative region in most taxa) and the sequence data compared to publicly available gene sequence databases, to allow a preliminary assignment of clones to taxonomic groups within the Bacteria and Archaea, and to determine the overall composition and phylogenetic diversity of the animal-associated microbial communities. 
Analysis of Riftia pachyptila exterior biofilm samples revealed the presence of members of the delta and epsilon proteobacteria, low GC Gram positive bacteria (firmicutes), spirochetes, CFB (Cytophaga-Flavobacterium-Bacteroides) group, green nonsulfur bacteria, acidobacteria, verrucomicrobia, and planctomycetes. The presence of the latter three taxonomic groups is of special interest, as they represent phylogenetically distinct groups within the Bacteria for which specific ecological functions have not yet been identified, but which have been found to be widely distributed and often numerically significant in diverse terrestrial and aquatic habitats. Although further sequencing is required to demonstrate the presence of a Riftia-associated microbial population distinct from that of the surrounding seawater, results available from three Riftia individuals from the East Pacific Rise suggest this to be the case. Analysis of microbial communities associated with the gill tissue of the mussel Bathymodiolus thermophilus shows a population dominated by gamma-Proteobacterial chemoautotrophic symbionts, although lower frequency novel phylotypes have been detected. Representatives of specific taxonomic groups have been selected for sequencing of the complete 16S rRNA gene, and the sequences used to reconstruct phylogenetic trees to more accurately determine the evolutionary relationships between the novel sequences, and available sequences for both cultured and non-cultured bacteria.
Xu, Xiaojing; Yang, Xiaoxu; Wu, Qixi; Liu, Aijie; Yang, Xiaoling; Ye, Adam Yongxin; Huang, August Yue; Li, Jiarui; Wang, Meng; Yu, Zhe; Wang, Sheng; Zhang, Zhichao; Wu, Xiru
2015-01-01
The majority of Dravet syndrome (DS) cases in children are caused by de novo SCN1A mutations. To investigate the origin of the mutations, we developed and applied a new method that combined deep amplicon resequencing with a Bayesian model to detect and quantify allelic fractions with improved sensitivity. Of 174 SCN1A mutations in DS probands which were considered “de novo” by Sanger sequencing, we identified 15 cases (8.6%) of parental mosaicism. We identified another five cases of parental mosaicism that were also detectable by Sanger sequencing. The fraction of mutant alleles in the 20 cases of parental mosaicism ranged from 1.1% to 32.6%. Thirteen (65% of 20) mutations originated paternally and seven (35% of 20) maternally. Twelve (60% of 20) mosaic parents did not have any epileptic symptoms. Their mutant allelic fractions were significantly lower than those in mosaic parents with epileptic symptoms (P = 0.016). We identified mosaicism with varied allelic fractions in blood, saliva, urine, hair follicle, oral epithelium, and semen, demonstrating that postzygotic mutations could affect multiple somatic cells as well as germ cells. Our results suggest that more sensitive tools for detecting low‐level mosaicism in parents of families with seemingly “de novo” mutations will allow for better informed genetic counseling. PMID:26096185
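The abstract does not describe the Bayesian model itself, so the following is only a minimal sketch of the general idea of estimating a mutant allelic fraction from amplicon read counts. It assumes a binomial likelihood, a uniform prior, and a simple grid approximation; the function names and the example read counts are hypothetical and are not taken from the study.

```python
from math import exp, log


def allele_fraction_posterior(alt_reads, total_reads, grid_size=2001):
    """Grid approximation to the posterior of the mutant allele fraction.

    Illustrative assumptions: binomial likelihood, uniform prior.
    Returns (grid, posterior), where posterior sums to 1.
    """
    ref_reads = total_reads - alt_reads
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for f in grid:
        # Fractions of exactly 0 or 1 are impossible if both alleles were seen.
        if (f == 0.0 and alt_reads > 0) or (f == 1.0 and ref_reads > 0):
            log_post.append(float("-inf"))
            continue
        log_likelihood = 0.0
        if alt_reads > 0:
            log_likelihood += alt_reads * log(f)
        if ref_reads > 0:
            log_likelihood += ref_reads * log(1.0 - f)
        log_post.append(log_likelihood)
    peak = max(log_post)
    weights = [exp(lp - peak) for lp in log_post]  # subtract peak for stability
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def credible_interval(grid, posterior, mass=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    tail = (1.0 - mass) / 2.0
    cumulative = 0.0
    lower = upper = None
    for f, p in zip(grid, posterior):
        cumulative += p
        if lower is None and cumulative >= tail:
            lower = f
        if upper is None and cumulative >= 1.0 - tail:
            upper = f
            break
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 50 mutant reads out of 1000 at one site.
    grid, posterior = allele_fraction_posterior(50, 1000)
    print(credible_interval(grid, posterior))
```

A real pipeline would also need to model sequencing error, which is what separates a true low-level mosaic call from background noise; that step is omitted here.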
ADAR2 induces reproducible changes in sequence and abundance of mature microRNAs in the mouse brain
Vesely, Cornelia; Tauber, Stefanie; Sedlazeck, Fritz J.; Tajaddod, Mansoureh; von Haeseler, Arndt; Jantsch, Michael F.
2014-01-01
Adenosine deaminases that act on RNA (ADARs) deaminate adenosines to inosines in double-stranded RNAs including miRNA precursors. A to I editing is widespread and required for normal life. By comparing deep sequencing data of brain miRNAs from wild-type and ADAR2 deficient mouse strains, we detect editing sites and altered miRNA processing at high sensitivity. We detect 48 novel editing events in miRNAs. Some editing events reach frequencies of up to 80%. About half of all editing events depend on ADAR2 while some miRNAs are preferentially edited by ADAR1. Sixty-four percent of all editing events are located within the seed region of mature miRNAs. For the highly edited miR-3099, we experimentally prove retargeting of the edited miRNA to novel 3′ UTRs. We show further that an abundant editing event in miR-497 promotes processing by Drosha of the corresponding pri-miRNA. We also detect reproducible changes in the abundance of specific miRNAs in ADAR2-deficient mice that occur independent of adjacent A to I editing events. This indicates that ADAR2 binding but not editing of miRNA precursors may influence their processing. Correlating with changes in miRNA abundance we find misregulation of putative targets of these miRNAs in the presence or absence of ADAR2. PMID:25260591
Tagliani, Elisa; Hassan, Mohamed Osman; Waberi, Yacine; De Filippo, Maria Rosaria; Falzon, Dennis; Dean, Anna; Zignol, Matteo; Supply, Philip; Abdoulkader, Mohamed Ali; Hassangue, Hawa; Cirillo, Daniela Maria
2017-12-15
Djibouti is a small country in the Horn of Africa with a high TB incidence (378/100,000 in 2015). Multidrug-resistant TB (MDR-TB) and resistance to second-line agents have been previously identified in the country but the extent of the problem has yet to be quantified. A national survey was conducted to estimate the proportion of MDR-TB among a representative sample of TB patients. Sputum was tested using Xpert MTB/RIF and samples positive for MTB and resistant to rifampicin underwent first-line phenotypic susceptibility testing. The TB supranational reference laboratory in Milan, Italy, undertook external quality assurance, genotypic testing based on whole genome and targeted-deep sequencing and phylogenetic studies. 301 new and 66 previously treated TB cases were enrolled. MDR-TB was detected in 34 patients: 4.7% of new and 31% of previously treated cases. Resistance to pyrazinamide, aminoglycosides and capreomycin was detected in 68%, 18% and 29% of MDR-TB strains respectively, while resistance to fluoroquinolones was not detected. Cluster analysis identified transmission of MDR-TB as a critical factor fostering drug resistance in the country. Levels of MDR-TB in Djibouti are among the highest on the African continent. The high prevalence of resistance to pyrazinamide and second-line injectable agents has important implications for treatment regimens.
Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.
Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M
2012-06-01
Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women with a transvaginal (TV) ultrasound examination positive for endometriosis, pelvic pain, or infertility underwent an MR 3.0T examination with the following protocol: T2-weighted FRFSE HR sequences, T2-weighted FRFSE HR CUBE 3D sequences, T1-weighted FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, presence of adhesions, fluid effusion in the pouch of Douglas, associated uterine and kidney pathologies or anomalies, and sacral nerve roots were assessed by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI diagnosed deep endometriosis in 22/46 patients and endometriomas not associated with deep implants in 9/46 patients; 15/46 patients were negative for endometriosis. Eleven of the 22 patients with deep endometriosis also had an ovarian endometriotic cyst. We obtained high sensitivity (96.97%), specificity (100.00%), positive predictive value (PPV, 100.00%), and negative predictive value (NPV, 92.86%). Pelvic MRI performed with a 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with good pre-surgical mapping of the lesions involving both bowel and bladder surfaces and the recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
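The four accuracy figures quoted above all derive from a 2×2 table of MRI result against the laparoscopic gold standard. The sketch below shows the arithmetic. The counts in the usage example are hypothetical: the abstract does not report the underlying table, and these values were chosen only because they reproduce the quoted percentages.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from a 2x2 table versus a gold standard.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly excluded
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper.
    metrics = diagnostic_metrics(tp=32, fp=0, tn=13, fn=1)
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.2f}%")
```

Whether the study computed these values per patient or per lesion is not stated in the abstract, so the example should not be read as a reconstruction of the actual data.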
2018-01-01
Liquid biopsies to genotype the epidermal growth factor receptor (EGFR) for targeted therapy have been implemented in clinical decision-making in the field of lung cancer, but harmonization of detection methods is still scarce among clinical laboratories. We performed a pilot external quality assurance (EQA) scheme to harmonize circulating tumor DNA testing among laboratories. For EQA, we created materials containing different levels of spiked cell-free DNA (cfDNA) in normal plasma. The limit of detection (LOD) of the cobas® EGFR Mutation Test v2 (Roche Molecular Systems) was also evaluated. From November 2016 to June 2017, seven clinical diagnostic laboratories participated in the EQA program. The majority (98.94%) of results obtained using the cobas assay and next-generation sequencing (NGS) were acceptable. Quantitative results from the cobas assay were positively correlated with allele frequencies derived from digital droplet PCR measurements and showed good reproducibility among laboratories. The LOD of the cobas assay was 5~27 copies/mL for p.E746_A750del (exon 19 deletion), 35~70 copies/mL for p.L858R, 18~36 copies/mL for p.T790M, and 15~31 copies/mL for p.A767_V769dup (exon 20 insertion). Deep sequencing of materials (>100,000X depth of coverage) resulted in detection of low-level targets present at frequencies of 0.06~0.13%. Our results indicate that the cobas assay is a reliable and rapid method for detecting EGFR mutations in plasma cfDNA. Careful interpretation is particularly important for p.T790M detection in the setting of relapse. Individual laboratories should optimize NGS performance to maximize clinical utility.
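To give a feel for why very deep coverage matters at allele frequencies around 0.1%, the sketch below computes the chance of sampling at least a given number of mutant reads at a site. It is an idealized binomial sampling model that ignores sequencing and PCR error, and the read threshold is a hypothetical parameter, not a value used by the laboratories in the study.

```python
from math import exp, lgamma, log


def prob_at_least(min_alt_reads, depth, allele_fraction):
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction).

    Idealized sampling model: every read is an independent draw and
    sequencing errors are ignored.
    """
    if min_alt_reads <= 0:
        return 1.0
    if allele_fraction <= 0.0:
        return 0.0
    if allele_fraction >= 1.0:
        return 1.0 if depth >= min_alt_reads else 0.0
    below = 0.0
    for j in range(min(min_alt_reads, depth + 1)):
        # log of C(depth, j) * p^j * (1 - p)^(depth - j)
        log_term = (
            lgamma(depth + 1) - lgamma(j + 1) - lgamma(depth - j + 1)
            + j * log(allele_fraction)
            + (depth - j) * log(1.0 - allele_fraction)
        )
        below += exp(log_term)
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Compare a modest depth with the >100,000X depth mentioned above,
    # for a variant at 0.06% and a hypothetical threshold of 5 reads.
    for depth in (1_000, 10_000, 100_000):
        print(depth, prob_at_least(5, depth, 0.0006))
```

In practice the usable limit is set by the background error rate rather than by sampling alone, which is one reason individual laboratories need to validate their own NGS thresholds.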
ComplexContact: a web server for inter-protein contact prediction using deep learning.
Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo
2018-05-22
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
Cavalier-Smith, Thomas
2015-04-01
Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.
NASA Astrophysics Data System (ADS)
George, Daniel; Huerta, E. A.
2018-03-01
The recent Nobel-prize-winning detections of gravitational waves from merging black holes and the subsequent detection of the collision of two neutron stars in coincidence with electromagnetic observations have inaugurated a new era of multimessenger astrophysics. To enhance the scope of this emergent field of science, we pioneered the use of deep learning with convolutional neural networks, that take time-series inputs, for rapid detection and characterization of gravitational wave signals. This approach, Deep Filtering, was initially demonstrated using simulated LIGO noise. In this article, we present the extension of Deep Filtering using real data from LIGO, for both detection and parameter estimation of gravitational waves from binary black hole mergers using continuous data streams from multiple LIGO detectors. We demonstrate for the first time that machine learning can detect and estimate the true parameters of real events observed by LIGO. Our results show that Deep Filtering achieves similar sensitivities and lower errors compared to matched-filtering while being far more computationally efficient and more resilient to glitches, allowing real-time processing of weak time-series signals in non-stationary non-Gaussian noise with minimal resources, and also enables the detection of new classes of gravitational wave sources that may go unnoticed with existing detection algorithms. This unified framework for data analysis is ideally suited to enable coincident detection campaigns of gravitational waves and their multimessenger counterparts in real-time.
Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.
Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry
2018-03-01
We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.
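Convolutional networks that score nucleotide sequences usually take a numeric encoding of each target as input. The abstract does not say which encoding Seq-deepCpf1 uses, so the sketch below shows one-hot encoding only as a common choice; the function name, channel order, and example sequence are assumptions made for illustration.

```python
BASES = "ACGT"  # channel order is an arbitrary illustrative choice


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element 0/1 rows, one per base."""
    encoded = []
    for position, base in enumerate(sequence.upper()):
        if base not in BASES:
            raise ValueError(f"unsupported base {base!r} at position {position}")
        encoded.append([1 if base == channel else 0 for channel in BASES])
    return encoded


if __name__ == "__main__":
    # Hypothetical short target sequence.
    for row in one_hot_encode("TTTAGC"):
        print(row)
```

The resulting length-by-4 matrix is what a one-dimensional convolution would slide over; extra features such as chromatin accessibility would have to be supplied to the model separately.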
Unified Deep Learning Architecture for Modeling Biology Sequence.
Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang
2017-10-09
Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
Biedrzycka, Aleksandra; Sebastian, Alvaro; Migalska, Magdalena; Westerdahl, Helena; Radwan, Jacek
2017-07-01
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co-amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra-deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500-20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within-method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co-amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co-amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage. © 2016 John Wiley & Sons Ltd.
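Repeatability between technical replicates can be summarized by comparing the sets of alleles called for the same individual. The abstract does not define the exact measure used, so the sketch below assumes one plausible definition, the proportion of alleles shared between two replicates out of all alleles called in either; the allele names are hypothetical.

```python
def genotype_repeatability(replicate_a, replicate_b):
    """Shared alleles divided by all alleles called in either replicate.

    This Jaccard-style definition is an assumption for illustration.
    Two empty genotypes are treated as perfectly repeatable.
    """
    alleles_a, alleles_b = set(replicate_a), set(replicate_b)
    union = alleles_a | alleles_b
    if not union:
        return 1.0
    return len(alleles_a & alleles_b) / len(union)


if __name__ == "__main__":
    # Hypothetical allele calls for one individual in two replicates.
    first = ["Acsc*01", "Acsc*02", "Acsc*03"]
    second = ["Acsc*02", "Acsc*03", "Acsc*04"]
    print(genotype_repeatability(first, second))
```

Averaging this value over individuals at each coverage level would give a curve of the kind the study describes, with repeatability rising as reads per amplicon increase.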
Molecular Definition of Vaginal Microbiota in East African Commercial Sex Workers ▿ †
Schellenberg, John J.; Links, Matthew G.; Hill, Janet E.; Dumonceaux, Tim J.; Kimani, Joshua; Jaoko, Walter; Wachihi, Charles; Mungai, Jane Njeri; Peters, Geoffrey A.; Tyler, Shaun; Graham, Morag; Severini, Alberto; Fowke, Keith R.; Ball, T. Blake; Plummer, Francis A.
2011-01-01
Resistance to HIV infection in a cohort of commercial sex workers living in Nairobi, Kenya, is linked to mucosal and antiinflammatory factors that may be influenced by the vaginal microbiota. Since bacterial vaginosis (BV), a polymicrobial dysbiosis characterized by low levels of protective Lactobacillus organisms, is an established risk factor for HIV infection, we investigated whether vaginal microbiology was associated with HIV-exposed seronegative (HESN) or HIV-seropositive (HIV+) status in this cohort. A subset of 44 individuals was selected for deep-sequencing analysis based on the chaperonin 60 (cpn60) universal target (UT), including HESN individuals (n = 16), other HIV-seronegative controls (HIV-N, n = 16), and HIV+ individuals (n = 12). Our findings indicate exceptionally high phylogenetic resolution of the cpn60 UT using reads as short as 200 bp, with 54 species in 29 genera detected in this group. Contrary to our initial hypothesis, few differences between HESN and HIV-N women were observed. Several HIV+ women had distinct profiles dominated by Escherichia coli. The deep-sequencing phylogenetic profile of the vaginal microbiota corresponds closely to BV+ and BV− diagnoses by microscopy, elucidating BV at the molecular level. A cluster of samples with intermediate abundance of Lactobacillus and dominant Gardnerella was identified, defining a distinct BV phenotype that may represent a transitional stage between BV+ and BV−. Several alpha- and betaproteobacteria, including the recently described species Variovorax paradoxus, were found to correlate positively with increased Lactobacillus levels that define the BV− (“normal”) phenotype. We conclude that cpn60 UT is ideally suited to next-generation sequencing technologies for further investigation of microbial community dynamics and mucosal immunity underlying HIV resistance in this cohort. PMID:21531840
DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.
Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan
2016-12-23
With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, issues like high data sparsity, small sample size, and the application of simple linear classifiers are major obstacles to improving classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high-level features between combinatorial somatic point mutations and cancer types.
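The indexed sparsity reduction step is described as converting gene data into the indexes of its non-zero elements. A minimal sketch of that idea follows. The 1-based indexing, the zero padding, and the fixed output length are assumptions made so that every sample yields a vector of the same size; the abstract does not specify these details.

```python
def indexed_sparsity_reduction(mutation_vector, output_length, pad_value=0):
    """Replace a sparse 0/1 mutation vector with the indexes of its non-zeros.

    Indexes are 1-based so that pad_value=0 cannot be confused with a gene.
    Vectors with more than output_length mutations are truncated, which is
    an illustrative simplification.
    """
    indexes = [i + 1 for i, value in enumerate(mutation_vector) if value != 0]
    indexes = indexes[:output_length]
    return indexes + [pad_value] * (output_length - len(indexes))


if __name__ == "__main__":
    # Hypothetical sample: 7 genes, mutations in genes 2, 5 and 6.
    sample = [0, 1, 0, 0, 1, 1, 0]
    print(indexed_sparsity_reduction(sample, output_length=4))
```

The compact vector carries the same information as the sparse one for samples with few mutations, while sparing the classifier an input that is almost entirely zeros.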
Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina
2014-01-01
Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348
Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard
2017-11-01
The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Pasquale, V.; Chiozzi, P.; Verdoya, M.
2013-05-01
Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m⁻¹), while it is larger (53 mK m⁻¹) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m⁻¹). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 × 10⁻¹⁵ m². The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.
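A Rayleigh number analysis of this kind can be illustrated with the classical criterion for a horizontal porous layer heated from below, where convection sets in once Ra = ρ g α k H ΔT / (μ κ) exceeds 4π². The sketch below solves that relation for the minimum permeability k. All fluid and rock properties, the layer thickness, and the temperature difference are hypothetical placeholder values, and the boundary conditions behind the 4π² threshold are an assumption; the paper's own parameters are not given in the abstract, so the output is not expected to match the reported threshold.

```python
from math import pi

# Critical value for a horizontal porous layer with impermeable,
# isothermal top and bottom boundaries (assumed configuration).
CRITICAL_RAYLEIGH = 4 * pi ** 2


def rayleigh_number(permeability, thickness, delta_t,
                    density=1000.0,      # fluid density, kg/m^3 (placeholder)
                    gravity=9.81,        # m/s^2
                    expansivity=5e-4,    # thermal expansion, 1/K (placeholder)
                    viscosity=3e-4,      # dynamic viscosity, Pa s (placeholder)
                    diffusivity=6e-7):   # thermal diffusivity, m^2/s (placeholder)
    """Porous-medium Rayleigh number for a layer of given thickness (m)
    and top-to-bottom temperature difference delta_t (K)."""
    return (density * gravity * expansivity * permeability * thickness * delta_t
            / (viscosity * diffusivity))


def minimum_permeability(thickness, delta_t, **properties):
    """Permeability (m^2) at which the Rayleigh number reaches its critical value."""
    rayleigh_per_unit_permeability = rayleigh_number(1.0, thickness, delta_t, **properties)
    return CRITICAL_RAYLEIGH / rayleigh_per_unit_permeability


if __name__ == "__main__":
    # Hypothetical aquifer: 2 km thick with a 28 K temperature difference.
    k_min = minimum_permeability(thickness=2000.0, delta_t=28.0)
    print(f"minimum permeability: {k_min:.2e} m^2")
```

Because the required permeability scales inversely with layer thickness and temperature difference, thicker or more strongly heated sections of the aquifer would convect at lower permeability, which is consistent in spirit with the lateral variability the authors report.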
Hellner, Karin; Miranda, Fabrizio; Fotso Chedom, Donatien; Herrero-Gonzalez, Sandra; Hayden, Daniel M; Tearle, Rick; Artibani, Mara; KaramiNejadRanjbar, Mohammad; Williams, Ruth; Gaitskell, Kezia; Elorbany, Samar; Xu, Ruoyan; Laios, Alex; Buiga, Petronela; Ahmed, Karim; Dhar, Sunanda; Zhang, Rebecca Yu; Campo, Leticia; Myers, Kevin A; Lozano, María; Ruiz-Miró, María; Gatius, Sónia; Mota, Alba; Moreno-Bueno, Gema; Matias-Guiu, Xavier; Benítez, Javier; Witty, Lorna; McVean, Gil; Leedham, Simon; Tomlinson, Ian; Drmanac, Radoje; Cazier, Jean-Baptiste; Klein, Robert; Dunne, Kevin; Bast, Robert C; Kennedy, Stephen H; Hassan, Bassim; Lise, Stefano; Garcia, María José; Peters, Brock A; Yau, Christopher; Sauka-Spengler, Tatjana; Ahmed, Ahmed Ashour
2016-08-01
Current screening methods for ovarian cancer can only detect advanced disease. Earlier detection has proved difficult because the molecular precursors involved in the natural history of the disease are unknown. To identify early driver mutations in ovarian cancer cells, we used dense whole genome sequencing of micrometastases and microscopic residual disease collected at three time points over three years from a single patient during treatment for high-grade serous ovarian cancer (HGSOC). The functional and clinical significance of the identified mutations was examined using a combination of population-based whole genome sequencing, targeted deep sequencing, multi-center analysis of protein expression, loss of function experiments in an in-vivo reporter assay and mammalian models, and gain of function experiments in primary cultured fallopian tube epithelial (FTE) cells. We identified frequent mutations involving a 40kb distal repressor region for the key stem cell differentiation gene SOX2. In the apparently normal FTE, the region was also mutated. This was associated with a profound increase in SOX2 expression (p<2(-16)), which was not found in patients without cancer (n=108). Importantly, we show that SOX2 overexpression in FTE is nearly ubiquitous in patients with HGSOCs (n=100), and common in BRCA1-BRCA2 mutation carriers (n=71) who underwent prophylactic salpingo-oophorectomy. We propose that the finding of SOX2 overexpression in FTE could be exploited to develop biomarkers for detecting disease at a premalignant stage, which would reduce mortality from this devastating disease. Copyright © 2016 The Ohio State University Wexner Medical Center. Published by Elsevier B.V. All rights reserved.
Monitoring controlled graves representing common burial scenarios with ground penetrating radar
NASA Astrophysics Data System (ADS)
Schultz, John J.; Martin, Michael M.
2012-08-01
Implementing controlled geophysical research is imperative to understand the variables affecting detection of clandestine graves during real-life forensic searches. This study focused on monitoring two empty control graves (shallow and deep) and six burials containing a small pig carcass (Sus scrofa) representing different forensic burial scenarios: a shallow buried naked carcass, a deep buried naked carcass, a deep buried carcass covered by a layer of rocks, a deep buried carcass covered by a layer of lime, a deep buried carcass wrapped in an impermeable tarpaulin and a deep buried carcass wrapped in a cotton blanket. Multi-frequency ground penetrating radar (GPR) data were collected monthly over a 12-month monitoring period. The research site was a cleared field within a wooded area in a humid subtropical environment, and the soil consisted of a Spodosol, a common soil type in Florida. This study compared 2D GPR reflection profiles and horizontal time slices obtained with both 250 and 500 MHz dominant frequency antennae to determine the utility of both antennae for grave detection in this environment over time. Overall, a combination of both antenna frequencies provided optimal detection of the targets. Better images were noted for deep graves, compared to shallow graves. The 250 MHz antenna provided better images for detecting deep graves, as fewer non-target anomalies were produced with lower radar frequencies. The 250 MHz antenna also provided better images of the disturbed ground. Conversely, the 500 MHz antenna provided better images when detecting the shallow pig grave. The graves that contained a pig carcass with associated grave items provided the best results, particularly the carcass covered with rocks and the carcass wrapped in a tarpaulin. Finally, during periods of increased soil moisture levels, there was increased detection of graves that was most likely related to conductive decompositional fluid from the carcasses.
Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K
2016-01-01
Study of the transcriptome, the entire pool of transcripts in an organism or single cell at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarrays were the main approach to profiling transcripts. Despite obvious shortcomings, including limited dynamic range and difficulty in comparing results from distinct experiments, microarrays were widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. Adopting NGS has profoundly improved the efficiency and outcomes of efforts to elucidate the genes responsible for producing active compounds in medicinal plants. The whole process runs from plant material sampling, to cDNA library preparation, to deep sequencing, after which bioinformatics takes over to assemble enormous yet fragmentary data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides many choices to facilitate the task, which can cause confusion when choosing a suitable methodology for a specific purpose. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.
Planetary cubesats - mission architectures
NASA Astrophysics Data System (ADS)
Bousquet, Pierre W.; Ulamec, Stephan; Jaumann, Ralf; Vane, Gregg; Baker, John; Clark, Pamela; Komarek, Tomas; Lebreton, Jean-Pierre; Yano, Hajime
2016-07-01
Miniaturisation of technologies over the last decade has made cubesats a valid solution for deep space missions. For example, a spectacular set of 13 cubesats will be delivered in 2018 to a high lunar orbit within the frame of SLS' first flight, referred to as Exploration Mission-1 (EM-1). Each of them will autonomously perform valuable scientific or technological investigations. Other situations are encountered, such as the auxiliary landers/rovers and autonomous camera that will be carried in 2018 to asteroid 1999 JU3 by JAXA's Hayabusa 2 probe, and will provide complementary scientific return to their mothership. In this case, cubesats depend on a larger spacecraft for deployment and other resources, such as telecommunication relay or propulsion. For both situations, we will describe in this paper how cubesats can be used as remote observatories (such as NEO detection missions), as technology demonstrators, and how they can perform or contribute to all steps in the Deep Space exploration sequence: Measurements during Deep Space cruise, Body Fly-bys, Body Orbiters, Atmospheric probes (Jupiter probe, Venus atmospheric probes, …), Static Landers, Mobile landers (such as balloons, wheeled rovers, small body rovers, drones, penetrators, floating devices, …), Sample Return. We will elaborate on mission architectures for the most promising concepts where cubesat-size devices offer an advantage in terms of affordability, feasibility, and increase of scientific return.
NASA Astrophysics Data System (ADS)
Mercier, Annie; Baillon, Sandrine; Daly, Marymegan; Macrander, Jason; Hamel, Jean-François
2017-03-01
Knowledge of the general biology and reproductive ecology of deep-water species can help predict their resilience to environmental and anthropogenic disturbances. The present study centers on live specimens of a deep-water sea anemone which were collected at bathyal depths between 1100 and 1400 m and kept in a mesocosm for over 6 years. Morphology and DNA sequencing confirmed that the species belongs to the genus Urticina. Male and female (9-10 cm pedal disk diameter, 90 tentacles) spawned 4 years post collection, in early spring (March). Both sexes released gametes through the mouth. The negatively buoyant oocytes (550-600 μm in diameter) quickly settled on the rocks and soft sediments surrounding the female. Lecithotrophic embryonic and larval development occurred on the substratum. Fully developed planula larvae were detected after 17-21 days. Planulae started to crawl and swim around but remained demersal. Metamorphosis and settlement occurred after 30-35 days, exclusively on hard substrata and preferentially on undersurfaces. Offspring grew slowly, developing 8 tentacles after 5 months and 24 tentacles after 12 months (3-4 mm pedal disk diameter). After 2.6 years of growth, the captive-born sea anemones reached 12-16 mm in pedal disk diameter and possessed 48-54 tentacles.
Deep History of East Asian Populations Revealed Through Genetic Analysis of the Ainu
Jeong, Choongwon; Nakagome, Shigeki; Di Rienzo, Anna
2016-01-01
Despite recent advances in population genomics, much remains to be elucidated with regard to East Asian population history. The Ainu, a hunter–gatherer population of northern Japan and Sakhalin island of Russia, are thought to be key to elucidating the prehistory of Japan and the peopling of East Asia. Here, we study the genetic relationship of the Ainu with other East Asian and Siberian populations outside the Japanese archipelago using genome-wide genotyping data. We find that the Ainu represent a deep branch of East Asian diversity more basal than all present-day East Asian farmers. However, we did not find a genetic connection between the Ainu and populations of the Tibetan plateau, rejecting their long-held hypothetical connection based on Y chromosome data. Unlike all other East Asian populations investigated, the Ainu have a closer genetic relationship with northeast Siberians than with central Siberians, suggesting ancient connections among populations around the Sea of Okhotsk. We also detect a recent genetic contribution of the Ainu to nearby populations, but no evidence for reciprocal recent gene flow is observed. Whole genome sequencing of contemporary and ancient Ainu individuals will be helpful to understand the details of the deep history of East Asians. PMID:26500257
NGC 1866: First Spectroscopic Detection of Fast-rotating Stars in a Young LMC Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dupree, A. K.; Dotter, A.; Johnson, C. I.
High-resolution spectroscopic observations were taken of 29 extended main-sequence turnoff (eMSTO) stars in the young (∼200 Myr) Large Magellanic Cloud (LMC) cluster, NGC 1866, using the Michigan/Magellan Fiber System and MSpec spectrograph on the Magellan-Clay 6.5 m telescope. These spectra reveal the first direct detection of rapidly rotating stars whose presence has only been inferred from photometric studies. Some eMSTO stars exhibit Hα emission (indicative of Be-star decretion disks), others have shallow broad Hα absorption (consistent with rotation ≳150 km s⁻¹), or deep Hα core absorption signaling lower rotation velocities (≲150 km s⁻¹). The spectra appear consistent with two populations of stars: one rapidly rotating, and the other younger and slowly rotating.
Lobo, Jorge; Ferreira, Maria S; Antunes, Ilisa C; Teixeira, Marcos A L; Borges, Luisa M S; Sousa, Ronaldo; Gomes, Pedro A; Costa, Maria Helena; Cunha, Marina R; Costa, Filipe O
2017-02-01
In this study we compared DNA barcode-suggested species boundaries with morphology-based species identifications in the amphipod fauna of the southern European Atlantic coast. DNA sequences of the cytochrome c oxidase subunit I barcode region (COI-5P) were generated for 43 morphospecies (178 specimens) collected along the Portuguese coast which, together with publicly available COI-5P sequences, produced a final dataset comprising 68 morphospecies and 295 sequences. Seventy-five BINs (Barcode Index Numbers) were assigned to these morphospecies, of which 48 were concordant (i.e., 1 BIN = 1 species), 8 were taxonomically discordant, and 19 were singletons. Twelve species had matching sequences (<2% distance) with conspecifics from distant locations (e.g., North Sea). Seven morphospecies were assigned to multiple, and highly divergent, BINs, including specimens of Corophium multisetosum (18% divergence) and Dexamine spiniventris (16% divergence), which originated from sampling locations on the west coast of Portugal (only about 36 and 250 km apart, respectively). We also found deep divergence (4%-22%) among specimens of seven species from Portugal compared to those from the North Sea and Italy. The detection of evolutionarily meaningful divergence among populations of several amphipod species from southern Europe reinforces the need for a comprehensive re-assessment of the diversity of this faunal group.
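The "<2% distance" comparison above can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned and uses the uncorrected p-distance; the abstract does not say which distance model was used, and BIN assignment itself is done by the barcode database's own clustering, which this does not reproduce. The function names and the 0.02 default are illustrative only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance over sites where both sequences have A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes so they do not count as differences.
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def within_threshold(seq_a: str, seq_b: str, threshold: float = 0.02) -> bool:
    """True when two barcodes differ by less than the (assumed) 2% cutoff."""
    return p_distance(seq_a, seq_b) < threshold
```

A pair differing at 1 of 100 compared sites gives a distance of 0.01 and would count as matching under this cutoff, whereas the 16% to 18% divergences reported within single morphospecies fall far outside it.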
Investigation of a Canine Parvovirus Outbreak using Next Generation Sequencing.
Parker, Jayme; Murphy, Molly; Hueffer, Karsten; Chen, Jack
2017-08-29
Canine parvovirus (CPV) outbreaks can have a devastating effect in communities with dense dog populations. The interior region of Alaska experienced a CPV outbreak in the winter of 2016, which prompted further investigation of the virus following reports of increased morbidity and mortality at dog mushing kennels in the area. Twelve rectal-swab specimens from dogs displaying clinical signs consistent with parvoviral-associated disease were processed using next-generation sequencing (NGS) methodologies by targeting RNA transcripts, and therefore detecting only replicating virus. All twelve specimens demonstrated the presence of the CPV transcriptome, with read depths ranging from 2.2X to 12,381X, genome coverage ranging from 44.8% to 96.5%, and representation of CPV sequencing reads relative to those of the metagenome background ranging from 0.0015% to 6.7%. Using the data generated by NGS, newly evolved, yet known, strains of both CPV-2a and CPV-2b were identified and grouped geographically. Deep-sequencing data provided additional diagnostic information in terms of investigating novel CPV in this outbreak. NGS data in addition to limited serological data provided strong diagnostic evidence that this outbreak most likely arose from unvaccinated or under-vaccinated canines, not from a novel CPV strain incapable of being neutralized by current vaccination efforts.
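The three summary figures quoted above (read depth, genome coverage, and the share of reads assigned to CPV) can be computed from a per-base depth vector and two read counts. The sketch below assumes that "read depth" means mean depth across the reference and that "coverage" means the percentage of positions with at least one read; the abstract does not define either, and the function names are hypothetical.

```python
def coverage_stats(depth_per_base):
    """Return (mean depth, percent of reference positions covered by >= 1 read)."""
    n = len(depth_per_base)
    if n == 0:
        raise ValueError("empty depth vector")
    covered = sum(1 for d in depth_per_base if d > 0)
    mean_depth = sum(depth_per_base) / n
    return mean_depth, 100.0 * covered / n


def percent_of_reads(target_reads: int, total_reads: int) -> float:
    """Share of all sequenced reads that map to the target genome, in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * target_reads / total_reads
```

For instance, 15 viral reads in a library of one million correspond to 0.0015%, the low end of the range reported above.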
Levin, Mattias; King, Jasmine J.; Glanville, Jacob; Jackson, Katherine J. L.; Looney, Timothy J.; Hoh, Ramona A.; Mari, Adriano; Andersson, Morgan; Greiff, Lennart; Fire, Andrew Z.; Boyd, Scott D.; Ohlin, Mats
2016-01-01
Background Specific immunotherapy (SIT) is the only treatment with proven long-term curative potential in allergic disease. Allergen-specific IgE is the causative agent of allergic disease, and antibodies contribute to SIT, but the effects of SIT on aeroallergen-specific B cell repertoires are not well understood. Objective To characterize the IgE sequences expressed by allergen-specific B cells, and track the fate of these B cell clones during SIT. Methods We have used high-throughput antibody gene sequencing and identification of allergen-specific IgE using combinatorial antibody fragment library technology to analyze immunoglobulin repertoires of blood and nasal mucosa of aeroallergen-sensitized individuals before and during the first year of subcutaneous SIT. Results Of 52 distinct allergen-specific IgE heavy chains from eight allergic donors, 37 were also detected by high-throughput antibody gene sequencing of blood, nasal mucosa, or both sample types. The allergen-specific clones had increased persistence, higher likelihood of belonging to clones expressing other switched isotypes, and possibly larger clone size than the rest of the IgE repertoire. Clone members in nasal tissue showed close mutational relationships. Conclusion Combining functional binding studies, deep antibody repertoire sequencing, and information on clinical outcomes in larger studies may in the future aid assessment of SIT mechanisms and efficacy. PMID:26559321
Sabokrou, Mohammad; Fayyaz, Mohsen; Fathy, Mahmood; Klette, Reinhard
2017-02-17
This paper proposes a fast and reliable method for anomaly detection and localization in video data showing crowded scenes. Time-efficient anomaly localization is an ongoing challenge and the subject of this paper. We propose a cubic-patch-based method, characterised by a cascade of classifiers, which makes use of an advanced feature-learning approach. Our cascade of classifiers has two main stages. First, a light but deep 3D auto-encoder is used for early identification of "many" normal cubic patches. This deep network operates on small cubic patches as the first stage; the remaining candidates of interest are then carefully resized and evaluated at the second stage using a more complex and deeper 3D convolutional neural network (CNN). We divide the deep auto-encoder and the CNN into multiple sub-stages which operate as cascaded classifiers. Shallow layers of the cascaded deep networks (designed as Gaussian classifiers, acting as weak single-class classifiers) detect "simple" normal patches such as background patches, and more complex normal patches are detected at deeper layers. It is shown that the proposed novel technique (a cascade of two cascaded classifiers) performs comparably to current top-performing detection and localization methods on standard benchmarks, while generally requiring less computation time.
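The early-rejection idea behind such a cascade can be sketched in a few lines of Python. This is not the authors' network: the two scoring functions are toy stand-ins (the paper uses a 3D auto-encoder and a 3D CNN), and all names and thresholds are hypothetical. It only shows why a cheap first stage reduces how often the expensive second stage runs.

```python
from statistics import pvariance


def cascade_detect(patches, cheap_score, costly_score,
                   cheap_threshold, costly_threshold):
    """Return (indices flagged as anomalous, number of costly evaluations)."""
    anomalies = []
    costly_calls = 0
    for idx, patch in enumerate(patches):
        # Stage 1: a cheap score discards patches that are confidently normal.
        if cheap_score(patch) < cheap_threshold:
            continue
        # Stage 2: only the surviving candidates pay for the costly score.
        costly_calls += 1
        if costly_score(patch) >= costly_threshold:
            anomalies.append(idx)
    return anomalies, costly_calls


# Toy stand-ins for the two stages (assumptions, not the paper's models).
def toy_cheap(patch):
    return max(abs(v) for v in patch)


def toy_costly(patch):
    return pvariance(patch)


toy_patches = [[0.0, 0.1, 0.0], [0.9, 0.8, 1.0], [0.0, 2.0, 0.0]]
flagged, calls = cascade_detect(toy_patches, toy_cheap, toy_costly, 0.5, 0.5)
```

In the toy run, the first patch is rejected by the cheap stage, so the costly score is evaluated for two of the three patches and only the last one is flagged.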
Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi
2015-01-01
Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. To ensure correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores for STN visualization were significantly higher on SWAN images than on the other images (p < 0.01). The kappa value for visualizing the STN was highest on SWAN images (0.74) among the three sequences. SWAN is the sequence best suited for identifying the STN at the present time.
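As a rough illustration of the CNR measurement mentioned above, the sketch below uses one common definition: the absolute difference between the mean signals of two regions of interest divided by the standard deviation of a noise region. The abstract does not state which definition the authors used, so the formula, the function name, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def contrast_to_noise(roi_a, roi_b, noise_roi):
    """CNR = |mean(A) - mean(B)| / SD(noise), using the sample standard deviation."""
    noise_sd = stdev(noise_roi)
    if noise_sd == 0:
        raise ValueError("noise region has zero variance")
    return abs(mean(roi_a) - mean(roi_b)) / noise_sd


# Hypothetical pixel intensities for two structures and a background region.
cnr = contrast_to_noise([100, 102, 98], [80, 82, 78], [1, 3, 5])
```

With these made-up values the means differ by 20 and the noise standard deviation is 2, giving a CNR of 10.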
Metagenomic gene annotation by a homology-independent approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Froula, Jeff; Zhang, Tao; Salmeen, Annette
2011-06-02
Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As ∼50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.
MicroRNA-like RNAs from the same miRNA precursors play a role in cassava chilling responses.
Zeng, Changying; Xia, Jing; Chen, Xin; Zhou, Yufei; Peng, Ming; Zhang, Weixiong
2017-12-07
MicroRNAs (miRNAs) are known to play important roles in various cellular processes and stress responses. MiRNAs can be identified by analyzing reads from high-throughput deep sequencing. Reads that aligned to miRNA precursors but did not correspond to canonical miRNAs were initially considered sequencing noise and excluded from further analysis. Here we report a small-RNA species of phased and half-phased miRNA-like RNAs, different from canonical miRNAs, from cassava miRNA precursors detected under four distinct chilling conditions. They can form abundant multiple small RNAs arranged along precursors in a tandem and phased or half-phased fashion. Some of these miRNA-like RNAs were experimentally confirmed by re-amplification and re-sequencing, and have a similar qRT-PCR detection ratio to their cognate canonical miRNAs. The target genes of those phased and half-phased miRNA-like RNAs function in the process of cell growth metabolism and play roles in protein kinase. Half-phased miR171d.3 was confirmed to have cleavage activity on its target gene P-glycoprotein 11, a broad substrate efflux pump across cellular membranes, which is thought to provide protection for tropical cassava during a sharp temperature decrease. Our results showed that the RNAs from miRNA precursors are miRNA-like small RNAs that are viable negative gene regulators and may have potential functions in cassava chilling responses.
NASA Astrophysics Data System (ADS)
Li, Ren; Zhou, Mingxing; Li, Jine; Wang, Zihua; Zhang, Weikai; Yue, Chunyan; Ma, Yan; Peng, Hailin; Wei, Zewen; Hu, Zhiyuan
2018-03-01
Companion diagnostics for EGFR mutations have proved crucial for the efficacy of tyrosine kinase inhibitor targeted cancer therapies. Uncovering multiple mutations that occur in a minority of EGFR-mutated cells, which may be masked by noise from the majority of un-mutated cells, is becoming an urgent clinical requirement. Here we present the validation of a microfluidic-chip-based method for detecting EGFR multi-mutations at the single-cell level. By trapping and immunofluorescently imaging single cells in specifically designed silicon microwells, the EGFR-expressing cells were easily identified. By lysing single cells in situ, the lysates of EGFR-expressing cells were retrieved without cross-contamination. Because noise from cells without EGFR expression was excluded, simple and cost-effective Sanger sequencing, rather than expensive deep sequencing of the whole cell population, was used to discover multi-mutations. We verified the new method by precisely discovering the three most important EGFR drug-related mutations in a sample in which EGFR-mutated cells account for only a small percentage of the whole cell population. The microfluidic chip is capable of revealing not only the existence of specific EGFR multi-mutations but also other valuable single-cell-level information: in which specific cells the mutations occurred, and whether different mutations coexist in the same cells. This microfluidic chip constitutes a promising method for making simple and cost-effective Sanger sequencing a routine test before targeted cancer therapy.
Deep sequencing methods for protein engineering and design.
Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A
2017-08-01
The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with a discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.
Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K
2011-01-20
Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
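The SSR screening criteria described in this abstract (perfect repeats of di- to hexanucleotide motifs, homopolymers excluded, total repeat length of at least 18 bp) can be approximated with a short regular-expression scan. This is a simplified sketch, not the tool the authors used: it does not detect or exclude compound repeats, and a short tandem duplication immediately upstream can shift the reported start of a repeat. The pattern and function name are illustrative.

```python
import re

# Outer group: the whole repeat tract; inner group: the 2-6 nt motif,
# matched lazily so the shortest repeating unit is preferred.
SSR_PATTERN = re.compile(r"(([ACGT]{2,6}?)\2+)")


def find_ssrs(seq: str, min_len: int = 18):
    """Return (start, motif, repeat_count) for perfect SSRs of at least min_len bp."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        tract, motif = match.group(1), match.group(2)
        if len(set(motif)) == 1:
            continue  # homopolymer run such as AAAA..., excluded as in the study
        if len(tract) >= min_len:
            hits.append((match.start(), motif, len(tract) // len(motif)))
    return hits
```

For example, `find_ssrs("GGC" + "AT" * 9 + "CCG")` reports one dinucleotide locus starting at position 3 with nine repeats, while a 16 bp `AT` tract or a 20 bp poly-A run yields nothing.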
Shu, Hai-Rong; Bi, Huai; Pan, Yang-Chun; Xu, Hang-Yu; Song, Jian-Xin; Hu, Jie
2015-09-16
Usher syndrome (USH) is an autosomal recessive disorder characterized by hearing impairment and vision dysfunction due to retinitis pigmentosa. Phenotypic and genetic heterogeneities of this disease make it impractical to obtain a genetic diagnosis by conventional Sanger sequencing. In this study, we applied a next-generation sequencing approach to detect genetic abnormalities in patients with USH. Two unrelated Chinese families were recruited, consisting of two USH-affected patients and four unaffected relatives. We selected 199 genes related to inherited retinal diseases as targets for deep exome sequencing. Following systematic data analysis using an established bioinformatics pipeline, all variants that passed the filter criteria were validated by Sanger sequencing and co-segregation analysis. A homozygous frameshift mutation (c.4382delA, p.T1462Lfs*2) was revealed in exon 20 of the USH2A gene in the F1 family. Two compound heterozygous mutations, IVS47 + 1G > A and c.13156A > T (p.I4386F), located in intron 48 and exon 63 respectively, of USH2A, were identified as causative mutations for the F2 family. Of note, the missense mutation c.13156A > T has not been reported so far. In conclusion, targeted exome sequencing precisely and rapidly identified the genetic defects in two Chinese USH families, and this technique can be applied as a routine examination for these disorders with significant clinical and genetic heterogeneity.
USDA-ARS's Scientific Manuscript database
Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing
2013-01-01
Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768
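One way to picture the distinction drawn above between genomically encoded termini and untemplated oligo(A) tails is to compare each 3′ RACE read with the reference sequence and treat whatever extends beyond the last matching base as a tail. The sketch below assumes reads start at a common upstream anchor and are written as DNA; the function names and the example sequences are hypothetical, and the authors' actual pipeline is not described in the abstract. If the genome itself encodes an A at the junction, this simple prefix match will count it as templated.

```python
from collections import Counter


def split_templated_tail(read: str, reference: str):
    """Split a read into the prefix matching the reference and the remaining tail."""
    n = 0
    for r, g in zip(read, reference):
        if r != g:
            break
        n += 1
    return read[:n], read[n:]


def tally_termini(reads, reference):
    """Count (templated end position, tail class) pairs across reads."""
    counts = Counter()
    for read in reads:
        templated, tail = split_templated_tail(read, reference)
        if not tail:
            tail_class = "none"
        elif set(tail) == {"A"}:
            tail_class = "oligo(A)"
        else:
            tail_class = "other"
        counts[(len(templated), tail_class)] += 1
    return counts
```

Tallying reads this way gives a distribution of end positions, each split by tail class, which is the kind of 3′ terminal profile the abstract compares across RNAs.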
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
NASA Astrophysics Data System (ADS)
Aravena, M.; Decarli, R.; Walter, F.; Da Cunha, E.; Bauer, F. E.; Carilli, C. L.; Daddi, E.; Elbaz, D.; Ivison, R. J.; Riechers, D. A.; Smail, I.; Swinbank, A. M.; Weiss, A.; Anguita, T.; Assef, R. J.; Bell, E.; Bertoldi, F.; Bacon, R.; Bouwens, R.; Cortes, P.; Cox, P.; González-López, J.; Hodge, J.; Ibar, E.; Inami, H.; Infante, L.; Karim, A.; Le Fèvre, O.; Magnelli, B.; Ota, K.; Popping, G.; Sheth, K.; van der Werf, P.; Wagg, J.
2016-12-01
We present an analysis of a deep (1σ = 13 μJy) cosmological 1.2 mm continuum map based on ASPECS, the ALMA Spectroscopic Survey in the Hubble Ultra Deep Field. In the 1 arcmin² covered by ASPECS we detect nine sources at >3.5σ significance at 1.2 mm. Our ALMA-selected sample has a median redshift of z = 1.6 ± 0.4, with only one galaxy detected at z > 2 within the survey area. This value is significantly lower than that found in millimeter samples selected at a higher flux density cutoff and similar frequencies. Most galaxies have specific star formation rates (SFRs) similar to that of main-sequence galaxies at the same epoch, and we find median values of stellar mass and SFRs of 4.0 × 10¹⁰ M⊙ and ∼40 M⊙ yr⁻¹, respectively. Using the dust emission as a tracer for the interstellar medium (ISM) mass, we derive depletion times that are typically longer than 300 Myr, and we find molecular gas fractions ranging from ∼0.1 to 1.0. As noted by previous studies, these values are lower than those using CO-based ISM estimates by a factor of ∼2. The 1 mm number counts (corrected for fidelity and completeness) are in agreement with previous studies that were typically restricted to brighter sources. With our individual detections only, we recover 55% ± 4% of the extragalactic background light (EBL) at 1.2 mm measured by the Planck satellite, and we recover 80% ± 7% of this EBL if we include the bright end of the number counts and additional detections from stacking. The stacked contribution is dominated by galaxies at z ∼ 1–2, with stellar masses of (1–3) × 10¹⁰ M⊙. For the first time, we are able to characterize the population of galaxies that dominate the EBL at 1.2 mm.
NASA Astrophysics Data System (ADS)
Ma, Ling; Lu, Guolan; Wang, Dongsheng; Wang, Xu; Chen, Zhuo Georgia; Muller, Susan; Chen, Amy; Fei, Baowei
2017-03-01
Hyperspectral imaging (HSI) is an emerging imaging modality that can provide a noninvasive tool for cancer detection and image-guided surgery. HSI acquires high-resolution images at hundreds of spectral bands, providing large volumes of data for differentiating tissue types. We propose a deep-learning-based method for the detection of head and neck cancer in hyperspectral images. Because a deep learning algorithm learns features hierarchically, the learned features are more discriminative and concise than handcrafted features. In this study, we adopt convolutional neural networks (CNNs) to learn deep features of pixels and classify each pixel as tumor or normal tissue. We evaluated the proposed classification method on a dataset containing hyperspectral images from 12 tumor-bearing mice. Experimental results show that our method achieved an average accuracy of 91.36%. This preliminary study demonstrates that our deep learning method can be applied to hyperspectral images for detecting head and neck tumors in animal models.
Automated detection of very Low Surface Brightness galaxies in the Virgo Cluster
NASA Astrophysics Data System (ADS)
Prole, D. J.; Davies, J. I.; Keenan, O. C.; Davies, L. J. M.
2018-04-01
We report the automatic detection of a new sample of very low surface brightness (LSB) galaxies, likely members of the Virgo cluster. We introduce our new software, DeepScan, which has been designed specifically to detect extended LSB features automatically using the DBSCAN algorithm. We demonstrate the technique by applying it over a 5 deg² portion of the Next-Generation Virgo Survey (NGVS) data to reveal 53 low surface brightness galaxies that are candidate cluster members based on their sizes and colours. Thirty of these sources are new detections, despite the region having previously been searched specifically for LSB galaxies. Our final sample contains galaxies with 26.0 ≤ ⟨μ_e⟩ ≤ 28.5 and 19 ≤ m_g ≤ 21, making them some of the faintest known in Virgo. The majority of them have colours consistent with the red sequence, and they have a mean stellar mass of 10^(6.3 ± 0.5) M⊙ assuming cluster membership. After using ProFit to fit Sérsic profiles to our detections, we find that none of the new sources has an effective radius larger than 1.5 kpc; they therefore do not meet the criteria for ultra-diffuse galaxy (UDG) classification, and we classify them as ultra-faint dwarfs.
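For readers unfamiliar with DBSCAN, the density-based clustering step named in the abstract can be illustrated with a minimal sketch. This is not the DeepScan implementation: the toy image, the threshold, and the `eps` and `min_pts` values are all assumptions chosen for illustration, and the function below is a plain textbook DBSCAN applied to the coordinates of above-threshold pixels.

```python
from collections import deque


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: return a cluster id per point (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical toy "image": one extended faint feature and one isolated bright pixel.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 3, 4, 0, 0, 0, 0],
    [0, 4, 5, 3, 0, 0, 9],
    [0, 3, 4, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
threshold = 2  # assumed detection threshold, arbitrary units
pixels = [(x, y) for y, row in enumerate(image)
          for x, value in enumerate(row) if value > threshold]
labels = dbscan(pixels, eps=1.5, min_pts=4)
```

In this toy case the seven contiguous pixels form a single cluster, while the isolated bright pixel is labelled as noise because it has too few neighbours, which is the property that makes a density-based method attractive for extended, faint features.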
In situ Detection of Microbial Life in the Deep Biosphere in Igneous Ocean Crust.
Salas, Everett C; Bhartia, Rohit; Anderson, Louise; Hug, William F; Reid, Ray D; Iturrino, Gerardo; Edwards, Katrina J
2015-01-01
The deep biosphere is a major frontier to science. Recent studies have shown the presence and activity of cells in deep marine sediments and in the continental deep biosphere. Volcanic lavas in the deep ocean subsurface, through which substantial fluid flow occurs, present another potentially massive deep biosphere. We present results from the deployment of a novel in situ logging tool designed to detect microbial life harbored in a deep, native, borehole environment within igneous oceanic crust, using deep ultraviolet native fluorescence spectroscopy. Results demonstrate the predominance of microbial-like signatures within the borehole environment, with densities in the range of 10⁵ cells/mL. Based on transport and flux models, we estimate that such a concentration of microbial cells could not be supported by transport through the crust, suggesting in situ growth of these communities.
The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE
2011-01-01
Background: The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop, chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As an example of such stress-exposed crop plants, we began to characterize the salt stress responses of chickpea at the transcriptome level. Results: We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, indicating a differential organ-specific response to stress.
Drawing on recent pioneering work on massive cDNA sequencing in chickpea, we linked more than 9,400 UniTags to UniProt entries. Additionally, gene ontology (GO) over-representation analysis identified enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, we focus here, as examples, on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundred transcripts of potential interest is now available. Conclusions: This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the employed technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after the onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance.
We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE. PMID:21320317
The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE.
Molina, Carlos; Zaman-Allah, Mainassara; Khan, Faheema; Fatnassi, Nadia; Horres, Ralf; Rotter, Björn; Steinhauer, Diana; Amenc, Laurie; Drevon, Jean-Jacques; Winter, Peter; Kahl, Günter
2011-02-14
The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop, chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As an example of such stress-exposed crop plants, we began to characterize the salt stress responses of chickpea at the transcriptome level. We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, indicating a differential organ-specific response to stress. Drawing on recent pioneering work on massive cDNA sequencing in chickpea, we linked more than 9,400 UniTags to UniProt entries.
Additionally, gene ontology (GO) over-representation analysis identified enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, we focus here, as examples, on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundred transcripts of potential interest is now available. This report demonstrates that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcript isoforms. As an example of the high resolution of the employed technology, which we term deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules as early as 2 hours after the onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts is proposed as potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.
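The >3.0-fold criterion mentioned in the abstract can be illustrated with a short sketch of how tag counts from two libraries might be compared. The abstract does not describe the normalization or statistics actually used, so everything here is an assumption: counts are scaled to tags per million, a pseudocount of 1 avoids division by zero, and the tag names and counts are hypothetical placeholders rather than real 26 bp UniTags.

```python
def fold_changes(control, treated, pseudocount=1.0):
    """Return treated/control ratios of library-size-normalized tag counts."""
    total_control = sum(control.values())
    total_treated = sum(treated.values())
    ratios = {}
    for tag in set(control) | set(treated):
        # Scale to tags per million so libraries of different depth are comparable.
        c = (control.get(tag, 0) + pseudocount) / total_control * 1e6
        t = (treated.get(tag, 0) + pseudocount) / total_treated * 1e6
        ratios[tag] = t / c
    return ratios


def classify(ratio, threshold=3.0):
    """Label a ratio as up- or down-regulated using a symmetric fold threshold."""
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical counts for three tags in control and salt-treated root libraries.
control_counts = {"TAG_A": 10, "TAG_B": 90, "TAG_C": 100}
treated_counts = {"TAG_A": 80, "TAG_B": 20, "TAG_C": 100}
ratios = fold_changes(control_counts, treated_counts)
calls = {tag: classify(r) for tag, r in ratios.items()}
```

With these toy numbers TAG_A is called up-regulated, TAG_B down-regulated, and TAG_C unchanged. A real analysis would also attach a significance test to each tag, which this sketch omits.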
The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region, California
NASA Astrophysics Data System (ADS)
Zhang, Q.; Wdowinski, S.; Lin, G.
2011-12-01
The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depths of 10-17 km. Over the past decade, seismic activity in the Anza region has increased and has included three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in the Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) and used an accurate 1-D velocity model derived from the 3-D model of Lin et al. (2007). In order to separate the aftershock sequences from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depths of 10-17 km and extended over a 70 km long segment of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical patterns, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. References: 1. Lin, G. and Shearer, P. M., 2006, The COMPLOC earthquake location package, Seism. Res. Lett. 77, pp. 440-444. 2. Lin, G., Shearer, P. M., Hauksson, E., and Thurber, C. H., 2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method, J. Geophys. Res. 112, B12306, doi:10.1029/2007JB004977. 3. Wdowinski, S., 2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci., doi:10.1038/NGEO684.
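As a rough illustration of how Omori's law can separate an aftershock sequence from background seismicity, the sketch below uses the modified (Omori-Utsu) form n(t) = K / (t + c)^p, which reduces to the original Omori law for p = 1. The parameter values are hypothetical and are not fitted to the Anza sequences; the background rate of four events per day is borrowed from the abstract purely as an illustrative number.

```python
import math


def omori_rate(t, K, c, p=1.0):
    """Aftershock rate (events per day) at t days after the mainshock."""
    return K / (t + c) ** p


def omori_cumulative(t, K, c, p=1.0):
    """Expected number of aftershocks between the mainshock and time t."""
    if abs(p - 1.0) < 1e-12:
        return K * math.log(1.0 + t / c)
    return K * (c ** (1.0 - p) - (t + c) ** (1.0 - p)) / (p - 1.0)


def time_to_rate(rate, K, c, p=1.0):
    """Time at which the aftershock rate has decayed to the given rate."""
    return max(0.0, (K / rate) ** (1.0 / p) - c)


# Hypothetical parameters: K in events/day, c in days.
K, c, p = 100.0, 0.5, 1.0
background = 4.0  # events per day, illustrative only
t_end = time_to_rate(background, K, c, p)
n_aftershocks = omori_cumulative(t_end, K, c, p)
```

Under these assumed parameters the aftershock rate falls to the background level after 24.5 days, so events after that time would be hard to attribute to the sequence on rate alone.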
Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.
Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges
2015-11-01
Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. The diversity of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have revealed unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and whether they were metabolically active. Here we report an unexpected taxonomy-based dichotomous response to pressure, with piezosensitive ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
X-linked Alport syndrome caused by splicing mutations in COL4A5.
Nozu, Kandai; Vorechovsky, Igor; Kaito, Hiroshi; Fu, Xue Jun; Nakanishi, Koichi; Hashimura, Yuya; Hashimoto, Fusako; Kamei, Koichi; Ito, Shuichi; Kaku, Yoshitsugu; Imasawa, Toshiyuki; Ushijima, Katsumi; Shimizu, Junya; Makita, Yoshio; Konomoto, Takao; Yoshikawa, Norishige; Iijima, Kazumoto
2014-11-07
X-linked Alport syndrome is caused by mutations in the COL4A5 gene. Although many COL4A5 mutations have been detected, the mutation detection rate has been unsatisfactory. Some men with X-linked Alport syndrome show a relatively mild phenotype, but molecular investigations have rarely been conducted to clarify the underlying mechanism. In total, 152 patients who were suspected of having X-linked Alport syndrome through clinical and pathologic investigations and were referred to the hospital for mutational analysis between January 2006 and January 2013 were genetically diagnosed. Among those patients, 22 had suspected splice site mutations. Transcripts are routinely examined for abnormalities when suspected splice site mutations are detected; 11 of them showed expected exon skipping, but others showed aberrant splicing patterns. The mutation detection strategy had two steps: (1) genomic DNA analysis using PCR and direct sequencing and (2) mRNA analysis using RT-PCR to detect RNA processing abnormalities. Six splicing consensus site mutations resulting in aberrant splicing patterns, one exonic mutation leading to exon skipping, and four deep intronic mutations producing cryptic splice site activation were identified. Interestingly, one case produced a cryptic splice site with a single nucleotide substitution in the deep intron that led to intronic exonization containing a stop codon; however, the patient showed a clearly milder phenotype than is typical for men with X-linked Alport syndrome caused by a truncating mutation. mRNA extracted from the kidney showed both normal and abnormal transcripts, with the normal transcript resulting in the milder phenotype. This novel mechanism leads to mild clinical characteristics. This report highlights the importance of analyzing transcripts to enhance the mutation detection rate and provides insight into genotype-phenotype correlations.
This approach can clarify the cause of atypically mild phenotypes in X-linked Alport syndrome. Copyright © 2014 by the American Society of Nephrology.
2012-01-01
Background: Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we constructed 454 expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility in diversity studies and their transferability to related species. Results: Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for the L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries, respectively. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. A total of 38,200 isotigs translated into proteins, and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty-four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering pattern followed the accessions' origin. Conclusion: L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection. PMID:22920992
Parra-González, Lorena B; Aravena-Abarzúa, Gabriela A; Navarro-Navarro, Cristell S; Udall, Joshua; Maughan, Jeff; Peterson, Louis M; Salvo-Garrido, Haroldo E; Maureira-Butler, Iván J
2012-08-24
Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we constructed 454 expressed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility in diversity studies and their transferability to related species. Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for the L1 (young leaves, buds and flowers) and L2 (immature seeds) EST libraries, respectively. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. A total of 38,200 isotigs translated into proteins, and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty-four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering pattern followed the accessions' origin. L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.
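Screening sequences for simple sequence repeats, as described in the abstract, can be sketched with a regular expression. The abstract does not state the search criteria or the software used, so the rule below is an assumption: a perfect repeat of a 2 to 6 bp motif occurring at least five times in tandem, with homopolymer runs excluded. The example sequence is made up.

```python
import re

# A 2-6 bp motif followed by at least four more copies of itself (>= 5 repeats).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR in the sequence."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def kbp_per_ssr(total_bp, n_ssrs):
    """Average spacing between SSRs, in kbp, over the screened sequence length."""
    return total_bp / n_ssrs / 1000.0


# Hypothetical isotig fragment containing one (AT)5 repeat.
example = "GGCATATATATATCCG"
ssrs = find_ssrs(example)
```

Dividing the total screened sequence length by the number of repeats found, as in `kbp_per_ssr`, gives a density figure of the same kind as the "one SSR per 17.75 kbp" reported in the abstract, although the exact counting rules behind that figure are not given there.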
Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J
2015-09-15
Since the invention of DNA sequencing in the 1970s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
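The 'slice, decode independently, then merge' strategy described in the abstract can be sketched generically. This is not the authors' code: the decoder is a hypothetical stand-in passed in as a function, the merge rule (a majority vote across slices) is an assumption about how decoding conflicts might be resolved, and the keys and BAC labels are invented.

```python
from collections import Counter, defaultdict


def slices(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def decode_in_slices(items, size, decode):
    """Decode each slice independently, then merge by majority vote per key."""
    votes = defaultdict(Counter)
    for chunk in slices(items, size):
        for key, label in decode(chunk).items():
            votes[key][label] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


def toy_decode(chunk):
    """Hypothetical decoder: assign each key its most frequent label in the chunk."""
    per_key = defaultdict(Counter)
    for key, label in chunk:
        per_key[key][label] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in per_key.items()}


# Invented observations: (sequence key, candidate BAC); one observation is erroneous.
observations = [
    ("k1", "BAC_A"), ("k1", "BAC_A"),
    ("k2", "BAC_B"), ("k1", "BAC_C"),
    ("k2", "BAC_B"), ("k1", "BAC_A"),
]
merged = decode_in_slices(observations, size=2, decode=toy_decode)
```

In this toy run one slice assigns k1 to BAC_C because of the erroneous observation, and the vote across slices recovers BAC_A. Choosing the slice size is the part the abstract calls 'optimal size', and this sketch makes no attempt to determine it.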
Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution
Kendall, Michelle; Colijn, Caroline
2016-01-01
Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287
Deep learning applications in ophthalmology.
Rahimy, Ehsan
2018-05-01
To describe the emerging applications of deep learning in ophthalmology. Recent studies have shown that various deep learning models are capable of detecting and diagnosing various diseases afflicting the posterior segment of the eye with high accuracy. Most of the initial studies have centered around detection of referable diabetic retinopathy, age-related macular degeneration, and glaucoma. Deep learning has shown promising results in automated image analysis of fundus photographs and optical coherence tomography images. Additional testing and research is required to clinically validate this technology.
Development and application of deep convolutional neural network in target detection
NASA Astrophysics Data System (ADS)
Jiang, Xiaowei; Wang, Chunping; Fu, Qiang
2018-04-01
With the development of big data and algorithms, deep convolutional neural networks with more hidden layers have more powerful feature learning and feature expression ability than traditional machine learning methods, enabling artificial intelligence to surpass human-level performance in many fields. This paper first reviews the development and application of deep convolutional neural networks in object detection in recent years, then briefly summarizes and discusses some open problems in current research, and finally considers the future development of deep convolutional neural networks.
Gizewski, Elke R; Maderwald, Stefan; Linn, Jennifer; Dassinger, Benjamin; Bochmann, Katja; Forsting, Michael; Ladd, Mark E
2014-03-01
The purpose of this paper is to assess the value of 7 Tesla (7 T) MRI for the depiction of brain stem and cranial nerve (CN) anatomy. Six volunteers were examined at 7 T using high-resolution SWI, MPRAGE, MP2RAGE, 3D SPACE T2, T2, and PD images to establish scanning parameters targeted at optimizing spatial resolution. Direct comparisons between 3 and 7 T were performed in two additional subjects using the finalized sequences (3 T: T2, PD, MPRAGE, SWAN; 7 T: 3D T2, MPRAGE, SWI, MP2RAGE). Artifacts and the depiction of structures were evaluated by two neuroradiologists using a standardized score sheet. Sequences could be established for high-resolution 7 T imaging even in caudal cranial areas. High in-plane resolution T2, PD, and SWI images provided depiction of inner brain stem structures such as pons fibers, raphe, reticular formation, nerve roots, and periaqueductal gray. MPRAGE and MP2RAGE provided clear depiction of the CNs. 3D T2 images improved depiction of inner brain structure in comparison to T2 images at 3 T. Although the 7-T SWI sequence provided improved contrast to some inner structures, extended areas were influenced by artifacts due to image disturbances from susceptibility differences. Seven-tesla imaging of basal brain areas is feasible and might have significant impact on detection and diagnosis in patients with specific diseases, e.g., trigeminal pain related to affection of the nerve root. Some inner brain stem structures can be depicted at 3 T, but certain sequences at 7 T, in particular 3D SPACE T2, are superior in producing anatomical in vivo images of deep brain stem structures.
Genome-wide mapping of alternative splicing in Arabidopsis thaliana
Filichkin, Sergei A.; Priest, Henry D.; Givan, Scott A.; Shen, Rongkun; Bryant, Douglas W.; Fox, Samuel E.; Wong, Weng-Keen; Mockler, Todd C.
2010-01-01
Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least ∼42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC+ isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC+ isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC+ and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression. PMID:19858364
Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing
Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo
2015-01-01
BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116
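The abstract above mentions an error model and variant-calling algorithm for rare variants but does not describe them. As a rough illustration only, and not the authors' method, the following sketch assumes a single per-base sequencing error rate and asks whether the number of variant reads at a position exceeds what a binomial error model would plausibly produce. The names `binomial_tail` and `call_variant`, the error rate of 0.005, and the significance threshold are all hypothetical choices made for this example.

```python
from math import exp, lgamma, log


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space to avoid overflow at high depth.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def call_variant(alt_reads: int, depth: int,
                 error_rate: float = 0.005, alpha: float = 1e-6):
    """Return (variant frequency, whether the variant is called).

    error_rate and alpha are assumed values, not taken from the study.
    A variant is called when the observed count is unlikely to arise
    from sequencing error alone under the binomial model.
    """
    freq = alt_reads / depth
    p_value = binomial_tail(alt_reads, depth, error_rate)
    return freq, p_value < alpha


if __name__ == "__main__":
    # 60 variant reads at depth 2000 (3%) versus 12 variant reads (0.6%).
    print(call_variant(60, 2000))
    print(call_variant(12, 2000))
```

Under these assumed settings, a 3% variant at depth 2000 stands well clear of the modeled error background, while a 0.6% variant does not. A real pipeline would typically use position- and strand-specific error estimates rather than one global rate.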
Nouchi, A; Nguyen, T; Valantin, M A; Simon, A; Sayon, S; Agher, R; Calvez, V; Katlama, C; Marcelin, A G; Soulie, C
2018-05-29
This study investigated the dynamics of HIV-1 variants archived in cells harbouring drug resistance-associated mutations (DRAMs) to lamivudine/emtricitabine, etravirine and rilpivirine in patients under effective ART free from selective pressure on these DRAMs, in order to assess the possibility of recycling molecules with resistance history. We studied 25 patients with at least one DRAM to lamivudine/emtricitabine, etravirine and/or rilpivirine identified on an RNA sequence in their history and with virological control for at least 5 years under a regimen excluding all drugs from the resistant class. Longitudinal ultra-deep sequencing (UDS) and Sanger sequencing of the reverse transcriptase region were performed on cell-associated HIV-1 DNA samples taken over the 5 years of follow-up. Viral variants harbouring the analysed DRAMs were no longer detected by UDS over the 5 years in 72% of patients, with viruses susceptible to the molecules of interest found after 5 years in 80% of patients with UDS and in 88% of patients with Sanger. Residual viraemia with <50 copies/mL was detected in 52% of patients. The median HIV DNA level remained stable (2.4 at baseline versus 2.1 log10 copies/10^6 cells 5 years later). These results show a clear trend towards clearance of archived DRAMs to reverse transcriptase inhibitors in cell-associated HIV-1 DNA after a long period of virological control, free from therapeutic selective pressure on these DRAMs, reflecting probable residual replication in some reservoirs of the fittest viruses and leading to persistent evolution of the archived HIV-1 DNA resistance profile.
Mensa-Vilaro, Anna; Teresa Bosque, María; Magri, Giuliana; Honda, Yoshitaka; Martínez-Banaclocha, Helios; Casorran-Berges, Marta; Sintes, Jordi; González-Roca, Eva; Ruiz-Ortiz, Estibaliz; Heike, Toshio; Martínez-Garcia, Juan J; Baroja-Mazo, Alberto; Cerutti, Andrea; Nishikomori, Ryuta; Yagüe, Jordi; Pelegrín, Pablo; Delgado-Beltran, Concha; Aróstegui, Juan I
2016-12-01
Gain-of-function NLRP3 mutations cause cryopyrin-associated periodic syndrome (CAPS), with gene mosaicism playing a relevant role in the pathogenesis. This study was undertaken to characterize the genetic cause underlying late-onset but otherwise typical CAPS. We studied a 64-year-old patient who presented with recurrent episodes of urticaria-like rash, fever, conjunctivitis, and oligoarthritis at age 56 years. DNA was extracted from both unfractionated blood and isolated leukocyte and CD34+ subpopulations. Genetic studies were performed using both the Sanger method of DNA sequencing and next-generation sequencing (NGS) methods. In vitro and ex vivo analyses were performed to determine the consequences of the detected variant for the normal structure or function of the protein. NGS analyses revealed the novel p.Gln636Glu NLRP3 variant in unfractionated blood, with an allele frequency (18.4%) compatible with gene mosaicism. Sanger sequence chromatograms revealed a small peak corresponding to the variant allele. Amplicon-based deep sequencing revealed somatic NLRP3 mosaicism restricted to myeloid cells (31.8% in monocytes, 24.6% in neutrophils, and 11.2% in circulating CD34+ common myeloid progenitor cells) and its complete absence in lymphoid cells. Functional analyses confirmed the gain-of-function behavior of the gene variant and hyperactivity of the NLRP3 inflammasome in the patient. Treatment with anakinra resulted in good control of the disease. We identified the novel gain-of-function p.Gln636Glu NLRP3 mutation, which was detected as a somatic mutation restricted to myeloid cells, as the cause of late-onset but otherwise typical CAPS. Our results expand the diversity of CAPS toward milder phenotypes than previously reported, including those starting during adulthood. © 2016, American College of Rheumatology.
Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo
2016-06-17
Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Bierzynska, Agnieszka; McCarthy, Hugh J; Soderquest, Katrina; Sen, Ethan S; Colby, Elizabeth; Ding, Wen Y; Nabhan, Marwa M; Kerecuk, Larissa; Hegde, Shivram; Hughes, David; Marks, Stephen; Feather, Sally; Jones, Caroline; Webb, Nicholas J A; Ognjanovic, Milos; Christian, Martin; Gilbert, Rodney D; Sinha, Manish D; Lord, Graham M; Simpson, Michael; Koziell, Ania B; Welsh, Gavin I; Saleem, Moin A
2017-04-01
Steroid Resistant Nephrotic Syndrome (SRNS) in children and young adults has differing etiologies with monogenic disease accounting for 2.9-30% in selected series. Using whole exome sequencing we sought to stratify a national population of children with SRNS into monogenic and non-monogenic forms, and further define those groups by detailed phenotypic analysis. Pediatric patients with SRNS were identified via a national United Kingdom Renal Registry. Whole exome sequencing was performed on 187 patients, of whom 12% had a positive family history, with a focus on the 53 genes currently known to be associated with nephrotic syndrome. Genetic findings were correlated with individual case disease characteristics. Disease causing variants were detected in 26.2% of patients. Most often this occurred in the three most common SRNS-associated genes: NPHS1, NPHS2, and WT1 but also in 14 other genes. The genotype did not always correlate with expected phenotype since mutations in OCRL, COL4A3, and DGKE associated with specific syndromes were detected in patients with isolated renal disease. Analysis by primary/presumed compared with secondary steroid resistance found 30.8% monogenic disease in primary compared with none in secondary SRNS permitting further mechanistic stratification. Genetic SRNS progressed faster to end stage renal failure, with no documented disease recurrence post-transplantation within this cohort. Primary steroid resistance in which no gene mutation was identified had a 47.8% risk of recurrence. In this unbiased pediatric population, whole exome sequencing allowed screening of all current candidate genes. Thus, deep phenotyping combined with whole exome sequencing is an effective tool for early identification of SRNS etiology, yielding an evidence-based algorithm for clinical management. Copyright © 2016 International Society of Nephrology. Published by Elsevier Inc. All rights reserved.
Lou, Haiyi; Lu, Yan; Lu, Dongsheng; Fu, Ruiqing; Wang, Xiaoji; Feng, Qidi; Wu, Sijie; Yang, Yajun; Li, Shilin; Kang, Longli; Guan, Yaqun; Hoh, Boon-Peng; Chung, Yeun-Jun; Jin, Li; Su, Bing; Xu, Shuhua
2015-01-01
Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10^−15). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276–46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r^2 = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075–14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1. PMID:26073780
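The abstract above reports linkage disequilibrium between the deletion and nearby variants as a squared correlation. For readers unfamiliar with the statistic, here is a minimal sketch of how the squared correlation between two biallelic loci is commonly computed from phased haplotypes. The function name `ld_r2` and the toy haplotypes are illustrative assumptions, not data or code from the study.

```python
def ld_r2(hap_a, hap_b):
    """Squared allelic correlation between two biallelic loci.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype (toy inputs here, not study data).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Alleles always travel together: complete LD.
    print(ld_r2([1, 1, 0, 0], [1, 1, 0, 0]))
    # Alleles are statistically independent: no LD.
    print(ld_r2([1, 1, 0, 0], [1, 0, 1, 0]))
```

A value of 1 means the two alleles are always inherited together on the sampled haplotypes, and a value of 0 means they are independent. Unphased genotype data would need haplotype frequency estimation first, which this sketch does not attempt.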
Lauck, Michael; Switzer, William M; Sibley, Samuel D; Hyeroba, David; Tumukunde, Alex; Weny, Geoffrey; Taylor, Bill; Shankar, Anupama; Ting, Nelson; Chapman, Colin A; Friedrich, Thomas C; Goldberg, Tony L; O'Connor, David H
2013-10-21
African non-human primates (NHPs) are natural hosts for simian immunodeficiency viruses (SIV), the zoonotic transmission of which led to the emergence of HIV-1 and HIV-2. However, our understanding of SIV diversity and evolution is limited by incomplete taxonomic and geographic sampling of NHPs, particularly in East Africa. In this study, we screened blood specimens from nine black-and-white colobus monkeys (Colobus guereza occidentalis) from Kibale National Park, Uganda, for novel SIVs using a combination of serology and "unbiased" deep-sequencing, a method that does not rely on genetic similarity to previously characterized viruses. We identified two novel and divergent SIVs, tentatively named SIVkcol-1 and SIVkcol-2, and assembled genomes covering the entire coding region for each virus. SIVkcol-1 and SIVkcol-2 were detected in three and four animals, respectively, but with no animals co-infected. Phylogenetic analyses showed that SIVkcol-1 and SIVkcol-2 form a lineage with SIVcol, previously discovered in black-and-white colobus from Cameroon. Although SIVkcol-1 and SIVkcol-2 were isolated from the same host population in Uganda, SIVkcol-1 is more closely related to SIVcol than to SIVkcol-2. Analysis of functional motifs in the extracellular envelope glycoprotein (gp120) revealed that SIVkcol-2 is unique among primate lentiviruses in containing only 16 conserved cysteine residues instead of the usual 18 or more. Our results demonstrate that the genetic diversity of SIVs infecting black-and-white colobus across equatorial Africa is greater than previously appreciated and that divergent SIVs can co-circulate in the same colobine population. We also show that the use of "unbiased" deep sequencing for the detection of SIV has great advantages over traditional serological approaches, especially for studies of unknown or poorly characterized viruses. 
Finally, the detection of the first SIV containing only 16 conserved cysteines in the extracellular envelope protein gp120 further expands the range of functional motifs observed among SIVs and highlights the complex evolutionary history of simian retroviruses. PMID:24139306