Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
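The counting step behind a barcode-sequencing fitness profile can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the barcode sequences, strain names, pseudocount, and function names are all hypothetical, and it assumes reads have already been trimmed to the barcode and match exactly.

```python
import math
from collections import Counter

def count_barcodes(reads, barcode_to_strain):
    """Tally reads whose sequence exactly matches a known barcode."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read)
        if strain is not None:  # unmatched reads are ignored
            counts[strain] += 1
    return counts

def log2_ratios(control, treatment, pseudocount=1.0):
    """Per-strain log2(control / treatment) after scaling by total matched reads.

    Positive values indicate strains depleted under treatment.
    The pseudocount (an assumption) avoids division by zero.
    """
    c_total = sum(control.values()) or 1
    t_total = sum(treatment.values()) or 1
    ratios = {}
    for strain in set(control) | set(treatment):
        c_freq = (control[strain] + pseudocount) / c_total
        t_freq = (treatment[strain] + pseudocount) / t_total
        ratios[strain] = math.log2(c_freq / t_freq)
    return ratios

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "strainA", "TTGA": "strainB"}
control = count_barcodes(["ACGT"] * 3 + ["TTGA"] * 3 + ["GGGG"], barcodes)
treatment = count_barcodes(["ACGT"] * 7 + ["TTGA"], barcodes)
ratios = log2_ratios(control, treatment)
```

In this toy input, strainB loses representation under treatment and so receives a positive log2 ratio, which is the kind of signal a chemogenomic screen would flag as drug sensitivity.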
DNA Replication Profiling Using Deep Sequencing.
Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W
2018-01-01
Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.
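The abstract does not spell out the computation, but one common way to turn read depth into a replication profile is to compare binned coverage from an S-phase sample with a non-replicating reference. The sketch below assumes that approach; the bin counts and the function name are hypothetical.

```python
def replication_profile(s_counts, g1_counts):
    """Per-bin ratio of S-phase to reference read depth.

    Each sample is first scaled by its total read count. Bins with no
    reference reads are reported as None rather than guessed.
    """
    s_total = sum(s_counts)
    g1_total = sum(g1_counts)
    profile = []
    for s, g in zip(s_counts, g1_counts):
        if g == 0:
            profile.append(None)
        else:
            profile.append((s / s_total) / (g / g1_total))
    return profile

# Hypothetical read counts in three genomic bins.
profile = replication_profile([20, 10, 10], [10, 10, 0])
```

Under this assumption, bins with higher ratios have been copied in more cells and so lie nearer to early-firing origins.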
2010-01-01
Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments on the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in deep-sea mussel innate immunity, we carried out a high-throughput sequence analysis of the transcriptome of freshly collected B. azoricus, using gill tissue as the primary source of immune transcripts given its strategic role in filtering potentially infectious waterborne microorganisms from the surrounding water. Additionally, a substantial EST data set was produced, from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on 454 pyrosequencing technology. Results A normalized cDNA library from gill tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high-quality reads resulted in 75,407 contigs, of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino acid sequences, of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification, and 9,584 were assigned Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions that had not previously been reported in the vent mussel. Their physical counterpart was confirmed by semi-quantitative reverse transcription-polymerase chain reaction (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments.
Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences, which can be applied in gene expression profiling experiments in a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptation processes during post-capture long-term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131
Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto
2015-01-01
Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. Because single-cell experiments begin with such small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
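To make the extrapolation idea concrete, the sketch below uses a plain homogeneous Poisson baseline, which is a deliberately simpler stand-in and not the non-parametric empirical Bayes estimator implemented in preseq. The function names and the example depths are hypothetical.

```python
import math

def poisson_coverage_fraction(mean_depth):
    """Expected fraction of the genome covered at least once,
    assuming reads land uniformly (depth per base ~ Poisson)."""
    return 1.0 - math.exp(-mean_depth)

def extrapolate_coverage(shallow_depth, fold):
    """Predicted coverage if the same library were sequenced
    `fold` times deeper, under the uniform-Poisson assumption."""
    return poisson_coverage_fraction(shallow_depth * fold)

# Hypothetical shallow run at 0.1x mean depth, extrapolated tenfold.
shallow = poisson_coverage_fraction(0.1)
deep = extrapolate_coverage(0.1, 10)
```

Because amplification bias makes depth uneven across the genome, this uniform baseline gives an upper bound on coverage at a given mean depth. That gap is the reason a model fitted to the observed coverage statistics is needed for single-cell libraries.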
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong
2014-01-01
Correct and bias-free interpretation of deep sequencing data inevitably depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To obtain both advantages, we developed an algorithm, FANSe2, with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, whereas the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency with the microarray than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm, SLIM, for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated, and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults but may be involved in the pathogenesis of HIV-1 infection. It will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106
Yoshida, Mitsuhiro; Mochizuki, Tomohiro; Urayama, Syun-Ichi; Yoshida-Takashima, Yukari; Nishi, Shinro; Hirai, Miho; Nomaki, Hidetaka; Takaki, Yoshihiro; Nunoura, Takuro; Takai, Ken
2018-01-01
Previous studies on marine environmental virology have primarily focused on double-stranded DNA (dsDNA) viruses; however, it has recently been suggested that single-stranded DNA (ssDNA) viruses are more abundant in marine ecosystems. In this study, we performed a quantitative viral community DNA analysis to estimate the relative abundance and composition of both ssDNA and dsDNA viruses in offshore upper bathyal sediment from Tohoku, Japan (water depth = 500 m). The estimated dsDNA viral abundance ranged from 3 × 10^6 to 5 × 10^6 genome copies per cm^3 sediment, showing values similar to the range of fluorescence-based direct virus counts. In contrast, the estimated ssDNA viral abundance ranged from 1 × 10^8 to 3 × 10^9 genome copies per cm^3 sediment, thus providing an estimation that the ssDNA viral populations represent 96.3–99.8% of the benthic total DNA viral assemblages. In the ssDNA viral metagenome, most of the identified viral sequences were associated with ssDNA viral families such as Circoviridae and Microviridae. Principal component analysis of the ssDNA viral sequence components from the sedimentary ssDNA viral metagenomic libraries found that the viral communities from different depths at the study site all exhibited similar profiles compared with deep-sea sediment communities at other reference sites. Our results suggested that deep-sea benthic ssDNA viruses have been significantly underestimated by conventional direct virus counts and that their contributions to deep-sea benthic microbial mortality and geochemical cycles should be further addressed by such a new quantitative approach. PMID:29467725
Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan
2018-03-01
In the current era, the majority of microRNAs (miRNAs) are discovered through computational approaches, which are largely confined to model plants. Here, for the first time, we describe the identification and characterization of novel miRNAs in a non-model plant, Persicaria minor (P. minor), using a computational approach. Unannotated sequences from deep sequencing were analyzed based on previously well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence, which represented 11 unique putative miRNA sequences. The PsRobot target prediction tool was deployed to identify the target transcripts of the putative novel miRNAs. Most of the predicted target transcripts (mRNAs) are known to be involved in plant development and stress responses. Gene ontology analysis showed that the majority of the putative novel miRNA targets were assigned to cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Of the 11 unique putative miRNAs, 7 were validated through semi-quantitative PCR. These novel miRNA discoveries in P. minor may extend and update the current public miRNA database.
Magnetic resonance imaging of the subthalamic nucleus for deep brain stimulation.
Chandran, Arjun S; Bynevelt, Michael; Lind, Christopher R P
2016-01-01
The subthalamic nucleus (STN) is one of the most important stereotactic targets in neurosurgery, and its accurate imaging is crucial. With improving MRI sequences there is impetus for direct targeting of the STN. High-quality, distortion-free images are paramount. Image reconstruction techniques appear to show the greatest promise in balancing the issue of geometrical distortion and STN edge detection. Existing spin echo- and susceptibility-based MRI sequences are compared with new image reconstruction methods. Quantitative susceptibility mapping is the most promising technique for stereotactic imaging of the STN.
Zhang, Yong; Weissmann, Gary S; Fogg, Graham E; Lu, Bingqing; Sun, HongGuang; Zheng, Chunmiao
2018-06-05
Groundwater susceptibility to non-point source contamination is typically quantified by stable indexes, while groundwater quality evolution (or deterioration globally) can be a long-term process that may last for decades and exhibit strong temporal variations. This study proposes a three-dimensional (3-d), transient index map built upon physical models to characterize the complete temporal evolution of deep aquifer susceptibility. For illustration purposes, the backward travel time probability density (BTTPD) approach is extended to assess the 3-d deep groundwater susceptibility to non-point source contamination within a sequence stratigraphic framework observed in the Kings River fluvial fan (KRFF) aquifer. The BTTPD, which represents complete age distributions underlying a single groundwater sample in a regional-scale aquifer, is used as a quantitative, transient measure of aquifer susceptibility. The resultant 3-d imaging of susceptibility using the simulated BTTPDs in KRFF reveals the strong influence of regional-scale heterogeneity on susceptibility. The regional-scale incised-valley fill deposits increase the susceptibility of aquifers by enhancing rapid downward solute movement and displaying relatively narrow and young age distributions. In contrast, the regional-scale sequence-boundary paleosols within the open-fan deposits "protect" deep aquifers by slowing downward solute movement and displaying a relatively broad and old age distribution. Further comparison of the simulated susceptibility index maps to known contaminant distributions shows that these maps are generally consistent with the high concentration and quick evolution of 1,2-dibromo-3-chloropropane (DBCP) in groundwater around the incised-valley fill since the 1970s. This application demonstrates that the BTTPDs can be used as quantitative and transient measures of deep aquifer susceptibility to non-point source contamination.
Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei
2016-04-01
Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs were 1012 microRNA-like RNAs (milRNAs), which are similar to other known microRNAs, and 48 potential novel milRNAs without homologs in other organisms, identified using the miRBase database. We used quantitative PCR to analyze the expression of four of these milRNAs from C. lunata at different developmental stages. The analysis revealed several changes associated with germinating conidia and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. A total of 8334 target mRNAs for the 1012 identified milRNAs and 256 target mRNAs for the 48 novel milRNAs were predicted by computational analysis. These target mRNAs were also subjected to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata's milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanisms.
Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions
2014-01-01
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
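One simple way to separate a low-frequency variant from sequencing error is a binomial tail test against an assumed per-base error rate. The review does not endorse this particular method; it is used here only to illustrate the problem. The error rate, significance threshold, and function names are assumptions.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that
    deep coverage (large n) does not overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def log_pmf(i):
        return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))

    return sum(math.exp(log_pmf(i)) for i in range(max(k, 0), n + 1))

def is_candidate_variant(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Flag a site when the observed number of non-reference reads is
    unlikely to arise from sequencing error alone."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha

# Hypothetical sites at 1000x depth with an assumed 1% error rate.
strong = is_candidate_variant(30, 1000)   # 3% non-reference reads
weak = is_candidate_variant(12, 1000)     # close to the error expectation
```

A real analysis would also need to account for multiple testing across sites and for error rates that vary by position and base quality.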
The deep biosphere in terrestrial sediments in the Chesapeake Bay area, Virginia, USA.
Breuker, Anja; Köweker, Gerrit; Blazejak, Anna; Schippers, Axel
2011-01-01
For the first time quantitative data on the abundance of Bacteria, Archaea, and Eukarya in deep terrestrial sediments are provided using multiple methods (total cell counting, quantitative real-time PCR, Q-PCR and catalyzed reporter deposition-fluorescence in situ hybridization, CARD-FISH). The oligotrophic (organic carbon content of ∼0.2%) deep terrestrial sediments in the Chesapeake Bay area at Eyreville, Virginia, USA, were drilled and sampled up to a depth of 140 m in 2006. The possibility of contamination during drilling was checked using fluorescent microspheres. Total cell counts decreased from 10^9 to 10^6 cells/g dry weight within the uppermost 20 m, and did not further decrease with depth below. Within the top 7 m, a significant proportion of the total cell counts could be detected with CARD-FISH. The CARD-FISH numbers for Bacteria were about an order of magnitude higher than those for Archaea. The dominance of Bacteria over Archaea was confirmed by Q-PCR. The down core quantitative distribution of prokaryotic and eukaryotic small subunit ribosomal RNA genes as well as functional genes involved in different biogeochemical processes was revealed by Q-PCR for the uppermost 10 m and for 80-140 m depth. Eukarya and the Fe(III)- and Mn(IV)-reducing bacterial group Geobacteriaceae were almost exclusively found in the uppermost meter (arable soil), where reactive iron was detected in higher amounts. The bacterial candidate division JS-1 and the classes Anaerolineae and Caldilineae of the phylum Chloroflexi, highly abundant in marine sediments, were found up to the maximum sampling depth in high copy numbers at this terrestrial site as well. A similar high abundance of the functional gene cbbL encoding for the large subunit of RubisCO suggests that autotrophic microorganisms could be relevant in addition to heterotrophs. The functional gene aprA of sulfate reducing bacteria was found within distinct layers up to ca. 100 m depth in low copy numbers.
The gene mcrA of methanogens was not detectable. Cloning and sequencing data of 16S rRNA genes revealed sequences of typical soil Bacteria. The closest relatives of the archaeal sequences were Archaea recovered from terrestrial and marine environments. Phylogenetic analysis of the Crenarchaeota and Euryarchaeota revealed new members of the uncultured South African Gold Mine Group, Deep Sea Hydrothermal Vent Euryarchaeotal Group 6, and Miscellaneous Crenarchaeotic Group clusters.
Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy
Matkovich, Scot J.; Dorn, Gerald W.
2018-01-01
Summary MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicate purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses. PMID:25836573
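The abstract does not name the processing tools, so as a generic illustration of converting raw read counts into a gene expression measure, the sketch below applies counts-per-million scaling. The gene names, counts, and function name are hypothetical.

```python
def counts_per_million(gene_counts):
    """Scale raw per-gene read counts so that each library sums to one
    million, making replicates of different depth comparable."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("library contains no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}

# Hypothetical counts for one biological replicate.
cpm = counts_per_million({"geneA": 250, "geneB": 750})
```

Depth scaling of this kind is only a first step. Differential expression analysis additionally models the variability between biological replicates.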
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649
NASA Astrophysics Data System (ADS)
Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata
2009-05-01
The autotrophic and ammonia-oxidizing crenarchaeal assemblage at an offshore site in deep Mediterranean water (Tyrrhenian Sea, depth 3000 m) was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of the amoA library and almost 70% of the accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan® methodology. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found at ALOHA Station at a depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives of the residual 36% of clones, except for those recovered from the Eastern Mediterranean, were found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of the Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception, all phylotypes fell into the Deep Marine Group I cluster, which contains the vast majority of known sequences recovered from the global deep-sea environment.
Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of the crenarchaeal amoA gene was monitored by direct RNA retrieval and subsequent analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into the large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in the DNA-based clone library, evidently because of the overwhelming dominance of Deep Marine Group I. The failure to recover amoA transcripts related to Deep Marine Group I of Crenarchaeota was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. Because all seawater samples were processed on board under atmospheric pressure and sunlight, decompression and/or photoinhibition likely affected their metabolic activity, followed by a strong decay of gene expression.
Blazejak, Anna; Schippers, Axel
2010-05-01
Sequences of members of the bacterial candidate division JS-1 and the classes Anaerolineae and Caldilineae of the phylum Chloroflexi are frequently found in 16S rRNA gene clone libraries obtained from marine sediments. Using a newly designed quantitative, real-time PCR assay, these bacterial groups were jointly quantified in samples from near-surface and deeply buried marine sediments from the Peru margin, the Black Sea, and a forearc basin off the island of Sumatra. In near-surface sediments, sequences of the JS-1 as well as Anaerolineae- and Caldilineae-related Bacteria were quantified with significantly lower 16S rRNA gene copy numbers than the sequences of total Bacteria. In contrast, in deeply buried sediments below approximately 1 m depth, similar quantities of the 16S rRNA gene copies of these specific groups and Bacteria were found. This finding indicates that JS-1 and Anaerolineae- and Caldilineae-related Bacteria might dominate the bacterial community in deeply buried marine sediments and thus seem to play an important ecological role in the deep biosphere.
Characterization of microRNAs from goat (Capra hircus) by Solexa deep-sequencing technology.
Ling, Y H; Ding, J P; Zhang, X D; Wang, L J; Zhang, Y H; Li, Y S; Zhang, Z J; Zhang, X R
2013-06-13
MicroRNAs (miRNAs) are an important class of small noncoding RNAs that are highly conserved in plants and animals. Many miRNAs are known to mediate a myriad of cell processes, including proliferation and differentiation, via the regulation of transcription and signaling factors that are closely related to muscle development and disease. In this study, small RNA cDNA libraries of Boer goats were constructed. We obtained the goat muscle miRNAs using Solexa deep-sequencing technology and analyzed their characteristics with bioinformatics tools. Based on Solexa sequencing and bioinformatics analysis, 562 species-conserved and 5 goat genome-specific miRNAs were identified, 322 of which had expression levels exceeding 100. The results of real-time quantitative polymerase chain reaction for 8 randomly selected miRNAs showed that all 8 were expressed in goat muscle, and the expression patterns were consistent with the Solexa sequencing results. The identification and characterization of miRNAs in goat muscle provide important information on the role of miRNA regulation in muscle growth and development. These data will help to facilitate studies on the regulatory roles played by miRNAs during goat growth and development.
Han, Yucui; Lv, Peng; Hou, Shenglin; Li, Suying; Ji, Guisu; Ma, Xue; Du, Ruiheng; Liu, Guoqing
2015-01-01
Sorghum is one of the most promising bioenergy crops. Stem juice yield, together with stem sugar concentration, determines sugar yield in sweet sorghum. Bulked segregant analysis (BSA) is a gene mapping technique for identifying genomic regions containing genetic loci affecting a trait of interest; when combined with deep sequencing, it can effectively accelerate the gene mapping process. In this study, a dry stem sorghum landrace was characterized and the stem water controlling locus, qSW6, was fine mapped using QTL analysis and the combined BSA and deep sequencing technologies. Results showed that: (i) In sorghum variety Jiliang 2, stem water content was around 80% before flowering stage and dropped to 75% during grain filling, with little difference between internodes. In landrace G21, stem water content kept dropping after the flag leaf stage, from 71% at flowering time to 60% at grain filling time. Large differences existed between internodes, with the lowest content (51%) at the 7th and 8th internodes at dough stage. (ii) A quantitative trait locus (QTL) controlling stem water content, mapped on chromosome 6 between SSR markers Ch6-2 and gpsb069, explained about 34.7-56.9% of the phenotypic variation across the 5th to 10th internodes. (iii) BSA and deep sequencing analysis narrowed the associated region to 339 kb containing 38 putative genes. The results could help reveal the molecular mechanisms underlying juice yield of sorghum and thus help improve total sugar yield.
Cai, Congbo; Wang, Chao; Zeng, Yiqing; Cai, Shuhui; Liang, Dong; Wu, Yawen; Chen, Zhong; Ding, Xinghao; Zhong, Jianhui
2018-04-24
An end-to-end deep convolutional neural network (CNN) based on deep residual network (ResNet) was proposed to efficiently reconstruct reliable T2 mapping from single-shot overlapping-echo detachment (OLED) planar imaging. The training dataset was obtained from simulations that were carried out on SPROM (Simulation with PRoduct Operator Matrix) software developed by our group. The relationship between the original OLED image containing two echo signals and the corresponding T2 mapping was learned by ResNet training. After the ResNet was trained, it was applied to reconstruct the T2 mapping from simulation and in vivo human brain data. Although the ResNet was trained entirely on simulated data, the trained network generalized well to real human brain data. The results from simulation and in vivo human brain experiments show that the proposed method significantly outperforms the echo-detachment-based method. Reliable T2 mapping with higher accuracy is achieved within 30 ms after the network has been trained, while the echo-detachment-based OLED reconstruction method took approximately 2 min. The proposed method will facilitate real-time dynamic and quantitative MR imaging via OLED sequence, and deep convolutional neural network has the potential to reconstruct maps from complex MRI sequences efficiently. © 2018 International Society for Magnetic Resonance in Medicine.
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.
Yang, Jian-Hua; Qu, Liang-Hu
2012-01-01
Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explore the expression patterns of miRNAs and other ncRNAs, and discover novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
Making sense of deep sequencing
Goldman, D.; Domschke, K.
2016-01-01
This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng
2016-09-06
Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). 
Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.
De novo peptide sequencing by deep learning
Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming
2017-01-01
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701
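The two accuracy levels quoted in this abstract can be illustrated with a minimal sketch. It assumes a simplified position-wise comparison of predicted and reference peptide strings; the function name `sequencing_accuracy` and the example peptides are hypothetical, and the metric used in the DeepNovo evaluation itself may differ (for instance by matching residues on mass).

```python
def sequencing_accuracy(predicted, reference):
    """Return (amino-acid-level accuracy, peptide-level accuracy).

    Simplified assumption: residues are compared position by position.
    A peptide counts as correct only if the whole string matches.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equally sized, non-empty lists")
    aa_total = sum(len(ref) for ref in reference)
    aa_correct = 0
    peptides_correct = 0
    for pred, ref in zip(predicted, reference):
        # Count residues that agree at the same position.
        aa_correct += sum(a == b for a, b in zip(pred, ref))
        # A peptide is correct only when every residue matches.
        peptides_correct += int(pred == ref)
    return aa_correct / aa_total, peptides_correct / len(reference)


# Hypothetical predictions against hypothetical reference peptides.
predicted = ["PEPTIDE", "PEPTADE", "AAAK"]
reference = ["PEPTIDE", "PEPTIDE", "AAGK"]
aa_acc, pep_acc = sequencing_accuracy(predicted, reference)
print(f"amino acid level: {aa_acc:.3f}, peptide level: {pep_acc:.3f}")
```

A single wrong residue costs little at the amino acid level but invalidates the whole peptide, which is one reason the two figures can diverge.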
Rapid Creation and Quantitative Monitoring of High Coverage shRNA Libraries
Bassik, Michael C.; Lebbink, Robert Jan; Churchman, L. Stirling; Ingolia, Nicholas T.; Patena, Weronika; LeProust, Emily M.; Schuldiner, Maya; Weissman, Jonathan S.; McManus, Michael T.
2009-01-01
Short hairpin RNA (shRNA) libraries are limited by the low efficacy of many shRNAs, giving false negatives, and off-target effects, giving false positives. Here we present a strategy for rapidly creating expanded shRNA pools (∼30 shRNAs/gene) that are analyzed by deep-sequencing (EXPAND). This approach enables identification of multiple effective target-specific shRNAs from a complex pool, allowing a rigorous statistical evaluation of whether a gene is a true hit. PMID:19448642
Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio
2016-11-25
CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single molecule level and lack suitable statistical tools for their interpretation. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. The ampliMethProfiler tool provides an easy and user friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in Python and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
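To make the distinction between per-site quantification and epihaplotype analysis concrete, here is a minimal sketch. It is not ampliMethProfiler code: the function `epihaplotype_profile`, the `'1'`/`'0'` read encoding, and the two toy populations are all assumptions chosen for illustration.

```python
from collections import Counter


def epihaplotype_profile(reads):
    """Summarise reads covering the same CpG sites of one amplicon.

    Assumed encoding: each read is a string with one character per CpG,
    '1' for methylated and '0' for unmethylated.
    Returns (epihaplotype counts, per-site methylation fractions).
    """
    if not reads:
        raise ValueError("no reads given")
    n_sites = len(reads[0])
    if any(len(read) != n_sites for read in reads):
        raise ValueError("all reads must cover the same CpG sites")
    # Epihaplotype view: how often each whole-molecule pattern occurs.
    counts = Counter(reads)
    # Classic view: fraction of molecules methylated at each site.
    per_site = [
        sum(read[i] == "1" for read in reads) / len(reads)
        for i in range(n_sites)
    ]
    return counts, per_site


# Two toy populations with identical per-site averages (0.5 everywhere)
# but different single-molecule composition.
population_a = ["1100", "1100", "0011", "0011"]
population_b = ["1010", "0101", "1001", "0110"]

for name, reads in (("A", population_a), ("B", population_b)):
    counts, per_site = epihaplotype_profile(reads)
    print(name, per_site, dict(counts))
```

Both populations look identical to a per-site average, yet population A contains two epihaplotypes and population B four, which is the kind of information the single-molecule view retains.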
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua
2016-10-01
The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) at 97% sequence similarity, and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies of fungal diversity in deep-sea sediments by culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing dramatically accelerates the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities distinct from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.
Adriaens, M E; Bezzina, C R
2018-06-22
Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
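Allelic imbalance, as defined in this abstract, can be sketched as a departure of allele-specific read counts from the expected 1:1 ratio. The example below is one simple way to test for it, assuming an exact two-sided binomial test at a single heterozygous site; the function name `allelic_imbalance` and the read counts are hypothetical, and real analyses typically also account for mapping bias and overdispersion.

```python
from math import comb


def allelic_imbalance(ref_reads, alt_reads):
    """Return (reference allele fraction, two-sided binomial p-value).

    Null hypothesis: both alleles are expressed equally (p = 0.5).
    Because the null distribution is symmetric, the two-sided p-value
    is twice the probability of the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no allele-specific reads at this site")
    ref_fraction = ref_reads / n
    k = min(ref_reads, alt_reads)
    # P(X <= k) for X ~ Binomial(n, 0.5), computed with exact integers.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return ref_fraction, min(1.0, 2 * lower_tail)


# Hypothetical counts at two heterozygous sites.
print(allelic_imbalance(50, 50))  # balanced expression
print(allelic_imbalance(80, 20))  # strong skew towards the reference allele
```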
Preparation of metagenomic libraries from naturally occurring marine viruses.
Solonenko, Sergei A; Sullivan, Matthew B
2013-01-01
Microbes are now well recognized as major drivers of the biogeochemical cycling that fuels the Earth, and their viruses (phages) are known to be abundant and important in microbial mortality, horizontal gene transfer, and modulating microbial metabolic output. Investigation of environmental phages has been frustrated by an inability to culture the vast majority of naturally occurring diversity coupled with the lack of robust, quantitative, culture-independent methods for studying this uncultured majority. However, for double-stranded DNA phages, a quantitative viral metagenomic sample-to-sequence workflow now exists. Here, we review these advances with special emphasis on the technical details of preparing DNA sequencing libraries for metagenomic sequencing from environmentally relevant low-input DNA samples. Library preparation steps broadly involve manipulating the sample DNA by fragmentation, end repair and adaptor ligation, size fractionation, and amplification. One critical area of future research and development is parallel advances for alternate nucleic acid types such as single-stranded DNA and RNA viruses that are also abundant in nature. Combinations of recent advances in fragmentation (e.g., acoustic shearing and tagmentation), ligation reactions (adaptor-to-template ratio reference table availability), size fractionation (non-gel-sizing), and amplification (linear amplification for deep sequencing and linker amplification protocols) enhance our ability to generate quantitatively representative metagenomic datasets from low-input DNA samples. Such datasets are already providing new insights into the role of viruses in marine systems and will continue to do so as new environments are explored and synergies and paradigms emerge from large-scale comparative analyses. © 2013 Elsevier Inc. All rights reserved.
Avramenko, Russell W; Redman, Elizabeth M; Lewis, Roy; Yazwinski, Thomas A; Wasmuth, James D; Gilleard, John S
2015-01-01
Parasitic helminth infections have a considerable impact on global human health as well as animal welfare and production. Although co-infection with multiple parasite species within a host is common, there is a dearth of tools with which to study the composition of these complex parasite communities. Helminth species vary in their pathogenicity, epidemiology and drug sensitivity and the interactions that occur between co-infecting species and their hosts are poorly understood. We describe the first application of deep amplicon sequencing to study parasitic nematode communities as well as introduce the concept of the gastro-intestinal "nemabiome". The approach is analogous to 16S rDNA deep sequencing used to explore microbial communities, but utilizes the nematode ITS-2 rDNA locus instead. Gastro-intestinal parasites of cattle were used to develop the concept, as this host has many well-defined gastro-intestinal nematode species that commonly occur as complex co-infections. Further, the availability of pure mono-parasite populations from experimentally infected cattle allowed us to prepare mock parasite communities to determine, and correct for, species representation biases in the sequence data. We demonstrate that, once these biases have been corrected, accurate relative quantitation of gastro-intestinal parasitic nematode communities in cattle fecal samples can be achieved. We have validated the accuracy of the method applied to field-samples by comparing the results of detailed morphological examination of L3 larvae populations with those of the sequencing assay. The results illustrate the insights that can be gained into the species composition of parasite communities, using grazing cattle in the mid-west USA as an example. However, both the technical approach and the concept of the 'nemabiome' have a wide range of potential applications in human and veterinary medicine. 
These include investigations of host-parasite and parasite-parasite interactions during co-infection, parasite epidemiology, parasite ecology and the response of parasite populations to both drug treatments and control programs.
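The bias correction described in this abstract can be sketched as follows. The abstract does not state how the correction is computed, so this is only one plausible scheme: per-species correction factors are derived from a mock community of known composition and then applied to sample read counts. The function names, the species labels, and all numbers are hypothetical.

```python
def correction_factors(mock_expected, mock_reads):
    """Factor per species = expected proportion / observed read proportion."""
    total = sum(mock_reads.values())
    factors = {}
    for species, expected in mock_expected.items():
        observed = mock_reads[species] / total
        if observed == 0:
            raise ValueError(f"{species} not detected in the mock community")
        factors[species] = expected / observed
    return factors


def corrected_proportions(sample_reads, factors):
    """Weight raw read counts by the factors, then renormalise to sum to 1."""
    weighted = {sp: n * factors[sp] for sp, n in sample_reads.items()}
    total = sum(weighted.values())
    return {sp: w / total for sp, w in weighted.items()}


# Mock community mixed 1:1, but species_A is over-represented in the reads.
mock_expected = {"species_A": 0.5, "species_B": 0.5}
mock_reads = {"species_A": 750, "species_B": 250}
factors = correction_factors(mock_expected, mock_reads)

# A field sample showing the same 3:1 skew in raw reads.
sample_reads = {"species_A": 600, "species_B": 200}
print(corrected_proportions(sample_reads, factors))
```

In this toy case the field sample has the same skew as the mock community, so the corrected estimate comes out at roughly equal proportions of the two species.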
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu
2018-01-01
Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
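The effect of a 1,000-fold selective amplification on a variant that starts at 0.1% allele frequency can be worked through with a short calculation. This is a simplified model, assuming the fold enrichment applies uniformly to variant molecules relative to wild-type molecules and ignoring PCR saturation; the function name `enriched_vaf` is hypothetical.

```python
def enriched_vaf(initial_vaf, fold_enrichment):
    """Variant allele fraction after enriching variants over wild type.

    Simplified assumption: variant molecules are amplified
    `fold_enrichment` times more than wild-type molecules.
    """
    if not 0 <= initial_vaf <= 1:
        raise ValueError("initial_vaf must be between 0 and 1")
    variant = initial_vaf * fold_enrichment
    wild_type = 1.0 - initial_vaf
    return variant / (variant + wild_type)


# A variant at 0.1% with 1,000-fold enrichment ends up near 50%.
print(f"{enriched_vaf(0.001, 1000):.4f}")
```

Under these assumptions a variant that was one molecule in a thousand becomes roughly half of the amplified product, which is why it becomes straightforward to detect and quantify downstream.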
Methods, Tools and Current Perspectives in Proteogenomics
Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.
2017-01-01
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
Härtl, Katja; Kalinowski, Gregor; Hoffmann, Thomas; Preuss, Anja; Schwab, Wilfried
2017-05-01
RNA interference (RNAi) has been exploited as a reverse genetic tool for functional genomics in the nonmodel species strawberry (Fragaria × ananassa) since 2006. Here, we analysed for the first time different but overlapping nucleotide sections (>200 nt) of two endogenous genes, FaCHS (chalcone synthase) and FaOMT (O-methyltransferase), as inducer sequences and a transitive vector system to compare their gene silencing efficiencies. In total, ten vectors were assembled each containing the nucleotide sequence of one fragment in sense and corresponding antisense orientation separated by an intron (inverted hairpin construct, ihp). All sequence fragments along the full lengths of both target genes resulted in a significant down-regulation of the respective gene expression and related metabolite levels. Quantitative PCR data and successful application of a transitive vector system coinciding with a phenotypic change suggested propagation of the silencing signal. The spreading of the signal in strawberry fruit in the 3' direction was shown for the first time by the detection of secondary small interfering RNAs (siRNAs) outside of the primary targets by deep sequencing. Down-regulation of endogenes by the transitive method was less effective than silencing by ihp constructs probably because the numbers of primary siRNAs exceeded the quantity of secondary siRNAs by three orders of magnitude. Besides, we observed consistent hotspots of primary and secondary siRNA formation along the target sequence which fall within a distance of less than 200 nt. Thus, ihp vectors seem to be superior over the transitive vector system for functional genomics in strawberry fruit. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander
2017-10-01
While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Computational prediction of RBP binding sites using patterns learned from existing annotations is a faster alternative. From a biological point of view, the local structure context derived from local sequences is recognized by specific RBPs. However, to the best of our knowledge, computational models based on deep learning employ only global representations of entire RNA sequences, and local sequence information is ignored when the deep model is constructed. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs from the local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE captures experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
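The two input preparations described in this abstract (padding for the global network, overlapping fixed-length windows for the local network) can be sketched as below. This is a minimal illustration, not the iDeepE code: the function names, the `N` padding character, and the window and stride values are all assumptions chosen for the example.

```python
from typing import List


def pad_sequence(seq: str, length: int, pad_char: str = "N") -> str:
    """Right-pad (or truncate) a sequence to a fixed length."""
    return seq[:length].ljust(length, pad_char)


def split_overlapping(seq: str, window: int, stride: int,
                      pad_char: str = "N") -> List[str]:
    """Cut a sequence into fixed-length windows; they overlap when stride < window.

    The last window is padded so every piece has the same length.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    pieces = []
    start = 0
    while True:
        pieces.append(pad_sequence(seq[start:start + window], window, pad_char))
        if start + window >= len(seq):  # this window reached the end
            break
        start += stride
    return pieces


# Hypothetical 9-nt RNA, window of 4 with a stride of 2
print(split_overlapping("ACGUACGUA", window=4, stride=2))
# ['ACGU', 'GUAC', 'ACGU', 'GUAN']
```

Each window would then be encoded (for example one-hot) and fed to the local model as a separate channel, while the padded full-length sequence goes to the global model.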
Exon 11 skipping of SCN10A coding for voltage-gated sodium channels in dorsal root ganglia
Schirmeyer, Jana; Szafranski, Karol; Leipold, Enrico; Mawrin, Christian; Platzer, Matthias; Heinemann, Stefan H
2014-01-01
The voltage-gated sodium channel NaV1.8 (encoded by SCN10A) is predominantly expressed in dorsal root ganglia (DRG) and plays a critical role in pain perception. We analyzed SCN10A transcripts isolated from human DRGs using deep sequencing and found a novel splice variant lacking exon 11, which codes for 98 amino acids of the domain I/II linker. Quantitative PCR analysis revealed an abundance of this variant of up to 5–10% in human, while no such variants were detected in mouse or rat. Since no obvious functional differences between channels with and without the exon-11 sequence were detected, it is suggested that SCN10A exon 11 skipping in humans is a tolerated event. PMID:24763188
Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt
Jungbluth, Sean P; Bowers, Robert M; Lin, Huei-Ting; Cowen, James P; Rappé, Michael S
2016-01-01
Although little is known regarding microbial life within our planet's rock-hosted deep subseafloor biosphere, boreholes drilled through deep ocean sediment and into the underlying basaltic crust provide invaluable windows of access that have been used previously to document the presence of microorganisms within fluids percolating through the deep ocean crust. In this study, the analysis of 1.7 million small subunit ribosomal RNA genes amplified and sequenced from marine sediment, bottom seawater and basalt-hosted deep subseafloor fluids that span multiple years and locations on the Juan de Fuca Ridge flank was used to quantitatively delineate a subseafloor microbiome comprised of distinct bacteria and archaea. Hot, anoxic crustal fluids tapped by newly installed seafloor sampling observatories at boreholes U1362A and U1362B contained abundant bacterial lineages of phylogenetically unique Nitrospirae, Aminicenantes, Calescamantes and Chloroflexi. Although less abundant, the domain Archaea was dominated by unique, uncultivated lineages of marine benthic group E, the Terrestrial Hot Spring Crenarchaeotic Group, the Bathyarchaeota and relatives of cultivated, sulfate-reducing Archaeoglobi. Consistent with recent geochemical measurements and bioenergetic predictions, the potential importance of methane cycling and sulfate reduction were imprinted within the basalt-hosted deep subseafloor crustal fluid microbial community. This unique window of access to the deep ocean subsurface basement reveals a microbial landscape that exhibits previously undetected spatial heterogeneity. PMID:26872042
You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng
2018-06-06
As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
Sachsenröder, Jana; Twardziok, Sven; Hammerl, Jens A; Janczyk, Pawel; Wrede, Paul; Hertwig, Stefan; Johne, Reimar
2012-01-01
Animal faeces comprise a community of many different microorganisms including bacteria and viruses. Only scarce information is available about the diversity of viruses present in the faeces of pigs. Here we describe a protocol, which was optimized for the purification of the total fraction of viral particles from pig faeces. The genomes of the purified DNA and RNA viruses were simultaneously amplified by PCR and subjected to deep sequencing followed by bioinformatic analyses. The efficiency of the method was monitored using a process control consisting of three bacteriophages (T4, M13 and MS2) with different morphology and genome types. Defined amounts of the bacteriophages were added to the sample and their abundance was assessed by quantitative PCR during the preparation procedure. The procedure was applied to a pooled faecal sample of five pigs. From this sample, 69,613 sequence reads were generated. All of the added bacteriophages were identified by sequence analysis of the reads. In total, 7.7% of the reads showed significant sequence identities with published viral sequences. They mainly originated from bacteriophages (73.9%) and mammalian viruses (23.9%); 0.8% of the sequences showed identities to plant viruses. The most abundant detected porcine viruses were kobuvirus, rotavirus C, astrovirus, enterovirus B, sapovirus and picobirnavirus. In addition, sequences with identities to the chimpanzee stool-associated circular ssDNA virus were identified. Whole genome analysis indicates that this virus, tentatively designated as pig stool-associated circular ssDNA virus (PigSCV), represents a novel pig virus. The established protocol enables the simultaneous detection of DNA and RNA viruses in pig faeces including the identification of so far unknown viruses. It may be applied in studies investigating aetiology, epidemiology and ecology of diseases. 
The implemented process control serves as quality control, ensures comparability of the method and may be used for further method optimization.
Zhang, Xi
2016-01-01
Neurotransmitter ligand-gated ion channels (LGICs) are widespread and pivotal in brain functions. Unveiling their structure-function mechanisms is crucial to drive drug discovery, and demands robust proteomic quantitation of expression, post-translational modifications (PTMs) and dynamic structures. Yet unbiased digestion of these modified transmembrane proteins—at high efficiency and peptide reproducibility—poses the obstacle. Targeting both enzyme-substrate contacts and PTMs for peptide formation and detection, we devised flow-and-detergent-facilitated protease and de-PTM digestions for deep sequencing (FDD) method that combined omni-compatible detergent, tandem immobilized protease/PNGase columns, and Cys-selective reduction/alkylation, to achieve streamlined ultradeep peptide preparation within minutes not days, at high peptide reproducibility and low abundance-bias. FDD transformed enzyme-protein contacts into equal catalytic travel paths through enzyme-excessive columns regardless of protein abundance, removed products instantly preventing inhibition, tackled intricate structures via sequential multiple micro-digestions along the flow, and precisely controlled peptide formation by flow rate. Peptide-stage reactions reduced steric bias; low contamination deepened MS/MS scan; distinguishing disulfide from M oxidation and avoiding gain/loss artifacts unmasked protein-endogenous oxidation states. Using a recent interactome of 285-kDa human GABA type A receptor, this pilot study validated FDD platform's applicability to deep sequencing (up to 99% coverage), H/D-exchange and TMT-based structural mapping. FDD discovered novel subunit-specific PTM signatures, including unusual nontop-surface N-glycosylations, that may drive subunit biases in human Cys-loop LGIC assembly and pharmacology, by redefining subunit/ligand interfaces and connecting function domains. PMID:27073180
Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua
2014-01-01
Fungal diversity in deep-sea environments has recently gained increasing attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is considerable fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of the fungal OTUs and 30% of the culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities were detected by targeted environmental sequencing and by traditional cultivation in this study, which suggests that combining the two approaches will reveal a more diverse fungal community in deep-sea environments than using either approach alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044
Gutierrez, Tony; Biddle, Jennifer F.; Teske, Andreas; Aitken, Michael D.
2015-01-01
Marine hydrocarbon-degrading bacteria perform a fundamental role in the biodegradation of crude oil and its petrochemical derivatives in coastal and open ocean environments. However, there is a paucity of knowledge on the diversity and function of these organisms in deep-sea sediment. Here we used stable-isotope probing (SIP), a valuable tool to link the phylogeny and function of targeted microbial groups, to investigate polycyclic aromatic hydrocarbon (PAH)-degrading bacteria under aerobic conditions in sediments from Guaymas Basin with uniformly labeled [13C]-phenanthrene (PHE). The dominant sequences in clone libraries constructed from 13C-enriched bacterial DNA (from PHE enrichments) were identified to belong to the genus Cycloclasticus. We used quantitative PCR primers targeting the 16S rRNA gene of the SIP-identified Cycloclasticus to determine their abundance in sediment incubations amended with unlabeled PHE and showed substantial increases in gene abundance during the experiments. We also isolated a strain, BG-2, representing the SIP-identified Cycloclasticus sequence (99.9% 16S rRNA gene sequence identity), and used this strain to provide direct evidence of PHE degradation and mineralization. In addition, we isolated Halomonas, Thalassospira, and Lutibacterium sp. with demonstrable PHE-degrading capacity from Guaymas Basin sediment. This study demonstrates the value of coupling SIP with cultivation methods to identify and expand on the known diversity of PAH-degrading bacteria in the deep-sea. PMID:26217326
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobility behavior of 1481 peptides in ion mobility spectrometry (IMS), generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted using two different types of characterization methods, i.e. a sequence-based approach and a structure-based approach. The sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression are employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach gives QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, because QSSR modeling with the sequence-based approach does not require preparing energy-minimized peptide structures beforehand, it is considerably more efficient than modeling with the structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
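Two ingredients named in this abstract, an amino acid composition descriptor and Monte Carlo cross-validation (repeated random train/test splits), can be illustrated with the standard library alone. The sketch below is not the authors' protocol: the function names, the split fraction, and the number of repeats are assumptions, and no regression model is fitted here.

```python
import random
from typing import Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(peptide: str) -> List[float]:
    """Fraction of each standard amino acid in a peptide (a 20-value descriptor)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def monte_carlo_splits(n_samples: int, n_repeats: int, test_fraction: float,
                       seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a fresh random partition on every repeat
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Hypothetical use: 10 peptides, 5 random splits with 30% held out each time
for train, test in monte_carlo_splits(10, n_repeats=5, test_fraction=0.3):
    assert len(train) == 7 and len(test) == 3
```

In a full workflow, a model would be fitted on the training indices of each split and scored on the held-out indices, and the scores would be averaged over all repeats.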
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
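The idea of reporting variants above a 1% frequency at each position can be illustrated with a simple per-position tally. The sketch below is only a toy under stated assumptions: it takes the bases already aligned to one genome position as a string and applies a plain frequency cutoff. It does not reproduce the iterative mapping or error correction described in the abstract, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def minority_variants(bases_at_position: str,
                      min_frequency: float = 0.01) -> Dict[str, float]:
    """Return non-consensus bases whose frequency is at least `min_frequency`."""
    counts = Counter(bases_at_position)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    consensus = counts.most_common(1)[0][0]  # most frequent base at this position
    return {
        base: n / depth
        for base, n in counts.items()
        if base != consensus and n / depth >= min_frequency
    }


# Hypothetical pileup column: 970 A, 25 G, 5 T at a depth of 1,000
column = "A" * 970 + "G" * 25 + "T" * 5
print(minority_variants(column))  # {'G': 0.025}
```

Here G (2.5%) is reported while T (0.5%) falls below the cutoff; at that level a real pipeline would need error correction to separate true variants from sequencing errors.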
Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing
Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.
2012-01-01
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
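The barcoding strategy mentioned in this abstract relies on sorting pooled reads back to their specimens. A minimal sketch of that demultiplexing step follows, assuming each read starts with a fixed-length sample barcode and that only exact matches are accepted. The barcode sequences, sample names, and function name are hypothetical, and real pipelines usually tolerate sequencing errors in the barcode.

```python
from collections import defaultdict
from typing import Dict, List


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign reads to samples by exact match of a leading barcode.

    `barcodes` maps barcode sequence -> sample name; all barcodes must share
    one length. Matched reads are returned with the barcode trimmed off.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    size = lengths.pop()
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:size])
        if sample is None:
            bins["unassigned"].append(read)  # keep unmatched reads untrimmed
        else:
            bins[sample].append(read[size:])
    return dict(bins)


reads = ["ACGTGGGG", "TTAGCCCC", "NNNNAAAA", "ACGTTTTT"]
print(demultiplex(reads, {"ACGT": "serum_1", "TTAG": "serum_2"}))
# {'serum_1': ['GGGG', 'TTTT'], 'serum_2': ['CCCC'], 'unassigned': ['NNNNAAAA']}
```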
deepTools: a flexible platform for exploring deep-sequencing data.
Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas
2014-07-01
We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
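To make the notion of a normalized coverage file concrete, the sketch below bins read start positions and scales the counts to counts per million (CPM). It is a toy illustration and does not use or reproduce deepTools code; the function name, the single-region input, and the choice of CPM scaling are assumptions made for the example.

```python
from typing import List


def binned_cpm(read_starts: List[int], region_length: int, bin_size: int) -> List[float]:
    """Count reads per fixed-size bin and scale to counts per million reads."""
    if region_length <= 0 or bin_size <= 0:
        raise ValueError("region_length and bin_size must be positive")
    n_bins = -(-region_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < region_length:        # ignore reads outside the region
            counts[pos // bin_size] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c * 1_000_000 / total for c in counts]


# Six hypothetical reads over a 30-bp region in 10-bp bins: raw counts are [2, 3, 1]
print([round(v) for v in binned_cpm([0, 5, 12, 13, 14, 29], 30, 10)])
# [333333, 500000, 166667]
```

Scaling by the total read count is what makes tracks from libraries of different sequencing depth comparable.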
Jacono, Andrew A; Malone, Melanie H; Talei, Benjamin
2015-07-01
Facial aging is a complicated process that includes volume loss and soft tissue descent. This study provides quantitative 3-dimensional (3D) data on the long-term effect of vertical vector deep-plane rhytidectomy on restoring volume to the midface. To determine if primary vertical vector deep-plane rhytidectomy resulted in long-term volume change in the midface. We performed a prospective study on patients undergoing primary vertical vector deep-plane rhytidectomy to quantitate 3D volume changes in the midface. Quantitative analysis of volume changes was made using the Vectra 3D imaging software (Canfield Scientific, Inc, Fairfield, New Jersey) at a minimum follow-up of 1 year. Forty-three patients (86 hemifaces) were analyzed. The average volume gained in each hemi-midface after vertical vector deep-plane rhytidectomy was 3.2 mL. Vertical vector deep-plane rhytidectomy provides significant long-term augmentation of volume in the midface. These quantitative data demonstrate that some midface volume loss is related to gravitational descent of the cheek fat compartments and that vertical vector deep-plane rhytidectomy may obviate the need for other volumization procedures such as autologous fat grafting in selected cases. 4 Therapeutic. © 2015 The American Society for Aesthetic Plastic Surgery, Inc. Reprints and permission: journals.permissions@oup.com.
Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq
Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng
2011-01-01
Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevent comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than what was found in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTR) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387
Leynes, Andrew P; Yang, Jaewon; Wiesinger, Florian; Kaushik, Sandeep S; Shanbhag, Dattesh D; Seo, Youngho; Hope, Thomas A; Larson, Peder E Z
2018-05-01
Accurate quantification of uptake on PET images depends on accurate attenuation correction in reconstruction. Current MR-based attenuation correction methods for body PET use a fat and water map derived from a 2-echo Dixon MRI sequence in which bone is neglected. Ultrashort-echo-time or zero-echo-time (ZTE) pulse sequences can capture bone information. We propose the use of patient-specific multiparametric MRI consisting of Dixon MRI and proton-density-weighted ZTE MRI to directly synthesize pseudo-CT images with a deep learning model: we call this method ZTE and Dixon deep pseudo-CT (ZeDD CT). Methods: Twenty-six patients were scanned using an integrated 3-T time-of-flight PET/MRI system. Helical CT images of the patients were acquired separately. A deep convolutional neural network was trained to transform ZTE and Dixon MR images into pseudo-CT images. Ten patients were used for model training, and 16 patients were used for evaluation. Bone and soft-tissue lesions were identified, and the SUV max was measured. The root-mean-squared error (RMSE) was used to compare the MR-based attenuation correction with the ground-truth CT attenuation correction. Results: In total, 30 bone lesions and 60 soft-tissue lesions were evaluated. The RMSE in PET quantification was reduced by a factor of 4 for bone lesions (10.24% for Dixon PET and 2.68% for ZeDD PET) and by a factor of 1.5 for soft-tissue lesions (6.24% for Dixon PET and 4.07% for ZeDD PET). Conclusion: ZeDD CT produces natural-looking and quantitatively accurate pseudo-CT images and reduces error in pelvic PET/MRI attenuation correction compared with standard methods. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
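The error metric used in this abstract, the root-mean-squared error of PET quantification relative to the CT-based reference, can be written out in a few lines. The sketch below assumes the error for each lesion is expressed as a percent difference from the reference value. The function names are hypothetical, and the uptake values in the example are made up purely to show the calculation.

```python
import math
from typing import List, Sequence


def percent_errors(measured: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Percent difference of each measured value from its reference value."""
    if len(measured) != len(reference):
        raise ValueError("inputs must have the same length")
    return [100.0 * (m - r) / r for m, r in zip(measured, reference)]


def rmse(errors: Sequence[float]) -> float:
    """Root-mean-squared error of a list of per-lesion errors."""
    if not errors:
        raise ValueError("errors must not be empty")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up uptake values for three lesions (illustration only, not study data)
reference = [10.0, 8.0, 5.0]   # reconstructed with CT-based attenuation correction
measured = [9.0, 8.4, 5.0]     # reconstructed with MR-based attenuation correction
print(round(rmse(percent_errors(measured, reference)), 2))
```

The reported reduction factors follow directly from the RMSE values in the abstract: 10.24 / 2.68 is about 3.8 for bone lesions and 6.24 / 4.07 is about 1.5 for soft-tissue lesions.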
Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway
NASA Astrophysics Data System (ADS)
Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.
2018-02-01
Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.
Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang
2016-05-01
The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.
Zhang, Xi
2016-12-01
Neurotransmitter ligand-gated ion channels (LGICs) are widespread and pivotal in brain functions. Unveiling their structure-function mechanisms is crucial to drive drug discovery, and demands robust proteomic quantitation of expression, post-translational modifications (PTMs) and dynamic structures. Yet unbiased digestion of these modified transmembrane proteins-at high efficiency and peptide reproducibility-poses the obstacle. Targeting both enzyme-substrate contacts and PTMs for peptide formation and detection, we devised flow-and-detergent-facilitated protease and de-PTM digestions for deep sequencing (FDD) method that combined omni-compatible detergent, tandem immobilized protease/PNGase columns, and Cys-selective reduction/alkylation, to achieve streamlined ultradeep peptide preparation within minutes not days, at high peptide reproducibility and low abundance-bias. FDD transformed enzyme-protein contacts into equal catalytic travel paths through enzyme-excessive columns regardless of protein abundance, removed products instantly preventing inhibition, tackled intricate structures via sequential multiple micro-digestions along the flow, and precisely controlled peptide formation by flow rate. Peptide-stage reactions reduced steric bias; low contamination deepened MS/MS scan; distinguishing disulfide from M oxidation and avoiding gain/loss artifacts unmasked protein-endogenous oxidation states. Using a recent interactome of 285-kDa human GABA type A receptor, this pilot study validated FDD platform's applicability to deep sequencing (up to 99% coverage), H/D-exchange and TMT-based structural mapping. FDD discovered novel subunit-specific PTM signatures, including unusual nontop-surface N-glycosylations, that may drive subunit biases in human Cys-loop LGIC assembly and pharmacology, by redefining subunit/ligand interfaces and connecting function domains. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-01-01
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment, as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, processing the mass of sequence data and identifying true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences, followed by statistical filters to determine position-specific sensitivity thresholds rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds. PMID:26585833
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-11-20
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment, as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, processing the mass of sequence data and identifying true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences, followed by statistical filters to determine position-specific sensitivity thresholds rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds.
Accurate identification of RNA editing sites from primitive sequence with deep neural networks.
Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie
2018-04-16
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
NASA Astrophysics Data System (ADS)
Lloyd, K. G.; Bird, J. T.; Shumaker, A.
2014-12-01
Very little is known about how evolutionary branches that are distantly related to cultured microorganisms make a living in the deep subsurface marine environment. Here, sediments are cut off from surface inputs of organic substrates for tens of thousands of years, yet somehow support a diverse population of microorganisms. We examined the potential metabolic and ecological roles of uncultured archaea and bacteria in IODP Leg 347: Baltic Sea Paleoenvironment samples, using quantitative PCR for holes 60B, 63E, 65C, and 59C and single cell genomic analysis for hole 60B. We quantified changes in total archaea and bacteria, as well as deeply-branching archaeal taxa, with depth. These sediment cores alternate between high and low salinities, following a glacial cycle. This allows changes in the quantities of these groups to be placed in the context of potentially vastly different organic matter sources. In addition, single cells were isolated, and their genomes were amplified and sequenced to allow a deeper look into potential physiologies of uncultured deeply-branching organisms found up to 86 meters deep in marine sediments. Together, these data provide deeper insight into the relationship between microorganisms and their organic matter substrates in these extreme environments.
microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.
Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong
2012-01-01
microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two observations provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data.
Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
Jima, Dereje D.; Zhang, Jenny; Jacobs, Cassandra; Richards, Kristy L.; Dunphy, Cherie H.; Choi, William W. L.; Yan Au, Wing; Srivastava, Gopesh; Czader, Magdalena B.; Rizzieri, David A.; Lagoo, Anand S.; Lugar, Patricia L.; Mann, Karen P.; Flowers, Christopher R.; Bernal-Mizrachi, Leon; Naresh, Kikkeri N.; Evens, Andrew M.; Gordon, Leo I.; Luftig, Micah; Friedman, Daphne R.; Weinberg, J. Brice; Thompson, Michael A.; Gill, Javed I.; Liu, Qingquan; How, Tam; Grubor, Vladimir; Gao, Yuan; Patel, Amee; Wu, Han; Zhu, Jun; Blobe, Gerard C.; Lipsky, Peter E.; Chadburn, Amy
2010-01-01
A role for microRNA (miRNA) has been recognized in nearly every biologic system examined thus far. A complete delineation of their role must be preceded by the identification of all miRNAs present in any system. We elucidated the complete small RNA transcriptome of normal and malignant B cells through deep sequencing of 31 normal and malignant human B-cell samples that comprise the spectrum of B-cell differentiation and common malignant phenotypes. We identified the expression of 333 known miRNAs, which is more than twice the number previously recognized in any tissue type. We further identified the expression of 286 candidate novel miRNAs in normal and malignant B cells. These miRNAs were validated at a high rate (92%) using quantitative polymerase chain reaction, and we demonstrated their application in the distinction of clinically relevant subgroups of lymphoma. We further demonstrated that a novel miRNA cluster, previously annotated as a hypothetical gene LOC100130622, contains 6 novel miRNAs that regulate the transforming growth factor-β pathway. Thus, our work suggests that more than a third of the miRNAs present in most cellular types are currently unknown and that these miRNAs may regulate important cellular functions. PMID:20733160
nrDNA:mtDNA copy number ratios as a comparative metric for evolutionary and conservation genetics.
Goodall-Copestake, William Paul
2018-05-12
Identifying genetic cues of functional relevance is key to understanding the drivers of evolution and increasingly important for the conservation of biodiversity. This study introduces nuclear ribosomal DNA (nrDNA) to mitochondrial DNA (mtDNA) copy number ratios as a metric with which to screen for this functional genetic variation prior to more extensive omics analyses. To illustrate the metric, quantitative PCR was used to estimate nrDNA (18S) to mtDNA (16S) copy number ratios in muscle tissue from samples of two zooplankton species: Salpa thompsoni caught near Elephant Island (Southern Ocean) and S. fusiformis sampled off Gough Island (South Atlantic). Average 18S:16S ratios in these samples were 9:1 and 3:1, respectively. nrDNA 45S arrays and mitochondrial genomes were then deep sequenced to uncover the sources of intra-individual genetic variation underlying these 18S:16S copy number differences. The deep sequencing profiles obtained were consistent with genetic changes resulting from adaptive processes, including an expansion of nrDNA and damage to mtDNA in S. thompsoni, potentially in response to the polar environment. Beyond this example from zooplankton, nrDNA:mtDNA copy number ratios offer a promising metric to help identify genetic variation of functional relevance in animals more broadly.
Jobst-Schwan, Tilman; Schmidt, Johanna Magdalena; Schneider, Ronen; Hoogstraten, Charlotte A; Ullmann, Jeremy F P; Schapiro, David; Majmundar, Amar J; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A; Poduri, Annapurna; Hildebrandt, Friedhelm
2018-01-01
Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest.
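The abstract above mentions a mathematical model for the chance of a biallelic loss-of-function mutation but does not give it. The following is only a minimal sketch of one simple way to reason about it, assuming each sgRNA mutates a given allele independently with its own efficiency and that the two alleles behave independently. It is not the authors' model, and all names and example efficiencies are hypothetical.

```python
from math import prod

def p_allele_mutated(efficiencies):
    """Probability that at least one sgRNA mutates a single allele.

    Assumption: each sgRNA acts independently with per-allele
    mutation probability given in `efficiencies`.
    """
    return 1.0 - prod(1.0 - p for p in efficiencies)

def p_biallelic(efficiencies):
    """Probability that both alleles carry at least one mutation,
    assuming the two alleles are hit independently."""
    return p_allele_mutated(efficiencies) ** 2

# Hypothetical example: four sgRNAs, each 90% efficient per allele.
example = p_biallelic([0.9, 0.9, 0.9, 0.9])
```

Under these assumptions, adding guides quickly pushes the per-allele probability toward 1, which is the intuition behind targeting several positions at once. Note that a mutation is not necessarily loss-of-function, so this sketch gives an upper bound on the quantity of interest.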
Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong
2018-03-01
Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of the BRCA1 (approximately 80 kb, in two patients) and TP53 (∼1.6 kb, in two patients) genes, and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C (one patient) genes. We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes, and the identification of such SV deletions may improve clinical testing.
ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing.
Chen, Xingqi; Shen, Ying; Draper, Will; Buenrostro, Jason D; Litzenburger, Ulrike; Cho, Seung Woo; Satpathy, Ansuman T; Carter, Ava C; Ghosh, Rajarshi P; East-Seletsky, Alexandra; Doudna, Jennifer A; Greenleaf, William J; Liphardt, Jan T; Chang, Howard Y
2016-12-01
Spatial organization of the genome plays a central role in gene expression, DNA replication, and repair. But current epigenomic approaches largely map DNA regulatory elements outside of the native context of the nucleus. Here we report assay of transposase-accessible chromatin with visualization (ATAC-see), a transposase-mediated imaging technology that employs direct imaging of the accessible genome in situ, cell sorting, and deep sequencing to reveal the identity of the imaged elements. ATAC-see revealed the cell-type-specific spatial organization of the accessible genome and the coordinated process of neutrophil chromatin extrusion, termed NETosis. Integration of ATAC-see with flow cytometry enables automated quantitation and prospective cell isolation as a function of chromatin accessibility, and it reveals a cell-cycle dependence of chromatin accessibility that is especially dynamic in G1 phase. The integration of imaging and epigenomics provides a general and scalable approach for deciphering the spatiotemporal architecture of gene control.
Medical Sequencing at the extremes of Human Body Mass
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy
2006-09-01
Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing based approach to detect them.
Metatranscriptomic analyses of honey bee colonies.
Tozkar, Cansu Ö; Kence, Meral; Kence, Aykut; Huang, Qiang; Evans, Jay D
2015-01-01
Honey bees face numerous biotic threats from viruses to bacteria, fungi, protists, and mites. Here we describe a thorough analysis of microbes harbored by worker honey bees collected from field colonies in geographically distinct regions of Turkey. Turkey is one of the world's most important centers of apiculture, harboring five subspecies of Apis mellifera L., approximately 20% of the honey bee subspecies in the world. We use deep ILLUMINA-based RNA sequencing to capture RNA species for the honey bee and a sampling of all non-endogenous species carried by bees. After trimming and mapping these reads to the honey bee genome, approximately 10% of the sequences (9-10 million reads per library) remained. These were then mapped to a curated set of public sequences containing ca. sixty megabase-pairs of sequence representing known microbial species associated with honey bees. Levels of key honey bee pathogens were confirmed using quantitative PCR screens. We contrast microbial matches across different sites in Turkey, showing new country recordings of Lake Sinai virus, two Spiroplasma bacterium species, symbionts (Candidatus Schmidhempelia bombi, Frischella perrara, Snodgrassella alvi, Gilliamella apicola, Lactobacillus spp.), neogregarines, and a trypanosome species. By using metagenomic analysis, this study also reveals deep molecular evidence for the presence of bacterial pathogens (Melissococcus plutonius, Paenibacillus larvae), Varroa destructor-1 virus, Sacbrood virus, and fungi. Despite this effort we did not detect KBV, SBPV, Tobacco ringspot virus, VdMLV (Varroa Macula like virus), Acarapis spp., Tropilaelaps spp. and Apocephalus (phorid fly). We discuss possible impacts of management practices and honey bee subspecies on microbial retinues. The described workflow and curated microbial database will be generally useful for microbial surveys of healthy and declining honey bees.
Metatranscriptomic analyses of honey bee colonies
Tozkar, Cansu Ö.; Kence, Meral; Kence, Aykut; Huang, Qiang; Evans, Jay D.
2015-01-01
Honey bees face numerous biotic threats from viruses to bacteria, fungi, protists, and mites. Here we describe a thorough analysis of microbes harbored by worker honey bees collected from field colonies in geographically distinct regions of Turkey. Turkey is one of the world's most important centers of apiculture, harboring five subspecies of Apis mellifera L., approximately 20% of the honey bee subspecies in the world. We use deep ILLUMINA-based RNA sequencing to capture RNA species for the honey bee and a sampling of all non-endogenous species carried by bees. After trimming and mapping these reads to the honey bee genome, approximately 10% of the sequences (9–10 million reads per library) remained. These were then mapped to a curated set of public sequences containing ca. sixty megabase-pairs of sequence representing known microbial species associated with honey bees. Levels of key honey bee pathogens were confirmed using quantitative PCR screens. We contrast microbial matches across different sites in Turkey, showing new country recordings of Lake Sinai virus, two Spiroplasma bacterium species, symbionts (Candidatus Schmidhempelia bombi, Frischella perrara, Snodgrassella alvi, Gilliamella apicola, Lactobacillus spp.), neogregarines, and a trypanosome species. By using metagenomic analysis, this study also reveals deep molecular evidence for the presence of bacterial pathogens (Melissococcus plutonius, Paenibacillus larvae), Varroa destructor-1 virus, Sacbrood virus, and fungi. Despite this effort we did not detect KBV, SBPV, Tobacco ringspot virus, VdMLV (Varroa Macula like virus), Acarapis spp., Tropilaelaps spp. and Apocephalus (phorid fly). We discuss possible impacts of management practices and honey bee subspecies on microbial retinues. The described workflow and curated microbial database will be generally useful for microbial surveys of healthy and declining honey bees. PMID:25852743
USDA-ARS's Scientific Manuscript database
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
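The tiling idea described in the Geoseq abstract can be sketched in a few lines. This is a toy illustration, not Geoseq's implementation: it builds a naive suffix array over a small read library (quadratic-ish construction, fine only for tiny inputs), cuts a reference sequence into fixed-length tiles, and counts exact forward-strand occurrences of each tile. The function names, tile length, step, and example reads are all hypothetical.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text, suffix_array, pattern):
    # Binary search for the block of suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start

def tile_coverage(reference, reads, tile_len, step):
    # Join reads with '$' so a tile (A/C/G/T only) cannot match across reads.
    text = "$".join(reads)
    suffix_array = build_suffix_array(text)
    coverage = []
    for offset in range(0, len(reference) - tile_len + 1, step):
        tile = reference[offset:offset + tile_len]
        coverage.append((offset, count_occurrences(text, suffix_array, tile)))
    return coverage

reads = ["ACGTAC", "GTACGT", "ACGTTT"]
result = tile_coverage("ACGTACGT", reads, tile_len=4, step=2)
```

Each entry of `result` pairs a tile's offset in the reference with the number of exact matches in the library, which is the kind of per-tile coverage profile the abstract describes. A production tool would use a linear-time suffix array construction and handle reverse complements and mismatches.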
Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.
Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A
2017-07-01
Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.
Rational Protein Engineering Guided by Deep Mutational Scanning
Shin, HyeonSeok; Cho, Byung-Kwan
2015-01-01
Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267
Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja
2017-01-01
The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372
Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin
2017-11-15
An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of the TF-DNA binding affinity landscape remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position-specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strengths of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Cognitive Implications of Deep Gray Matter Iron in Multiple Sclerosis.
Fujiwara, E; Kmech, J A; Cobzas, D; Sun, H; Seres, P; Blevins, G; Wilman, A H
2017-05-01
Deep gray matter iron accumulation is increasingly recognized in association with multiple sclerosis and can be measured in vivo with MR imaging. The cognitive implications of this pathology are not well-understood, especially vis-à-vis deep gray matter atrophy. Our aim was to investigate the relationships between cognition and deep gray matter iron in MS by using 2 MR imaging-based iron-susceptibility measures. Forty patients with multiple sclerosis (relapsing-remitting, n = 16; progressive, n = 24) and 27 healthy controls were imaged at 4.7T by using the transverse relaxation rate and quantitative susceptibility mapping. The transverse relaxation rate and quantitative susceptibility mapping values and volumes (atrophy) of the caudate, putamen, globus pallidus, and thalamus were determined by multiatlas segmentation. Cognition was assessed with the Brief Repeatable Battery of Neuropsychological Tests. Relationships between cognition and deep gray matter iron were examined by hierarchic regressions. Compared with controls, patients showed reduced memory (P < .001) and processing speed (P = .02) and smaller putamen (P < .001), globus pallidus (P = .002), and thalamic volumes (P < .001). Quantitative susceptibility mapping values were increased in patients compared with controls in the putamen (P = .003) and globus pallidus (P = .003). In patients only, thalamus (P < .001) and putamen (P = .04) volumes were related to cognitive performance. After we controlled for volume effects, quantitative susceptibility mapping values in the globus pallidus (P = .03; trend for transverse relaxation rate, P = .10) were still related to cognition. Quantitative susceptibility mapping was more sensitive compared with the transverse relaxation rate in detecting deep gray matter iron accumulation in the current multiple sclerosis cohort. Atrophy and iron accumulation in deep gray matter both have negative but separable relationships to cognition in multiple sclerosis.
© 2017 by American Journal of Neuroradiology.
Kobayashi, Tohru; Koide, Osamu; Mori, Kozue; Shimamura, Shigeru; Matsuura, Takae; Miura, Takeshi; Takaki, Yoshihiro; Morono, Yuki; Nunoura, Takuro; Imachi, Hiroyuki; Inagaki, Fumio; Takai, Ken; Horikoshi, Koki
2008-07-01
"A meta-enzyme approach" is proposed as an ecological enzymatic method to explore the potential functions of microbial communities in extreme environments such as the deep marine subsurface. We evaluated a variety of extra-cellular enzyme activities of sediment slurries and isolates from a deep subseafloor sediment core. Using the new deep-sea drilling vessel "Chikyu", we obtained 365 m of core sediments that contained approximately 2% organic matter and considerable amounts of methane from offshore the Shimokita Peninsula in Japan at a water depth of 1,180 m. In the extra-sediment fraction of the slurry samples, phosphatase, esterase, and catalase activities were detected consistently throughout the core sediments down to the deepest slurry sample from 342.5 m below seafloor (mbsf). Detectable enzyme activities predicted the existence of a sizable population of viable aerobic microorganisms even in deep subseafloor habitats. The subsequent quantitative cultivation using solid media represented remarkably high numbers of aerobic, heterotrophic microbial populations (e.g., maximally 4.4 × 10^7 cells cm^-3 at 342.5 mbsf). Analysis of 16S rRNA gene sequences revealed that the predominant cultivated microbial components were affiliated with the genera Bacillus, Shewanella, Pseudoalteromonas, Halomonas, Pseudomonas, Paracoccus, Rhodococcus, Microbacterium, and Flexibacteracea. Many of the predominant and scarce isolates produced a variety of extra-cellular enzymes such as proteases, amylases, lipases, chitinases, phosphatases, and deoxyribonucleases. Our results indicate that microbes in the deep subseafloor environment off Shimokita are metabolically active and that the cultivable populations may have a great potential in biotechnology.
Identification of microRNAs differentially expressed involved in male flower development.
Wang, Zhengjia; Huang, Jianqin; Sun, Zhichao; Zheng, Bingsong
2015-03-01
Hickory (Carya cathayensis Sarg.) is one of the most economically important woody trees in eastern China, but its long flowering phase delays yield. Our understanding of the regulatory roles of microRNAs (miRNAs) in male flower development in hickory remains poor. Using high-throughput sequencing technology, we have pyrosequenced two small RNA libraries from two male flower differentiation stages in hickory. Analysis of the sequencing data identified 114 conserved miRNAs that belonged to 23 miRNA families, five novel miRNAs including their corresponding miRNA*s, and 22 plausible miRNA candidates. Differential expression analysis revealed 12 miRNA sequences that were upregulated in the later (reproductive) stage of male flower development. Quantitative real-time PCR showed similar expression trends as that of the deep sequencing. Novel miRNAs and plausible miRNA candidates were predicted using bioinformatic analysis methods. The miRNAs newly identified in this study have increased the number of known miRNAs in hickory, and the identification of differentially expressed miRNAs will provide new avenues for studies into miRNAs involved in the process of male flower development in hickory and other related trees.
DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.
Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing
2018-02-01
Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. 
The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .
Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio
2017-07-15
With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies are currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2017-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them.
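The saliency-map idea in the abstract above (first-order derivatives of the prediction with respect to each nucleotide) can be illustrated with a minimal sketch. This is not the DeMo Dashboard code: a toy one-layer logistic model stands in for the DNN so that the gradient can be written by hand, and the gradient-times-input scoring, the function names, and the weight layout are all assumptions made for illustration. With a real network the gradient would come from automatic differentiation.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


def predict(x, weights, bias):
    """Toy stand-in for a DNN: sigmoid of a weighted sum over all positions."""
    z = bias + sum(w * v
                   for w_row, x_row in zip(weights, x)
                   for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(seq, weights, bias):
    """Per-position saliency: |d(prediction)/d(input) * input| summed over bases.

    For this toy model the gradient with respect to x[i][j] is
    p * (1 - p) * weights[i][j], where p is the predicted probability.
    """
    x = one_hot(seq)
    p = predict(x, weights, bias)
    scale = p * (1.0 - p)  # derivative of the sigmoid at the current input
    return [abs(sum(scale * w * v for w, v in zip(w_row, x_row)))
            for w_row, x_row in zip(weights, x)]


# Hypothetical weights for a 3-nucleotide input (one row per position).
toy_weights = [[2.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, -1.0]]
scores = saliency("ACT", toy_weights, bias=0.0)
```

In this toy case the first position receives twice the saliency of the third and the second receives none, which mirrors how a saliency map highlights the nucleotides that drive a prediction.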
Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying
2015-01-01
This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that the genus Arcobacter accounted for over 43.42% of the total abundance of potential pathogens in the STP. At the species level, the potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumoniae dominated in raw sewage, which was also confirmed by quantitative real-time PCR. Illumina sequencing also revealed the prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency depended mainly on the oxidation ditch. Compared with sand filtration, magnetic resin seemed to achieve higher removal of most of the potential pathogens and virulence factors. However, the presence of residual A. butzleri in the final effluent still deserves further attention. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for a deep and comprehensive overview of environmental bacterial virulence. PMID:25938416
Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard
2010-08-01
Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.
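The two calculations described in the abstract above, calling a sample from deep-sequencing reads and scoring a genotypic method against a reference assay, can be sketched as follows. The -6.96 cutoff and the 2% threshold come from the abstract; the direction of the cutoff (a read scoring at or above it counts as CXCR4-using), the function names, and the input layout are assumptions for illustration, not the study's actual pipeline.

```python
PSSM_CUTOFF = -6.96        # PSSM cutoff quoted in the abstract
MIN_X4_FRACTION = 0.02     # ">2%" deep-sequencing threshold quoted in the abstract


def call_tropism(read_scores):
    """Call a sample "non-R5" or "R5" from per-read PSSM scores.

    Assumption: a read scoring at or above the cutoff is inferred CXCR4-using.
    """
    if not read_scores:
        raise ValueError("at least one read score is required")
    x4_reads = sum(1 for score in read_scores if score >= PSSM_CUTOFF)
    fraction = x4_reads / len(read_scores)
    return "non-R5" if fraction > MIN_X4_FRACTION else "R5"


def sensitivity_specificity(calls, reference):
    """Score calls against a reference assay; "non-R5" is the positive class.

    Both classes must be present in the reference, otherwise a rate is undefined.
    """
    pairs = list(zip(calls, reference))
    tp = sum(1 for c, r in pairs if c == "non-R5" and r == "non-R5")
    fn = sum(1 for c, r in pairs if c == "R5" and r == "non-R5")
    tn = sum(1 for c, r in pairs if c == "R5" and r == "R5")
    fp = sum(1 for c, r in pairs if c == "non-R5" and r == "R5")
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity here is the share of reference non-R5 samples that the genotypic method also calls non-R5, and specificity is the share of reference R5 samples it calls R5, which is how figures such as 69% and 91% "relative to Trofile" are normally read.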
Tang, Zhonghui; Zhang, Liping; Xu, Chenguang; Yuan, Shaohua; Zhang, Fengting; Zheng, Yonglian; Zhao, Changping
2012-01-01
The male sterility of thermosensitive genic male sterile (TGMS) lines of wheat (Triticum aestivum) is strictly controlled by temperature. The early phase of anther development is especially susceptible to cold stress. MicroRNAs (miRNAs) play an important role in plant development and in responses to environmental stress. In this study, deep sequencing of small RNA (smRNA) libraries obtained from spike tissues of the TGMS line under cold and control conditions identified a total of 78 unique miRNA sequences from 30 families and trans-acting small interfering RNAs (tasiRNAs) derived from two TAS3 genes. To identify smRNA targets in the wheat TGMS line, we applied the degradome sequencing method, which globally and directly identifies the remnants of smRNA-directed target cleavage. We identified 26 targets of 16 miRNA families and three targets of tasiRNAs. Comparing smRNA sequencing data sets and TaqMan quantitative polymerase chain reaction results, we identified six miRNAs and one tasiRNA (tasiRNA-ARF [for Auxin-Responsive Factor]) as cold stress-responsive smRNAs in spike tissues of the TGMS line. We also determined the expression profiles of target genes that encode transcription factors in response to cold stress. Interestingly, the expression of cold stress-responsive smRNAs integrated in the auxin-signaling pathway and their target genes was largely noncorrelated. We investigated the tissue-specific expression of smRNAs using a tissue microarray approach. Our data indicated that miR167 and tasiRNA-ARF play roles in regulating the auxin-signaling pathway and possibly in the developmental response to cold stress. These data provide evidence that smRNA regulatory pathways are linked with male sterility in the TGMS line during cold stress. PMID:22508932
Archaeal β diversity patterns under the seafloor along geochemical gradients
NASA Astrophysics Data System (ADS)
Koyano, Hitoshi; Tsubouchi, Taishi; Kishino, Hirohisa; Akutsu, Tatsuya
2014-09-01
Recently, deep drilling into the seafloor has revealed that there are vast sedimentary ecosystems of diverse microorganisms, particularly archaea, in subsurface areas. We investigated the β diversity patterns of archaeal communities in sediment layers under the seafloor and their determinants. This study was accomplished by analyzing large environmental samples of 16S ribosomal RNA gene sequences and various geochemical data collected from a sediment core of 365.3 m, obtained by drilling into the seafloor off the east coast of the Shimokita Peninsula. To extract the maximum amount of information from these environmental samples, we first developed a method for measuring β diversity using sequence data by applying probability theory on a set of strings developed by two of the authors in a previous publication. We introduced an index of β diversity between sequence populations from which the sequence data were sampled. We then constructed an estimator of the β diversity index based on the sequence data and demonstrated that it converges to the β diversity index between sequence populations with probability of 1 as the number of sampled sequences increases. Next, we applied this new method to quantify β diversities between archaeal sequence populations under the seafloor and constructed a quantitative model of the estimated β diversity patterns. Nearly 90% of the variation in the archaeal β diversity was explained by a model that included as variables the differences in the abundances of chlorine, iodine, and carbon between the sediment layers.
High-Resolution 7T MR Imaging of the Motor Cortex in Amyotrophic Lateral Sclerosis.
Cosottini, M; Donatelli, G; Costagli, M; Caldarazzo Ienco, E; Frosini, D; Pesaresi, I; Biagi, L; Siciliano, G; Tosetti, M
2016-03-01
Amyotrophic lateral sclerosis is a progressive motor neuron disorder that involves degeneration of both upper and lower motor neurons. In patients with amyotrophic lateral sclerosis, pathologic studies and ex vivo high-resolution MR imaging at ultra-high field strength revealed the co-localization of iron and activated microglia distributed in the deep layers of the primary motor cortex. The aims of the study were to measure the cortical thickness and evaluate the distribution of iron-related signal changes in the primary motor cortex of patients with amyotrophic lateral sclerosis as possible in vivo biomarkers of upper motor neuron impairment. Twenty-two patients with definite amyotrophic lateral sclerosis and 14 healthy subjects underwent a high-resolution 2D multiecho gradient-recalled sequence targeted on the primary motor cortex by using a 7T scanner. Image analysis consisted of the visual evaluation and quantitative measurement of signal intensity and cortical thickness of the primary motor cortex in patients and controls. Qualitative and quantitative MR imaging parameters were correlated with electrophysiologic and laboratory data and with clinical scores. Ultra-high field MR imaging revealed atrophy and signal hypointensity in the deep layers of the primary motor cortex of patients with amyotrophic lateral sclerosis with a diagnostic accuracy of 71%. Signal hypointensity of the deep layers of the primary motor cortex correlated with upper motor neuron impairment (r = -0.47; P < .001) and with disease progression rate (r = -0.60; P = .009). The combined high spatial resolution and sensitivity to paramagnetic substances of 7T MR imaging demonstrate in vivo signal changes of the cerebral motor cortex that resemble the distribution of activated microglia within the cortex of patients with amyotrophic lateral sclerosis. 
Cortical thinning and signal hypointensity of the deep layers of the primary motor cortex could constitute a marker of upper motor neuron impairment in patients with amyotrophic lateral sclerosis. © 2016 by American Journal of Neuroradiology.
Quantiprot - a Python package for quantitative analysis of protein sequences.
Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold
2017-07-17
The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
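Two of the measures named in the abstract above, the n-gram distribution and the Zipf's law coefficient, can be sketched in plain Python. This deliberately does not use the Quantiprot API, whose function names are not given here; the helper names are hypothetical, and the coefficient is taken to be the least-squares slope of log frequency against log rank, which may differ from the package's own convention.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count all overlapping n-grams in a sequence string."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_coefficient(counts):
    """Least-squares slope of log(frequency) versus log(rank).

    Assumption: the coefficient is reported as the raw slope, so it is
    negative for a distribution whose frequencies decay with rank.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("at least two distinct n-grams are required")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


bigrams = ngram_counts("AAAB", 2)             # counts for "AA" and "AB"
slope = zipf_coefficient(ngram_counts("AAB", 1))
```

Because such measures depend only on the sequence itself, they can be computed for sequences that do not align well, which is the alignment-free use case the abstract describes.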
Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa
2014-03-01
Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.
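The incidence estimates in the abstract above follow directly from the reported counts: 6 mosaic mutations among 20 bilateral cases and 4 among 70 unilateral cases. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def mosaic_incidence(n_mosaic, n_cases):
    """Percentage of previously mutation-negative cases with a low-level mosaic mutation."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return 100.0 * n_mosaic / n_cases


bilateral = mosaic_incidence(6, 20)    # 30%
unilateral = mosaic_incidence(4, 70)   # about 5.7%, reported as 6% in the abstract
```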
Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song
2012-08-01
Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.
2012-01-01
Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. 
The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954
Barrett, Nolan H.; McCarthy, Peter J.
2017-01-01
ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886
Less is More: Membrane Protein Digestion Beyond Urea–Trypsin Solution for Next-level Proteomics
Zhang, Xi
2015-01-01
The goal of next-level bottom-up membrane proteomics is protein function investigation, via high-coverage high-throughput peptide-centric quantitation of expression, modifications and dynamic structures at systems scale. Yet efficient digestion of mammalian membrane proteins presents a daunting barrier, and prevalent day-long urea–trypsin in-solution digestion proved insufficient to reach this goal. Many efforts contributed incremental advances over past years, but involved protein denaturation that disconnected measurement from functional states. Beyond denaturation, the recent discovery of structure/proteomics omni-compatible detergent n-dodecyl-β-d-maltopyranoside, combined with pepsin and PNGase F columns, enabled breakthroughs in membrane protein digestion: a 2010 DDM-low-TCEP (DLT) method for H/D-exchange (HDX) using human G protein-coupled receptor, and a 2015 flow/detergent-facilitated protease and de-PTM digestions (FDD) for integrative deep sequencing and quantitation using full-length human ion channel complex. Distinguishing protein solubilization from denaturation, protease digestion reliability from theoretical specificity, and reduction from alkylation, these methods shifted day(s)-long paradigms into minutes, and afforded fully automatable (HDX)-protein-peptide-(tandem mass tag)-HPLC pipelines to instantly measure functional proteins at deep coverage, high peptide reproducibility, low artifacts and minimal leakage. Promoting—not destroying—structures and activities harnessed membrane proteins for the next-level streamlined functional proteomics. This review analyzes recent advances in membrane protein digestion methods and highlights critical discoveries for future proteomics. PMID:26081834
Identification of Prostate Cancer-Specific microDNAs
2016-02-01
circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. [Figure residue; axis label: Relative cell growth (%)]
Sequence-specific bias correction for RNA-seq data using recurrent neural networks.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2017-01-25
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data by using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions.
Akkus, Zeynettin; Galimzianova, Alfiia; Hoogi, Assaf; Rubin, Daniel L; Erickson, Bradley J
2017-08-01
Quantitative analysis of brain MRI is routine for many neurological diseases and conditions and relies on accurate segmentation of structures of interest. Deep learning-based segmentation approaches for brain MRI are gaining interest due to their self-learning and generalization ability over large amounts of data. As the deep learning architectures are becoming more mature, they gradually outperform previous state-of-the-art classical machine learning algorithms. This review aims to provide an overview of current deep learning-based segmentation approaches for quantitative brain MRI. First we review the current deep learning architectures used for segmentation of anatomical brain structures and brain lesions. Next, the performance, speed, and properties of deep learning approaches are summarized and discussed. Finally, we provide a critical assessment of the current state and identify likely future developments and trends.
An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang
2014-01-01
This paper explored our hypothesis that the sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. In particular, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575
Pereira, Arthur Prudêncio de Araujo; Andrade, Pedro Avelino Maia de; Bini, Daniel; Durrer, Ademir; Robin, Agnès; Bouillet, Jean Pierre; Andreote, Fernando Dini; Cardoso, Elke Jurandy Bran Nogueira
2017-01-01
Our knowledge of the rhizosphere bacterial communities in deep soils and the role of Eucalyptus and Acacia on the structure of these communities remains very limited. In this study, we targeted the bacterial community along a depth profile (0 to 800 cm) and compared community structure in monospecific or mixed plantations of Acacia mangium and Eucalyptus grandis. We applied quantitative PCR (qPCR) and sequenced the V6 region of the 16S rRNA gene to characterize composition of bacterial communities. We identified a decrease in bacterial abundance with soil depth, and differences in community patterns between monospecific and mixed cultivations. Sequence analysis indicated a prevalent effect of soil depth on bacterial communities in the mixed plant cultivation system, and a remarkable differentiation of bacterial communities in areas solely cultivated with Eucalyptus. The groups most influenced by soil depth were Proteobacteria and Acidobacteria (more frequent in samples between 0 and 300 cm). The predominant bacterial groups differentially displayed in the monospecific stands of Eucalyptus were Firmicutes and Proteobacteria. Our results suggest that the addition of an N2-fixing tree in a monospecific cultivation system modulates bacterial community composition even at a great depth. We conclude that co-cultivation systems may represent a key strategy to improve soil resources and to establish more sustainable cultivation of Eucalyptus in Brazil.
Schneider, Ronen; Hoogstraten, Charlotte A.; Schapiro, David; Majmundar, Amar J.; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A.; Poduri, Annapurna
2018-01-01
Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest. PMID:29346415
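The abstract above mentions a mathematical model for the chance of a biallelic loss-of-function mutation but does not give it. The following is only a minimal sketch of that kind of calculation, not the authors' model: it assumes each sgRNA mutates each allele independently with a fixed per-allele efficiency, and the function names and efficiency values are hypothetical.

```python
from math import prod


def allele_hit_probability(efficiencies):
    """Probability that a single allele carries at least one mutation.

    `efficiencies` holds hypothetical per-allele cutting/mutation
    probabilities, one per sgRNA, assumed independent of each other.
    """
    return 1.0 - prod(1.0 - e for e in efficiencies)


def biallelic_probability(efficiencies):
    """Probability that both alleles carry at least one mutation,
    assuming the two alleles are hit independently."""
    p = allele_hit_probability(efficiencies)
    return p * p


if __name__ == "__main__":
    # Four sgRNAs with made-up 90% efficiency each.
    print(biallelic_probability([0.9, 0.9, 0.9, 0.9]))
```

Under these assumptions, adding guides raises the per-allele probability quickly, which is consistent with the abstract's rationale for injecting multiple sgRNAs at once.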
de Andrade, Pedro Avelino Maia; Bini, Daniel; Durrer, Ademir; Robin, Agnès; Bouillet, Jean Pierre; Andreote, Fernando Dini; Cardoso, Elke Jurandy Bran Nogueira
2017-01-01
Our knowledge of the rhizosphere bacterial communities in deep soils and the role of Eucalyptus and Acacia on the structure of these communities remains very limited. In this study, we targeted the bacterial community along a depth profile (0 to 800 cm) and compared community structure in monospecific or mixed plantations of Acacia mangium and Eucalyptus grandis. We applied quantitative PCR (qPCR) and sequenced the V6 region of the 16S rRNA gene to characterize composition of bacterial communities. We identified a decrease in bacterial abundance with soil depth, and differences in community patterns between monospecific and mixed cultivations. Sequence analysis indicated a prevalent effect of soil depth on bacterial communities in the mixed plant cultivation system, and a remarkable differentiation of bacterial communities in areas solely cultivated with Eucalyptus. The groups most influenced by soil depth were Proteobacteria and Acidobacteria (more frequent in samples between 0 and 300 cm). The predominant bacterial groups differentially displayed in the monospecific stands of Eucalyptus were Firmicutes and Proteobacteria. Our results suggest that the addition of an N2-fixing tree in a monospecific cultivation system modulates bacterial community composition even at a great depth. We conclude that co-cultivation systems may represent a key strategy to improve soil resources and to establish more sustainable cultivation of Eucalyptus in Brazil. PMID:28686690
Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J
2017-02-02
The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.
miRBase: integrating microRNA annotation and deep-sequencing data.
Kozomara, Ana; Griffiths-Jones, Sam
2011-01-01
miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
Transcriptome sequences resolve deep relationships of the grape family.
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium
2018-01-01
ABSTRACT Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573
Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai
2017-11-23
The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2018-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering that recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs and the dependencies among them. PMID:27896980
Aftershock occurrence rate decay for individual sequences and catalogs
NASA Astrophysics Data System (ADS)
Nyffenegger, Paul A.
One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes. 
Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
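For reference, the modified Omori law discussed in this abstract is conventionally written as below, together with the point-process log-likelihood that a maximum likelihood estimate of p maximizes. The symbols K (productivity), c (time offset), N (number of aftershocks), t_i (occurrence times) and the observation window [S, T] are standard notation assumed here; the abstract itself names only p.

```latex
% Modified Omori law: aftershock rate at time t after the mainshock
n(t) = \frac{K}{(t + c)^{p}}

% Log-likelihood for N aftershocks at times t_1, ..., t_N observed in [S, T]
\ln L(K, c, p) = N \ln K \;-\; p \sum_{i=1}^{N} \ln (t_i + c)
                 \;-\; K \int_{S}^{T} (t + c)^{-p} \, dt
```

Larger p means faster decay of the aftershock rate, which is the sense in which the abstract compares p values across depth ranges.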
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-11
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
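The Q3 figure quoted above is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation follows; the function name and the example strings are illustrative only and are not taken from DeepCNF.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues with a correctly predicted 3-state label.

    Both arguments are equal-length strings over the alphabet H/E/C.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Seven of eight residues agree in this made-up example.
    print(q3_accuracy("HHHECCCC", "HHHEECCC"))  # 0.875
```

The same per-residue comparison applied to eight-state labels gives Q8.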
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
NASA Astrophysics Data System (ADS)
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-01
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Deep learning methods for protein torsion angle prediction.
Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin
2017-09-18
Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN), and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of the protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of the phi angle is comparable to the existing methods, but the MAE of the psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
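Because torsion angles are periodic, a mean absolute error over phi or psi is normally computed on the shorter arc between predicted and observed angles, so that 170° and -170° differ by 20° and not 340°. The abstract does not state how its MAE handles this wrap-around, so the sketch below is an assumption about the usual convention, with a hypothetical function name.

```python
def angular_mae(predicted_deg, observed_deg):
    """Mean absolute error between two lists of angles in degrees,
    measured along the shorter arc of the circle."""
    if len(predicted_deg) != len(observed_deg):
        raise ValueError("angle lists must have the same length")
    if not observed_deg:
        raise ValueError("angle lists must not be empty")
    total = 0.0
    for p, o in zip(predicted_deg, observed_deg):
        diff = abs(p - o) % 360.0
        # Take the shorter way around the circle.
        total += min(diff, 360.0 - diff)
    return total / len(observed_deg)


if __name__ == "__main__":
    print(angular_mae([170.0, -170.0], [-170.0, 170.0]))  # 20.0
```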
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. Contact: d.t.jones@ucl.ac.uk.
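To make the "pair frequency or covariance data" in the abstract above concrete, the sketch below computes, for two alignment columns i and j, the covariance f(i,a; j,b) - f(i,a) f(j,b) for every residue pair (a, b) seen in those columns. It is a simplified illustration only: it uses no sequence weighting, pseudocounts, or gap handling, and it is not a description of DeepCov's actual preprocessing. The function name and toy alignment are hypothetical.

```python
from collections import Counter


def pair_covariance(alignment, i, j):
    """Covariance of residue pairs between columns i and j.

    `alignment` is a list of equal-length sequences. Returns a dict
    mapping (a, b) to f_ij(a, b) - f_i(a) * f_j(b), where f denotes
    raw observed frequencies over the alignment rows.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment must contain at least one sequence")
    f_i = Counter(seq[i] for seq in alignment)
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a in f_i
        for b in f_j
    }


if __name__ == "__main__":
    # Columns 0 and 1 co-vary perfectly in this toy alignment.
    toy = ["AC", "AC", "GT", "GT"]
    print(pair_covariance(toy, 0, 1)[("A", "C")])  # 0.25
```

Stacking such tables for all column pairs gives an L x L x (alphabet x alphabet) array, which is the general shape of input a convolutional network could operate on.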
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Quantitative Susceptibility Mapping in Parkinson's Disease.
Langkammer, Christian; Pirpamer, Lukas; Seiler, Stephan; Deistung, Andreas; Schweser, Ferdinand; Franthal, Sebastian; Homayoon, Nina; Katschnig-Winter, Petra; Koegl-Wallner, Mariella; Pendl, Tamara; Stoegerer, Eva Maria; Wenzel, Karoline; Fazekas, Franz; Ropele, Stefan; Reichenbach, Jürgen Rainer; Schmidt, Reinhold; Schwingenschuh, Petra
2016-01-01
Quantitative susceptibility mapping (QSM) and R2* relaxation rate mapping have demonstrated increased iron deposition in the substantia nigra of patients with idiopathic Parkinson's disease (PD). However, the findings in other subcortical deep gray matter nuclei are converse and the sensitivity of QSM and R2* for morphological changes and their relation to clinical measures of disease severity has so far been investigated only sparsely. The local ethics committee approved this study and all subjects gave written informed consent. 66 patients with idiopathic Parkinson's disease and 58 control subjects underwent quantitative MRI at 3T. Susceptibility and R2* maps were reconstructed from a spoiled multi-echo 3D gradient echo sequence. Mean susceptibilities and R2* rates were measured in subcortical deep gray matter nuclei and compared between patients with PD and controls as well as related to clinical variables. Compared to control subjects, patients with PD had increased R2* values in the substantia nigra. QSM also showed higher susceptibilities in patients with PD in substantia nigra, in the nucleus ruber, thalamus, and globus pallidus. Magnetic susceptibility of several of these structures was correlated with the levodopa-equivalent daily dose (LEDD) and clinical markers of motor and non-motor disease severity (total MDS-UPDRS, MDS-UPDRS-I and II). Disease severity as assessed by the Hoehn & Yahr scale was correlated with magnetic susceptibility in the substantia nigra. The established finding of higher R2* rates in the substantia nigra was extended by QSM showing superior sensitivity for PD-related tissue changes in nigrostriatal dopaminergic pathways. QSM additionally reflected the levodopa-dosage and disease severity. These results suggest a more widespread pathologic involvement and QSM as a novel means for its investigation, more sensitive than current MRI techniques.
NASA Astrophysics Data System (ADS)
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-06-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of the most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-01-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of the most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs. PMID:26047353
MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing.
Yan, Biao; Wang, Zhen-Hua; Zhu, Chang-Dong; Guo, Jin-Tao; Zhao, Jin-Liang
2014-08-01
The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species in aquaculture and occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used the Illumina/Solexa sequencing technology to sequence a small RNA library using a pooled RNA sample isolated from the different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of tilapia miRNAs, and the discovery of miRNAs in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome
Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend
2007-01-01
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995
NASA Technical Reports Server (NTRS)
Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina
2006-01-01
The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.
Transcriptome Sequences Resolve Deep Relationships of the Grape Family
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307
Deep Learning and Its Applications in Biomedicine.
Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi
2018-02-01
Advances in biological and medical technologies have been providing us with explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.
Hybrid DNA virus in Chinese patients with seronegative hepatitis discovered by deep sequencing.
Xu, Baoyan; Zhi, Ning; Hu, Gangqing; Wan, Zhihong; Zheng, Xiaobin; Liu, Xiaohong; Wong, Susan; Kajigaya, Sachiko; Zhao, Keji; Mao, Qing; Young, Neal S
2013-06-18
Seronegative hepatitis--non-A, non-B, non-C, non-D, non-E hepatitis--is poorly characterized but strongly associated with serious complications. We collected 92 sera specimens from patients with non-A-E hepatitis in Chongqing, China between 1999 and 2007. Ten sera pools were screened by Solexa deep sequencing. We discovered a 3,780-bp contig present in all 10 pools that yielded BLASTx E scores of 7e-05-0.008 against parvoviruses. The complete sequence of the in silico-assembled 3,780-bp contig was confirmed by gene amplification of overlapping regions over almost the entire genome, and the virus was provisionally designated NIH-CQV. Further analysis revealed that the contig was composed of two major ORFs. By protein BLAST, ORF1 and ORF2 were most homologous to the replication-associated protein of bat circovirus and the capsid protein of porcine parvovirus, respectively. Phylogenetic analysis indicated that NIH-CQV is located at the interface of Parvoviridae and Circoviridae. Prevalence of NIH-CQV in patients was determined by quantitative PCR. Sixty-three of 90 patient samples (70%) were positive, but all those from 45 healthy controls were negative. Average virus titer in the patient specimens was 1.05 e4 copies/µL. Specific antibodies against NIH-CQV were sought by immunoblotting. Eighty-four percent of patients were positive for IgG, and 31% were positive for IgM; in contrast, 78% of healthy controls were positive for IgG, but all were negative for IgM. Although more work is needed to determine the etiologic role of NIH-CQV in human disease, our data indicate that a parvovirus-like virus is highly prevalent in a cohort of patients with non-A-E hepatitis.
Doolan, Kyle M; Colby, David W
2015-01-30
Prion diseases are caused by a structural rearrangement of the cellular prion protein, PrP(C), into a disease-associated conformation, PrP(Sc), which may be distinguished from one another using conformation-specific antibodies. We used mutational scanning by cell-surface display to screen 1341 PrP single point mutants for attenuated interaction with four anti-PrP antibodies, including several with conformational specificity. Single-molecule real-time gene sequencing was used to quantify enrichment of mutants, returning 26,000 high-quality full-length reads for each screened population on average. Relative enrichment of mutants correlated to the magnitude of the change in binding affinity. Mutations that diminished binding of the antibody ICSM18 represented the core of contact residues in the published crystal structure of its complex. A similarly located binding site was identified for D18, comprising discontinuous residues in helix 1 of PrP, brought into close proximity to one another only when the alpha helix is intact. The specificity of these antibodies for the normal form of PrP likely arises from loss of this conformational feature after conversion to the disease-associated form. Intriguingly, 6H4 binding was found to depend on interaction with the same residues, among others, suggesting that its ability to recognize both forms of PrP depends on a structural rearrangement of the antigen. The application of mutational scanning and deep sequencing provides residue-level resolution of positions in the protein-protein interaction interface that are critical for binding, as well as a quantitative measure of the impact of mutations on binding affinity. Copyright © 2014 Elsevier Ltd. All rights reserved.
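As a rough illustration of how relative enrichment of a mutant can be derived from read counts, the sketch below compares a mutant's read frequency in a screened population with its frequency in the unscreened library. The function name, the pseudocount, the mutant labels and all counts are hypothetical; the abstract does not give the exact formula the authors used.

```python
from math import log2

def log2_enrichment(selected_reads, selected_total, input_reads, input_total, pseudocount=0.5):
    """log2 ratio of a mutant's read frequency after vs. before screening.

    The pseudocount is an assumption of this sketch; it avoids division by
    zero for mutants that have no reads in one of the two populations.
    """
    selected_freq = (selected_reads + pseudocount) / (selected_total + pseudocount)
    input_freq = (input_reads + pseudocount) / (input_total + pseudocount)
    return log2(selected_freq / input_freq)

# Hypothetical counts per mutant: (reads in screened population, reads in input library).
# 26,000 reads per population mirrors the average reported in the abstract.
counts = {"mutant_A": (400, 100), "mutant_B": (20, 80)}
scores = {
    name: log2_enrichment(selected, 26000, unscreened, 26000)
    for name, (selected, unscreened) in counts.items()
}
```

A positive score marks a mutant that is over-represented after screening and a negative score one that is depleted; which direction indicates weakened antibody binding depends on how the screen gates the cells.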
Ghaju Shrestha, Rajani; Tanaka, Yasuhiro; Malla, Bikash; Bhandari, Dinesh; Tandukar, Sarmila; Inoue, Daisuke; Sei, Kazunari; Sherchand, Jeevan B; Haramoto, Eiji
2017-12-01
Bacteriological analysis of drinking water leads to detection of only conventional fecal indicator bacteria. This study aimed to explore and characterize bacterial diversity, to understand the extent of pathogenic bacterial contamination, and to examine the relationship between pathogenic bacteria and fecal indicator bacteria in different water sources in the Kathmandu Valley, Nepal. Sixteen water samples were collected from shallow dug wells (n=12), a deep tube well (n=1), a spring (n=1), and rivers (n=2) in September 2014 for 16S rRNA gene next-generation sequencing. A total of 525 genera were identified, of which 81 genera were classified as possible pathogenic bacteria. Acinetobacter, Arcobacter, and Clostridium were detected with a relatively higher abundance (>0.1% of total bacterial genes) in 16, 13, and 5 of the 16 samples, respectively, and the highest abundance ratio of Acinetobacter (85.14%) was obtained in the deep tube well sample. Furthermore, the bla OXA23-like genes of Acinetobacter were detected using SYBR Green-based quantitative PCR in 13 (35%) of 37 water samples, including the 16 samples that were analyzed for next-generation sequencing, with concentrations ranging from 5.3 to 7.5 log copies/100 mL. No sufficient correlation was found between fecal indicator bacteria, such as Escherichia coli and total coliforms, and potential pathogenic bacteria, as well as the bla OXA23-like gene of Acinetobacter. These results suggest the limitation of using conventional fecal indicator bacteria in evaluating the pathogenic bacteria contamination of different water sources in the Kathmandu Valley. Copyright © 2017 Elsevier B.V. All rights reserved.
Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.
2015-01-01
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199
Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L
2015-12-07
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.
VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs
USDA-ARS's Scientific Manuscript database
Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean
2013-03-01
Sean Gordon of the USDA on Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, CA.
Paleoclimatic analyses of middle Eocene through Oligocene planktic foraminiferal faunas
Keller, G.
1983-01-01
Quantitative faunal analyses and oxygen isotope ranking of individual planktic foraminiferal species from deep sea sequences of three oceans are used to make paleoceanographic and paleoclimatic inferences. Species grouped into surface, intermediate and deep water categories based on δ18O values provide evidence of major changes in water-mass stratification, and individual species abundances indicate low frequency cool-warm oscillations. These data suggest that relatively stable climatic phases with minor cool-warm oscillations of ∼0.5 m.y. frequency are separated by rapid cooling events during middle Eocene to early Oligocene time. Five major climatic phases are evident in the water-mass stratification between middle Eocene through Oligocene time. Phase changes occur at P14/P15, P15/P16, P20/P21 and P21/P22 Zone boundaries and are marked by major faunal turnovers, rapid cooling in the isotope record, hiatuses and changes in the eustatic sea level. A general cooling trend between middle Eocene to early late Oligocene is indicated by the successive replacement of warm middle Eocene surface water species by cooler late Eocene intermediate water species and still cooler Oligocene intermediate and deep water species. Increased water-mass stratification in the latest Eocene (P17), indicated by the coexistence of surface, intermediate and deep dwelling species groups, suggest that increased thermal gradients developed between the equator and poles nearly coincident with the development of the psychrosphere. This pattern may be related to significant ice accumulation between late Eocene and early late Oligocene time. © 1983.
2011-01-01
Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. 
Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines. Conclusions Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer. PMID:21261984
Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing
Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken
2012-01-01
Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646
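To make the terms "richness" and "diversity" concrete, the sketch below computes two common summaries from per-taxon tag counts: observed richness (number of taxa with at least one tag) and the Shannon index. The choice of these two measures and the per-taxon counts are assumptions for illustration; the abstract does not say which estimators were used. Only the totals (8,709 and 7,690 tags) come from the abstract.

```python
from math import log

def richness(counts):
    """Observed richness: number of taxa with at least one tag sequence."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

# Hypothetical tag counts per taxon; only the sums match the abstract.
depth_5cm = [4000, 2500, 1500, 700, 9]    # sums to 8,709 tags
depth_25cm = [6000, 1000, 600, 90, 0]     # sums to 7,690 tags

summary = {
    "5 cmbsf": (richness(depth_5cm), shannon_index(depth_5cm)),
    "25 cmbsf": (richness(depth_25cm), shannon_index(depth_25cm)),
}
```

Because the two samples differ in sequencing depth, a real comparison would normally subsample both to a common number of tags before computing either measure.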
Deep Recurrent Neural Networks for Human Activity Recognition
Murad, Abdulmajid
2017-01-01
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103
Deep Recurrent Neural Networks for Human Activity Recognition.
Murad, Abdulmajid; Pyun, Jae-Young
2017-11-06
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs.
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. 
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
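The differential-expression criterion quoted in this abstract (fold change ≥1.5 and p value <0.05) can be sketched as a simple filter. The function name, the transcript labels and the expression values below are hypothetical, and the sketch assumes the fold change is taken in either direction; it is not the authors' workflow, which relied on BWA/ANOVA and TopHat/Cufflinks.

```python
def is_differentially_expressed(wt, ko, p_value, min_fold=1.5, alpha=0.05):
    """Apply the abstract's thresholds: fold change >= 1.5 (up or down) and p < 0.05."""
    if wt <= 0 or ko <= 0:
        # This sketch skips transcripts without measurable expression in a group.
        return False
    fold = max(wt / ko, ko / wt)
    return fold >= min_fold and p_value < alpha

# Hypothetical records: transcript -> (mean WT expression, mean Nrl-/- expression, p value).
transcripts = {
    "tx1": (10.0, 30.0, 0.001),   # 3-fold up, significant
    "tx2": (12.0, 13.0, 0.40),    # small change, not significant
    "tx3": (50.0, 20.0, 0.20),    # 2.5-fold down, but p too high
    "tx4": (8.0, 4.0, 0.01),      # 2-fold down, significant
}
hits = sorted(
    name for name, (wt, ko, p) in transcripts.items()
    if is_differentially_expressed(wt, ko, p)
)
```

Requiring both thresholds keeps large but noisy changes (tx3) and statistically stable but tiny changes out of the final list.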
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.
García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge
2018-04-12
Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.
Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari
2013-12-01
Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong
2012-01-01
Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing of these fragments produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. The 70 associated SNPs with –log10 p values over 16 were selected for further analysis. The distribution of these SNPs showed a tight cluster of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. 
PMID:22529909
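The association step in the abstract above (Fisher's exact test on each SNP, keeping those with –log10 p over 16) can be illustrated with a minimal sketch. The function name `fisher_exact_two_sided` and the read counts are hypothetical, and the 2x2 layout (bulk by allele) is an assumption; the abstract does not say how the contingency tables were built.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are the two bulks, columns are read counts
    supporting allele 1 and allele 2 at a single SNP.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised hypergeometric weight of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the weights of every table that is no more probable than the observed one.
    tail = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return tail / total


# Hypothetical counts: 40 reads per bulk, alleles completely separated by bulk.
p_value = fisher_exact_two_sided(40, 0, 0, 40)
is_associated = -log10(p_value) > 16  # threshold quoted in the abstract
```

Exact integer arithmetic is used for the tail sum so that very small p values are not lost to floating-point underflow before the final division.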
RNA-Seq analysis to capture the transcriptome landscape of a single cell
Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim
2013-01-01
We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668
Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.
Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario
2011-04-01
Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.
Cassler, M; Peterson, C L; Ledger, A; Pomponi, S A; Wright, A E; Winegar, R; McCarthy, P J; Lopez, J V
2008-04-01
In this report, real-time quantitative PCR (TaqMan qPCR) of the small subunit (SSU) 16S-like rRNA molecule, a universal phylogenetic marker, was used to quantify the relative abundance of individual bacterial members of a diverse, yet mostly unculturable, microbial community from a marine sponge. Molecular phylogenetic analyses of bacterial communities derived from Caribbean Lithistid sponges have shown a wide diversity of microbes that included at least six major subdivisions; however, very little overlap was observed between the culturable and unculturable microbial communities. Based on sequence data of three culture-independent Lithistid-derived representative bacteria, we designed probe/primer sets for TaqMan qPCR to quantitatively characterize selected microbial residents in a Lithistid sponge, Vetulina, metagenome. TaqMan assays included specificity testing, DNA limit of detection analysis, and quantification of specific microbial rRNA sequences such as Nitrospira-like microbes and Actinobacteria up to 172 million copies per microgram per Lithistid sponge metagenome. By contrast, qPCR amplification with probes designed for common previously cultured sponge-associated bacteria in the genera Rheinheimera and Marinomonas and a representative of the CFB group resulted in only minimal detection of the Rheinheimera in total DNA extracted from the sponge. These data verify that a large portion of the microbial community within Lithistid sponges may consist of currently unculturable microorganisms.
RaptorX-Property: a web server for protein structure property prediction.
Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo
2016-07-08
RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
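The Q3 and Q8 figures quoted above are per-residue accuracies: the fraction of positions where the predicted state equals the reference state. A minimal sketch follows; the function name `q_accuracy` and the example strings are hypothetical, and this is not the server's own evaluation code.

```python
def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state.

    With a 3-letter alphabet (e.g. H, E, C) this is Q3; with an
    8-letter alphabet it is Q8.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 6-residue example with one mismatch at position 5.
q3 = q_accuracy("HHHECC", "HHHEEC")
```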
Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam
2015-07-01
The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings, due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan to deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org. Contact: nshomron@post.tau.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Zill, Oliver A.; Sebisanovic, Dragan; Lopez, Rene; Blau, Sibel; Collisson, Eric A.; Divers, Stephen G.; Hoon, Dave S. B.; Kopetz, E. Scott; Lee, Jeeyun; Nikolinakos, Petros G.; Baca, Arthur M.; Kermani, Bahram G.; Eltoukhy, Helmy; Talasaz, AmirAli
2015-01-01
Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First, this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital Sequencing™ is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient’s cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing. PMID:26474073
Less is More: Membrane Protein Digestion Beyond Urea-Trypsin Solution for Next-level Proteomics.
Zhang, Xi
2015-09-01
The goal of next-level bottom-up membrane proteomics is protein function investigation, via high-coverage high-throughput peptide-centric quantitation of expression, modifications and dynamic structures at systems scale. Yet efficient digestion of mammalian membrane proteins presents a daunting barrier, and prevalent day-long urea-trypsin in-solution digestion proved insufficient to reach this goal. Many efforts contributed incremental advances over past years, but involved protein denaturation that disconnected measurement from functional states. Beyond denaturation, the recent discovery of structure/proteomics omni-compatible detergent n-dodecyl-β-d-maltopyranoside, combined with pepsin and PNGase F columns, enabled breakthroughs in membrane protein digestion: a 2010 DDM-low-TCEP (DLT) method for H/D-exchange (HDX) using human G protein-coupled receptor, and a 2015 flow/detergent-facilitated protease and de-PTM digestions (FDD) for integrative deep sequencing and quantitation using full-length human ion channel complex. Distinguishing protein solubilization from denaturation, protease digestion reliability from theoretical specificity, and reduction from alkylation, these methods shifted day(s)-long paradigms into minutes, and afforded fully automatable (HDX)-protein-peptide-(tandem mass tag)-HPLC pipelines to instantly measure functional proteins at deep coverage, high peptide reproducibility, low artifacts and minimal leakage. Promoting-not destroying-structures and activities harnessed membrane proteins for the next-level streamlined functional proteomics. This review analyzes recent advances in membrane protein digestion methods and highlights critical discoveries for future proteomics. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu
2015-06-01
Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.
Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.
Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M
2018-05-01
Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob
2016-01-01
Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob
2016-01-01
Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
Avsec, Žiga; Cheng, Jun; Gagneur, Julien
2018-01-01
Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE Python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29155928
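The idea of a spline transformation can be sketched as a weighted sum of spline basis functions evaluated at a distance, where the weights would be learned during training. The sketch below assumes a B-spline basis computed with the Cox-de Boor recursion; the abstract does not name the basis, and this is not the CONCISE implementation. The names `bspline_basis` and `spline_transform`, the knot vector, and the weights are all hypothetical.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of the given degree at x.

    Uses the Cox-de Boor recursion; terms with a zero-width knot span
    are treated as zero.
    """
    n_intervals = len(knots) - 1
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_intervals)
    ]
    if x == knots[-1]:
        # Include the right endpoint in the last non-empty interval.
        for i in range(n_intervals - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                basis[i] = 1.0
                break
    for k in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - k - 1):
            left_width = knots[i + k] - knots[i]
            right_width = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_width * basis[i] if left_width > 0 else 0.0
            right = (
                (knots[i + k + 1] - x) / right_width * basis[i + 1]
                if right_width > 0
                else 0.0
            )
            raised.append(left + right)
        basis = raised
    return basis


def spline_transform(x, knots, degree, weights):
    """Smooth function of x: weighted sum of B-spline basis functions."""
    values = bspline_basis(x, knots, degree)
    if len(values) != len(weights):
        raise ValueError("need one weight per basis function")
    return sum(w * b for w, b in zip(weights, values))


# Hypothetical setup: distances from 0 to 1000 nt, cubic splines, clamped knots.
KNOTS = [0, 0, 0, 0, 250, 500, 750, 1000, 1000, 1000, 1000]
DEGREE = 3
WEIGHTS = [0.0, 0.5, 1.0, 0.2, -0.3, 0.1, 0.0]  # stand-ins for learned values

effect_at_300 = spline_transform(300.0, KNOTS, DEGREE, WEIGHTS)
```

Because the basis functions are smooth and each is non-zero only over a few knot spans, the transformed distance changes smoothly while each weight influences only a local range of distances.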
Principles of Quantitative MR Imaging with Illustrated Review of Applicable Modular Pulse Diagrams.
Mills, Andrew F; Sakai, Osamu; Anderson, Stephan W; Jara, Hernan
2017-01-01
Continued improvements in diagnostic accuracy using magnetic resonance (MR) imaging will require development of methods for tissue analysis that complement traditional qualitative MR imaging studies. Quantitative MR imaging is based on measurement and interpretation of tissue-specific parameters independent of experimental design, compared with qualitative MR imaging, which relies on interpretation of tissue contrast that results from experimental pulse sequence parameters. Quantitative MR imaging represents a natural next step in the evolution of MR imaging practice, since quantitative MR imaging data can be acquired using currently available qualitative imaging pulse sequences without modifications to imaging equipment. The article presents a review of the basic physical concepts used in MR imaging and how quantitative MR imaging is distinct from qualitative MR imaging. Subsequently, the article reviews the hierarchical organization of major applicable pulse sequences used in this article, with the sequences organized into conventional, hybrid, and multispectral sequences capable of calculating the main tissue parameters of T1, T2, and proton density. While this new concept offers the potential for improved diagnostic accuracy and workflow, awareness of this extension to qualitative imaging is generally low. This article reviews the basic physical concepts in MR imaging, describes commonly measured tissue parameters in quantitative MR imaging, and presents the major available pulse sequences used for quantitative MR imaging, with a focus on the hierarchical organization of these sequences. © RSNA, 2017.
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa
Morin, Ryan D.; Aksay, Gozde; Dolgosheina, Elena; Ebhardt, H. Alexander; Magrini, Vincent; Mardis, Elaine R.; Sahinalp, S. Cenk; Unrau, Peter J.
2008-01-01
The diversity of microRNAs and small-interfering RNAs has been extensively explored within angiosperms by focusing on a few key organisms such as Oryza sativa and Arabidopsis thaliana. A deeper division of the plants is defined by the radiation of the angiosperms and gymnosperms, with the latter comprising the commercially important conifers. The conifers are expected to provide important information regarding the evolution of highly conserved small regulatory RNAs. Deep sequencing provides the means to characterize and quantitatively profile small RNAs in understudied organisms such as these. Pyrosequencing of small RNAs from O. sativa revealed, as expected, ∼21- and ∼24-nt RNAs. The former contained known microRNAs, and the latter largely comprised intergenic-derived sequences likely representing heterochromatin siRNAs. In contrast, sequences from Pinus contorta were dominated by 21-nt small RNAs. Using a novel sequence-based clustering algorithm, we identified sequences belonging to 18 highly conserved microRNA families in P. contorta as well as numerous clusters of conserved small RNAs of unknown function. Using multiple methods, including expressed sequence folding and machine learning algorithms, we found a further 53 candidate novel microRNA families, 51 appearing specific to the P. contorta library. In addition, alignment of small RNA sequences to the O. sativa genome revealed six perfectly conserved classes of small RNA that included chloroplast transcripts and specific types of genomic repeats. The conservation of microRNAs and other small RNAs between the conifers and the angiosperms indicates that important RNA silencing processes were highly developed in the earliest spermatophytes. Genomic mapping of all sequences to the O. sativa genome can be viewed at http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/. PMID:18323537
Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge
NASA Astrophysics Data System (ADS)
Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua
2018-01-01
Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using Illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data of the fungal diversity and biogeography, and a glimpse into the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.
Hoppe, Elisabeth; Körzdörfer, Gregor; Würfl, Tobias; Wetzl, Jens; Lugauer, Felix; Pfeuffer, Josef; Maier, Andreas
2017-01-01
The purpose of this work is to evaluate methods from deep learning for application to Magnetic Resonance Fingerprinting (MRF). MRF is a recently proposed measurement technique for generating quantitative parameter maps. In MRF a non-steady state signal is generated by a pseudo-random excitation pattern. A comparison of the measured signal in each voxel with the physical model yields quantitative parameter maps. Currently, the comparison is done by matching a dictionary of simulated signals to the acquired signals. To accelerate the computation of quantitative maps we train a Convolutional Neural Network (CNN) on simulated dictionary data. As a proof of principle we show that the neural network implicitly encodes the dictionary and can replace the matching process.
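The dictionary matching that the abstract describes as the current practice can be sketched as follows: each voxel's measured signal is compared with every simulated signal, and the parameters of the best match are reported. The sketch assumes the normalised inner product as the similarity measure and real-valued signals, neither of which the abstract specifies. The function name `match_fingerprint`, the (T1, T2) keys, and all signal values are hypothetical.

```python
from math import sqrt


def match_fingerprint(signal, dictionary):
    """Return (parameters, score) of the dictionary entry most similar to signal.

    dictionary maps a parameter tuple, e.g. (T1_ms, T2_ms), to a simulated
    signal of the same length as the measured one. Similarity is the
    absolute normalised inner product; signals are assumed to be non-zero.
    """

    def norm(vector):
        return sqrt(sum(v * v for v in vector))

    signal_norm = norm(signal)
    best_params, best_score = None, -1.0
    for params, atom in dictionary.items():
        inner = sum(s * a for s, a in zip(signal, atom))
        score = abs(inner) / (signal_norm * norm(atom))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy dictionary with three simulated signal evolutions (made-up numbers).
DICTIONARY = {
    (800, 60): [1.0, 0.8, 0.5, 0.3],
    (1200, 80): [1.0, 0.9, 0.7, 0.6],
    (1500, 100): [1.0, 0.95, 0.85, 0.8],
}

# A measured signal that is a scaled copy of the second entry.
measured = [2.0, 1.8, 1.4, 1.2]
matched_params, matched_score = match_fingerprint(measured, DICTIONARY)
```

The cost of this search grows with the number of dictionary entries per voxel, which is the computation the abstract proposes to replace with a trained network.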
Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan
2018-02-15
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gritsenko, Marina A.; Xu, Zhe; Liu, Tao
Comprehensive, quantitative information on abundances of proteins and their post-translational modifications (PTMs) can potentially provide novel biological insights into diseases pathogenesis and therapeutic intervention. Herein, we introduce a quantitative strategy utilizing isobaric stable isotope-labelling techniques combined with two-dimensional liquid chromatography-tandem mass spectrometry (2D-LC-MS/MS) for large-scale, deep quantitative proteome profiling of biological samples or clinical specimens such as tumor tissues. The workflow includes isobaric labeling of tryptic peptides for multiplexed and accurate quantitative analysis, basic reversed-phase LC fractionation and concatenation for reduced sample complexity, and nano-LC coupled to high resolution and high mass accuracy MS analysis for high confidence identification and quantification of proteins. This proteomic analysis strategy has been successfully applied for in-depth quantitative proteomic analysis of tumor samples, and can also be used for integrated proteome and PTM characterization, as well as comprehensive quantitative proteomic analysis across samples from large clinical cohorts.
Gritsenko, Marina A; Xu, Zhe; Liu, Tao; Smith, Richard D
2016-01-01
Comprehensive, quantitative information on abundances of proteins and their posttranslational modifications (PTMs) can potentially provide novel biological insights into diseases pathogenesis and therapeutic intervention. Herein, we introduce a quantitative strategy utilizing isobaric stable isotope-labeling techniques combined with two-dimensional liquid chromatography-tandem mass spectrometry (2D-LC-MS/MS) for large-scale, deep quantitative proteome profiling of biological samples or clinical specimens such as tumor tissues. The workflow includes isobaric labeling of tryptic peptides for multiplexed and accurate quantitative analysis, basic reversed-phase LC fractionation and concatenation for reduced sample complexity, and nano-LC coupled to high resolution and high mass accuracy MS analysis for high confidence identification and quantification of proteins. This proteomic analysis strategy has been successfully applied for in-depth quantitative proteomic analysis of tumor samples and can also be used for integrated proteome and PTM characterization, as well as comprehensive quantitative proteomic analysis across samples from large clinical cohorts.
Theory of Semiconducting Superlattices and Microstructures
1992-03-01
...theory elucidates the various factors affecting deep levels, sets forth the conditions for obtaining shallow-deep transitions, and predicts that Si (a... denotes the anion vacancy, which can be thought of as originating from Column-O of the Period... any quantitative theoretical factor are the strengths of...
Quantitative Comparison of the in situ Microbial Communities in Different Biomes
1995-09-01
[Figure residue: two bar charts with vertical axes running from 0 to 70 and from 0 to 60, comparing sites labeled Neotrop., Antartic., Deep Sea, Austral., and USA East/West; legible fragments include "Surface", "9-10 cm", and "surface (n = 20) (21)".]
Deep machine learning provides state-of-the-art performance in image-based plant phenotyping.
Pound, Michael P; Atkinson, Jonathan A; Townsend, Alexandra J; Wilson, Michael H; Griffiths, Marcus; Jackson, Aaron S; Bulat, Adrian; Tzimiropoulos, Georgios; Wells, Darren M; Murchie, Erik H; Pridmore, Tony P; French, Andrew P
2017-10-01
In plant phenotyping, it has become important to be able to measure many features on large image sets in order to aid genetic discovery. The size of the datasets, now often captured robotically, often precludes manual inspection, hence the motivation for finding a fully automated approach. Deep learning is an emerging field that promises unparalleled results on many data analysis problems. Building on artificial neural networks, deep approaches have many more hidden layers in the network, and hence have greater discriminative and predictive power. We demonstrate the use of such approaches as part of a plant phenotyping pipeline. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping and demonstrate state-of-the-art results (>97% accuracy) for root and shoot feature identification and localization. We use fully automated trait identification using deep learning to identify quantitative trait loci in root architecture datasets. The majority (12 out of 14) of manually identified quantitative trait loci were also discovered using our automated approach based on deep learning detection to locate plant features. We have shown deep learning-based phenotyping to have very good detection and localization accuracy in validation and testing image sets. We have shown that such features can be used to derive meaningful biological traits, which in turn can be used in quantitative trait loci discovery pipelines. This process can be completely automated. We predict a paradigm shift in image-based phenotyping brought about by such deep learning approaches, given sufficient training sets. © The Authors 2017. Published by Oxford University Press.
Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.
2014-01-01
With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. 
This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782
Dendrites, deep learning, and sequences in the hippocampus.
Bhalla, Upinder S
2017-10-12
The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses
USDA-ARS's Scientific Manuscript database
Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.
Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan
2018-05-24
High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubule assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.
Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja
2015-01-01
The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.
NASA Astrophysics Data System (ADS)
Zhang, Jinyu; Steel, Ronald; Ambrose, William
2017-12-01
Shelf margins prograde and aggrade by the incremental addition of deltaic sediments supplied from river channel belts and by stored shoreline sediment. This paper documents the shelf-edge trajectory and coeval channel belts for a segment of Paleocene Lower Wilcox Group in the northern Gulf of Mexico based on 400 wireline logs and 300 m of whole cores. By quantitatively analyzing these data and comparing them with global databases, we demonstrate how varying sediment supply impacted the Wilcox shelf-margin growth and deep-water sediment dispersal under greenhouse eustatic conditions. The coastal plain to marine topset and uppermost continental slope succession of the Lower Wilcox shelf-margin sediment prism is divided into eighteen high-frequency (~300 ky duration) stratigraphic sequences, and further grouped into 5 sequence sets (labeled as A-E from bottom to top). Sequence Set A is dominantly muddy slope deposits. The shelf edge of Sequence Sets B and C prograded rapidly (> 10 km/Ma) and aggraded modestly (< 80 m/Ma). The coeval channel belts are relatively large (individually averaging 11-13 m thick) and amalgamated. The water discharge of Sequence Sets B and C rivers, estimated by channel-belt thickness, bedform type, and grain size, is 7000-29,000 m³/s, considered as large rivers when compared with modern river databases. In contrast, slow progradation (< 10 km/Ma) and rapid aggradation (> 80 m/Ma) characterizes Sequence Sets D and E, which is associated with smaller (9-10 m thick on average) and isolated channel belts. This stratigraphic trend is likely due to an upward decreasing sediment supply indicated by the shelf-edge progradation rate and channel size, as well as an upward increasing shelf accommodation indicated by the shelf-edge aggradation rate.
The rapid shelf-edge progradation and large rivers in Sequence Sets B and C confirm earlier suggestions that it was the early phase of Lower Wilcox dispersal that brought the largest deep-water sediment volumes into the Gulf of Mexico. Key factors in this Lower Wilcox stratigraphic trend are likely to have been a very high initial sediment flux to the Gulf because of the high initial release of sediment from Laramide catchments to the north and northwest, possibly aided by modest eustatic sea-level fall on the Texas shelf, which is suggested by the early, flat shelf-edge trajectory, high amalgamation of channel belts, and the low overall aggradation rate of the Sequence Sets B and C.
Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.
Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E
2013-02-12
High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
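The error-reduction idea behind overlapping read pairs can be illustrated with a minimal sketch. It is not the authors' pipeline or error model: it assumes the two reads of a pair have already been placed on the same coordinates (read 2 reverse-complemented), that they overlap completely, and that sequencing errors are independent with uniform substitution. The function names `merge_overlapping_pair` and `residual_error_rate` are hypothetical.

```python
def merge_overlapping_pair(read1, read2_rc, mask="N"):
    """Keep a base only where both reads of the pair agree.

    Assumes equal-length, fully overlapping reads on the same strand
    (read 2 already reverse-complemented). Disagreements are masked.
    """
    if len(read1) != len(read2_rc):
        raise ValueError("this sketch only handles fully overlapping reads")
    return "".join(a if a == b else mask for a, b in zip(read1, read2_rc))


def residual_error_rate(per_base_error):
    """Chance that both reads show the SAME wrong base at one position.

    Assumption: independent errors, each wrong base equally likely, so
    both reads must err (e * e) and pick the same substitution (1 / 3).
    """
    return per_base_error ** 2 / 3


consensus = merge_overlapping_pair("ACGTAC", "ACTTAC")  # one disagreement
```

Under these simplifying assumptions a 1% per-base error rate drops to roughly 0.003% for an undetected error in the consensus, which is why agreement between overlapping reads helps separate rare true variants from sequencing noise.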
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, CY; Yang, H; Wei, CL
2011-01-01
Background: Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results: Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions: An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090
NASA Astrophysics Data System (ADS)
Chen, Xinyuan; Song, Li; Yang, Xiaokang
2016-09-01
Video denoising can be described as the problem of mapping a fixed-length sequence of noisy frames to a clean one. We propose a deep architecture based on Recurrent Neural Networks (RNNs) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as input and outputs the clean ones. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers such that each layer receives the hidden state of the previous layer as input. Experiments show that (i) the recurrent architecture extracts motion information through the temporal domain and benefits video denoising; (ii) the deep architecture has sufficient capacity to express the mapping between corrupted input videos and clean output videos; and (iii) the model generalizes, learning different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
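The stacking described in this abstract, where each recurrent layer takes the hidden state of the layer below as its input, can be sketched as a plain forward pass. This is a minimal illustration with untrained random weights, not the authors' network: the tanh activation, the layer sizes, the linear output layer, and the names `make_layer`, `matvec`, and `deep_rnn_forward` are all assumptions made for the example.

```python
import math
import random


def make_layer(rng, input_dim, hidden_dim, scale=0.1):
    """Randomly initialised (untrained) weights for one recurrent layer."""
    def matrix(rows, cols):
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]
    return {"W_in": matrix(hidden_dim, input_dim),
            "W_rec": matrix(hidden_dim, hidden_dim),
            "b": [0.0] * hidden_dim}


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def deep_rnn_forward(frames, layers, W_out):
    """Run a sequence of flattened noisy patches through stacked RNN layers.

    frames: list of T input vectors (one flattened patch per time step).
    Returns a list of T output vectors (the estimated clean patches).
    """
    hidden = [[0.0] * len(layer["b"]) for layer in layers]
    outputs = []
    for x_t in frames:
        layer_input = x_t
        for i, layer in enumerate(layers):
            drive = matvec(layer["W_in"], layer_input)   # from layer below
            recur = matvec(layer["W_rec"], hidden[i])    # from previous time step
            hidden[i] = [math.tanh(d + r + b)
                         for d, r, b in zip(drive, recur, layer["b"])]
            layer_input = hidden[i]  # hidden state feeds the next layer up
        outputs.append(matvec(W_out, layer_input))
    return outputs


rng = random.Random(0)
patch_dim, hidden_dim = 4, 3
layers = [make_layer(rng, patch_dim, hidden_dim),
          make_layer(rng, hidden_dim, hidden_dim)]
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
         for _ in range(patch_dim)]
noisy = [[rng.random() for _ in range(patch_dim)] for _ in range(5)]
denoised = deep_rnn_forward(noisy, layers, W_out)
```

In a trained model the weights would be fitted so that the outputs approximate the clean patches; here the point is only the data flow through time and through depth.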
Constraints in cancer evolution.
Venkatesan, Subramanian; Birkbak, Nicolai J; Swanton, Charles
2017-02-08
Next-generation deep genome sequencing has only recently allowed us to quantitatively dissect the extent of heterogeneity within a tumour, resolving patterns of cancer evolution. Intratumour heterogeneity and natural selection contribute to resistance to anticancer therapies in the advanced setting. Recent evidence has also revealed that cancer evolution might be constrained. In this review, we discuss the origins of intratumour heterogeneity and subsequently focus on constraints imposed upon cancer evolution. The presence of (1) parallel evolution, (2) convergent evolution and (3) the biological impact of acquiring mutations in specific orders suggest that cancer evolution may be exploitable. These constraints on cancer evolution may help us identify cancer evolutionary rule books, which could eventually inform both diagnostic and therapeutic approaches to improve survival outcomes. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan
2018-06-01
Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even higher-level sequences such as daily episodes. Moreover, it can robustly retrieve the correct episode from partial input cues. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.
Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R
2011-01-01
Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.
Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A
2016-05-01
The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GenBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
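A base composition of the kind reported here is a simple per-nucleotide percentage. The sketch below shows one way to compute it from a sequence string; the function name `base_composition` is hypothetical, ambiguous bases such as N are ignored by assumption, and the short test sequence is invented rather than taken from the P. laminaris genome.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, T, C and G, rounded to one decimal place.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: round(100.0 * counts[base] / total, 1) for base in "ATCG"}


toy = base_composition("AATTCG")  # invented six-base example

# The values reported in the abstract, for a consistency check.
reported = {"A": 29.1, "T": 35.2, "C": 14.0, "G": 21.7}
at_content = reported["A"] + reported["T"]
```

The reported percentages sum to 100% and correspond to an A+T content of 64.3%.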
Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong
2017-12-15
Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. Contact: xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
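To make "raw sequence data as input" concrete, the sketch below shows a generic way of turning a protein sequence into fixed-size numeric windows that a convolutional network could consume. It is not MusiteDeep's code or its documented encoding: the flank size, the `-` padding symbol, the restriction to S/T/Y residues, and the names `site_windows` and `one_hot` are assumptions for illustration only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                             # assumed padding symbol beyond the termini


def site_windows(protein, flank=3, residues="STY"):
    """Return (position, window) for every candidate phosphorylation residue.

    Each window is centred on the candidate and padded at the sequence ends
    so that all windows have length 2 * flank + 1.
    """
    padded = PAD * flank + protein + PAD * flank
    return [(pos, padded[pos:pos + 2 * flank + 1])
            for pos, aa in enumerate(protein) if aa in residues]


def one_hot(window):
    """Encode a window as a list of 20-element 0/1 rows; padding is all zeros."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in window]


windows = site_windows("MKSAY", flank=2)
encoded = [one_hot(window) for _, window in windows]
```

Each encoded window is a small matrix (window length by 20) with no hand-crafted features, which is the sense in which a network can learn its own representation from the raw sequence.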
NASA Astrophysics Data System (ADS)
Tian, Yuntao; Kohn, Barry P.; Qiu, Nansheng; Yuan, Yusong; Hu, Shengbiao; Gleadow, Andrew J. W.; Zhang, Peizhen
2018-02-01
A distinctive NNE trending belt of shortening structures dominates the topography and deformation of the eastern Sichuan Basin, 300 km east of the Tibetan Plateau. Debate continues as to whether the structures resulted from Cenozoic eastward growth of the Tibetan Plateau. A low-temperature thermochronology (AFT and AHe) data set from four deep boreholes and adjacent outcrops intersecting a branch of the shortening structures indicates distinctive differential cooling at 35-28 Ma across the structure, where stratigraphy has been offset vertically by 0.8-1.3 km. This result forms the first quantitative evidence for the existence of a late Eocene-Oligocene phase of shortening in the eastern Sichuan Basin, synchronous with the early phase of eastward growth and extrusion of the Tibetan Plateau. Further, a compilation of regional Cenozoic structures reveals a Miocene retreat of deformation from the foreland basin to the hinterland areas. Such a tectonic reorganization indicates that Eocene to Miocene deformation in the eastern Tibetan Plateau is out-of-sequence and was probably triggered by enhanced erosion in the eastern Tibetan Plateau.
New features of triacylglycerol biosynthetic pathways of peanut seeds in early developmental stages.
Yu, Mingli; Liu, Fengzhen; Zhu, Weiwei; Sun, Meihong; Liu, Jiang; Li, Xinzheng
2015-11-01
The peanut (Arachis hypogaea L.) is one of the three most important oil crops in the world due to its high average oil content (50 %). To reveal the biosynthetic pathways of seed oil in the early developmental stages of peanut pods with the goal of improving the oil quality, we presented a method combining deep sequencing analysis of the peanut pod transcriptome and quantitative real-time PCR (RT-PCR) verification of seed oil-related genes. From the sequencing data, approximately 1500 lipid metabolism-associated Unigenes were identified. The RT-PCR results quantified the different expression patterns of these triacylglycerol (TAG) synthesis-related genes in the early developmental stages of peanut pods. Based on these results and analysis, we proposed a novel construct of the metabolic pathways involved in the biosynthesis of TAG, including the Kennedy pathway, acyl-CoA-independent pathway and proposed monoacylglycerol pathway. It showed that the biosynthetic pathways of TAG in the early developmental stages of peanut pods were much more complicated than a simple, unidirectional, linear pathway.
Continental margin sedimentation: From sediment transport to sequence stratigraphy
Nittrouer, Charles A.; Austin, James A.; Field, Michael E.; Kravitz, Joseph H.; Syvitski, James P. M.; Wiberg, Patricia L.
2007-01-01
This volume on continental margin sedimentation brings together an expert editorial and contributor team to create a state-of-the-art resource. Taking a global perspective, the book spans a range of timescales and content, ranging from how oceans transport particles, to how thick rock sequences are formed on continental margins.
- Summarizes and integrates our understanding of sedimentary processes and strata associated with fluvial dispersal systems on continental shelves and slopes
- Explores timescales ranging from particle transport at one extreme, to deep burial at the other
- Insights are presented for margins in general, and with focus on a tectonically active margin (northern California) and a passive margin (New Jersey), enabling detailed examination of the intricate relationships between a wide suite of sedimentary processes and their preserved stratigraphy
- Includes observational studies which document the processes and strata found on particular margins, in addition to numerical models and laboratory experimentation, which provide a quantitative basis for extrapolation in time and space of insights about continental-margin sedimentation
- Provides a research resource for scientists studying modern and ancient margins, and an educational text for advanced students in sedimentology and stratigraphy
Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau
NASA Astrophysics Data System (ADS)
Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori
2017-12-01
We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not associated) collected from two different sites at the Sao Paulo Plateau off Brazil by Ion Torrent PGM targeting ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity assigned into 9 classes and 31 genera) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, it was revealed that only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high amount of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity, in comparison with pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge on fungi and their role in deep-sea environments is still limited and scarce, this study increases our understanding of fungal diversity and community structure in deep-sea environments.
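Clustering reads into OTUs at 97% sequence similarity can be sketched with a greedy centroid approach. The abstract does not name the software used in the study, so this is only an illustration of the general idea: it assumes pre-aligned sequences of equal length (real pipelines compute pairwise alignments with dedicated tools), and the names `percent_identity` and `greedy_otus` are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(sequences, threshold=97.0):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid,
    so the result depends on input order (as in greedy clustering generally).
    """
    centroids, clusters = [], []
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if percent_identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


reference = "A" * 100
near = "A" * 99 + "C"        # 99% identical to reference
far = "A" * 90 + "C" * 10    # 90% identical to reference
otus = greedy_otus([reference, near, far])

# Share of OTUs with a >97% match in public databases, as reported above.
described_share = round(100.0 * 47 / 113, 1)
```

With the invented toy sequences, the first two fall into one OTU and the third forms its own; the final line reproduces the 41.6% figure quoted in the abstract from the counts 47 and 113.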
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
Takai, Ken; Horikoshi, Koki
1999-01-01
Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021
Azuma, M; Hirai, T; Yamada, K; Yamashita, S; Ando, Y; Tateishi, M; Iryo, Y; Yoneda, T; Kitajima, M; Wang, Y; Yamashita, Y
2016-05-01
Quantitative susceptibility mapping is useful for assessing iron deposition in the substantia nigra of patients with Parkinson disease. We aimed to determine whether quantitative susceptibility mapping is useful for assessing the lateral asymmetry and spatial difference in iron deposits in the substantia nigra of patients with Parkinson disease. Our study population comprised 24 patients with Parkinson disease and 24 age- and sex-matched healthy controls. They underwent 3T MR imaging by using a 3D multiecho gradient-echo sequence. On reconstructed quantitative susceptibility mapping, we measured the susceptibility values in the anterior, middle, and posterior parts of the substantia nigra, the whole substantia nigra, and other deep gray matter structures in both hemibrains. To identify the more and less affected hemibrains in patients with Parkinson disease, we assessed the severity of movement symptoms for each hemibrain by using the Unified Parkinson's Disease Rating Scale. In the posterior substantia nigra of patients with Parkinson disease, the mean susceptibility value was significantly higher in the more than the less affected hemibrain substantia nigra (P < .05). This value was significantly higher in both the more and less affected hemibrains of patients with Parkinson disease than in controls (P < .05). Asymmetry of the mean susceptibility values was significantly greater for patients than controls (P < .05). Receiver operating characteristic analysis showed that quantitative susceptibility mapping of the posterior substantia nigra in the more affected hemibrain provided the highest power for discriminating patients with Parkinson disease from the controls. Quantitative susceptibility mapping is useful for assessing the lateral asymmetry and spatial difference of iron deposition in the substantia nigra of patients with Parkinson disease. © 2016 by American Journal of Neuroradiology.
Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar
2014-03-04
Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins; including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai
2018-01-01
PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. 
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
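The abstract above classifies variants as low frequency when the minor allele frequency (MAF) is below 5%. As a minimal sketch of that cutoff, assuming biallelic sites and diploid genotypes coded as alternate-allele counts (the function names and the example genotypes are hypothetical, not taken from the study), the calculation might look like this:

```python
from typing import Sequence


def minor_allele_frequency(alt_allele_counts: Sequence[int], ploidy: int = 2) -> float:
    """Return the minor allele frequency at one biallelic site.

    alt_allele_counts holds, per individual, the number of alternate
    alleles carried (0, 1 or 2 for a diploid genome).
    """
    if not alt_allele_counts:
        raise ValueError("need at least one genotype")
    total_alleles = ploidy * len(alt_allele_counts)
    alt_freq = sum(alt_allele_counts) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(maf: float, threshold: float = 0.05) -> bool:
    """Apply the MAF < 5% cutoff; monomorphic sites (MAF == 0) are excluded."""
    return 0.0 < maf < threshold


# Hypothetical site: 90 individuals, 3 of them heterozygous carriers.
genotypes = [1, 1, 1] + [0] * 87
maf = minor_allele_frequency(genotypes)
low = is_low_frequency(maf)
```

With 90 diploid samples there are 180 chromosomes, so a single heterozygous carrier already corresponds to a frequency of about 0.56%.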
Optimization of conditions to sequence long cDNAs from viruses
USDA-ARS's Scientific Manuscript database
Fourth generation sequencing with the MinION nanopore sequencer provides an opportunity to obtain deep coverage and long reads for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers
USDA-ARS's Scientific Manuscript database
The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Deep sequencing identification of miRNAs in pigeon ovaries illuminated with monochromatic light.
Wang, Ying; Yang, Hai-Ming; Cao, Wei; Li, Yang-Bai; Wang, Zhi-Yue
2018-06-08
The use of light of different wavelengths has grown popular in the poultry industry. An optimum wavelength is believed to improve pigeon egg production, but little is known about the role of microRNAs (miRNAs) in the effects of monochromatic light on ovarian pigeon function. Herein, we harvested ovaries from pigeons reared under monochromatic light of different wavelength and performed deep sequencing on various tissues using an Illumina Solexa high-throughput instrument. We obtained 66,148,548, 67,873,805, and 71,661,771 clean reads from ovaries of pigeons reared under red light (RL), blue light (BL), and white light (WL), respectively. We identified 1917 known miRNAs in nine libraries, of which 524 were novel. Three and five differentially expressed miRNAs were identified in BL vs. WL and RL vs. WL groups, respectively. Quantitative reverse transcription PCR was used to validate differentially expressed miRNAs (miR-200, miR-122, and miR-205b). In addition, 5824 target genes were annotated as differentially expressed miRNAs, most of which are involved in reproductive pathways including oestrogen signalling, cell cycle, and oocyte maturation. Notably, ovarian miR-205b expression was significantly negatively correlated with its target 11β-hydroxysteroid dehydrogenase type 1 (HSD11B1). miRNA-mRNA network analysis suggests that miR-205b targeting of HSD11B1 plays a key role in the effects of monochromatic light on pigeon egg production. These findings indicate that monochromatic light shortens the oviposition interval of pigeons, which may be useful for egg production and pigeon breeding.
MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.
Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M
2010-09-01
MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm³ versus 19.9 cm³, P=0.74) and prevalence of ≥1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.
Deep Sequencing Analysis of Apple Infecting Viruses in Korea
Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun
2016-01-01
Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
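The abstract above reports nucleotide sequence identities between assembled contigs and reference genomes but does not say how they were computed. As a minimal sketch of one common convention, assuming the two sequences are already aligned to equal length and that columns containing a gap are ignored (the function name and the example sequences are hypothetical), percent identity can be calculated as follows:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Both inputs must be the same length, with '-' marking alignment gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real viral sequences.
identity_no_gaps = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
identity_with_gap = percent_identity("ACG-TA", "ACGGTT")
```

Other conventions count gap columns as mismatches or normalize by the shorter sequence, so values from different tools are not always directly comparable.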
Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.
James, Katherine; Cockell, Simon J; Zenkin, Nikolay
2017-05-01
The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.
Insertion sequences enrichment in extreme Red sea brine pool vent.
Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania
2017-03-01
Mobile genetic elements are major agents of genome diversification and evolution. Limited studies addressed their characteristics, including abundance, and role in extreme habitats. One of the rare natural habitats exposed to multiple-extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments, suggest that insertion sequences predominantly contribute to polyextremophiles genome plasticity.
End-to-end deep neural network for optical inversion in quantitative photoacoustic imaging.
Cai, Chuangjian; Deng, Kexin; Ma, Cheng; Luo, Jianwen
2018-06-15
An end-to-end deep neural network, ResU-net, is developed for quantitative photoacoustic imaging. A residual learning framework is used to facilitate optimization and to gain better accuracy from considerably increased network depth. The contracting and expanding paths enable ResU-net to extract comprehensive context information from multispectral initial pressure images and, subsequently, to infer a quantitative image of chromophore concentration or oxygen saturation (sO2). According to our numerical experiments, the estimations of sO2 and indocyanine green concentration are accurate and robust against variations in both optical property and object geometry. An extremely short reconstruction time of 22 ms is achieved.
Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A
2017-02-01
Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.
DeepLoc: prediction of protein subcellular localization using deep learning.
Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole
2017-11-01
The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng
2017-09-01
Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
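The abstract above reports accuracy for the top L/5, top L/10 and top 5 predictions, where L is the protein length. As a minimal sketch of how such a figure is usually obtained, assuming predictions are given as scored residue pairs and true contacts as a set of pairs (the function names and the toy data are hypothetical and are not taken from DeepConPred), the calculation might look like this:

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # residue indices (i, j) with i < j


def top_k_precision(scores: Dict[Pair, float], true_contacts: Set[Pair], k: int) -> float:
    """Fraction of the k highest-scoring predicted pairs that are true contacts."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        raise ValueError("no predictions to evaluate")
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


def top_l_over_n_precision(
    scores: Dict[Pair, float], true_contacts: Set[Pair], seq_len: int, n: int
) -> float:
    """Precision of the top L/n predictions, with L the sequence length."""
    k = max(1, seq_len // n)
    return top_k_precision(scores, true_contacts, k)


# Toy data: six scored long-range pairs for a hypothetical 50-residue protein.
predicted = {
    (1, 30): 0.9, (2, 40): 0.8, (5, 35): 0.7,
    (3, 45): 0.6, (10, 48): 0.5, (0, 25): 0.4,
}
native = {(1, 30), (5, 35), (10, 48), (0, 25)}
precision_l10 = top_l_over_n_precision(predicted, native, seq_len=50, n=10)
```

Here L/10 gives k = 5, and three of the five top-ranked pairs are in the native set, so the precision is 0.6.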
Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute
2017-09-01
Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Musculoskeletal MRI findings of juvenile localized scleroderma.
Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W
2017-04-01
Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.
Dissecting enzyme function with microfluidic-based deep mutational scanning.
Romero, Philip A; Tran, Tuan M; Abate, Adam R
2015-06-09
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
Deep whole-genome sequencing of 100 southeast Asian Malays.
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-10
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-01
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
NASA Astrophysics Data System (ADS)
Kanari, M.; Ketter, T.; Tibor, G.; Schattner, U.
2017-12-01
We aim to characterize the seafloor morphology and its shallow sub-surface structures and deformations in the deep part of the Levant basin (eastern Mediterranean) using recently acquired high-resolution shallow seismic reflection data and multibeam bathymetry, which allow quantitative analysis of morphology and structure. The Levant basin at the eastern Mediterranean is considered a passive continental margin, where most of the recent geological processes were related in literature to salt tectonics rooted at the Messinian deposits from 6Ma. We analyzed two sets of recently acquired high-resolution data from multibeam bathymetry and 3.5 kHz Chirp sub-bottom seismic reflection in the deep basin of the continental shelf offshore Israel (water depths up to 2100 m). Semi-automatic mapping of seafloor features and seismic data interpretation resulted in quantitative morphological analysis of the seafloor and its underlying sediment with penetration depth up to 60 m. The quantitative analysis and its interpretation are still in progress. Preliminary results reveal distinct morphologies of four major elements: channels, faults, folds and sediment waves, validated by seismic data. From the spatial distribution and orientation analyses of these phenomena, we identify two primary process types which dominate the formation of the seafloor in the Levant basin: structural and sedimentary. Characterization of the geological and geomorphological processes forming the seafloor helps to better understand the transport mechanisms and the relations between sediment transport and deposition in deep water and the shallower parts of the shelf and slope.
EZH2 mutations and promoter hypermethylation in childhood acute lymphoblastic leukemia.
Schäfer, Vivien; Ernst, Jana; Rinke, Jenny; Winkelmann, Nils; Beck, James F; Hochhaus, Andreas; Gruhn, Bernd; Ernst, Thomas
2016-07-01
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and young adults. The polycomb repressive complex 2 (PRC2) has been identified as one of the most frequently mutated epigenetic protein complexes in hematologic cancers. PRC2 acts as an epigenetic repressor through histone H3 lysine 27 trimethylation (H3K27me3), catalyzed by the histone methyltransferase enhancer of zeste homolog 2 protein (EZH2). To study the prevalence and clinical impact of PRC2 aberrations in an unselected childhood ALL cohort (n = 152), we performed PRC2 mutational screenings by Sanger sequencing and promoter methylation analyses by quantitative pyrosequencing for the three PRC2 core component genes EZH2, suppressor of zeste 12 (SUZ12), and embryonic ectoderm development (EED). Targeted deep next-generation sequencing of 30 frequently mutated genes in leukemia was performed to search for cooperating mutations in patients harboring PRC2 aberrations. Finally, the functional consequence of EZH2 promoter hypermethylation on H3K27me3 was studied by Western blot analyses of primary cells. Loss-of-function EZH2 mutations were detected in 2/152 (1.3 %) patients with common-ALL and early T-cell precursor (ETP)-ALL, respectively. In one patient, targeted deep sequencing identified cooperating mutations in ASXL1 and TET2. EZH2 promoter hypermethylation was found in one patient with ETP-ALL which led to reduced H3K27me3. In comparison with healthy children, the EZH2 promoter was significantly higher methylated in T-ALL patients. No mutations or promoter methylation changes were identified for SUZ12 or EED genes, respectively. Although PRC2 aberrations seem to be rare in childhood ALL, our findings indicate that EZH2 aberrations might contribute to the disease in specific cases. Hereby, EZH2 promoter hypermethylation might have functionally similar consequences as loss-of-function mutations.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
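The pairwise-ranking view of AUC mentioned in this abstract can be illustrated with a minimal sketch. It computes only the exact empirical AUC as the fraction of correctly ordered (positive, negative) pairs; it does not reproduce the polynomial approximation or the gradient-based training described by the authors. The function name `empirical_auc` and the example scores are hypothetical.

```python
def empirical_auc(scores, labels):
    """Empirical AUC as a pairwise ranking statistic.

    Counts the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical scores for four residues, two of them in the positive class.
example_auc = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Because the statistic depends only on the ordering of positives relative to negatives, it is not dominated by the majority class the way plain accuracy can be, which is the motivation the abstract gives for using it on imbalanced labels.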
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling
Wang, Sheng; Sun, Siqi
2017-01-01
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168
Liu, Hongjun; Zhang, Lin; Wang, Jiechen; Li, Changsheng; Zeng, Xing; Xie, Shupeng; Zhang, Yongzhong; Liu, Sisi; Hu, Songlin; Wang, Jianhua; Lee, Michael; Lübberstedt, Thomas; Zhao, Guangwu
2017-01-01
Deep-sowing is an effective measure to ensure seeds absorbing water from deep soil layer and emerging normally in arid and semiarid regions. However, existing varieties demonstrate poor germination ability in deep soil layer and some key quantitative trait loci (QTL) or genes related to deep-sowing germination ability remain to be identified and analyzed. In this study, a high-resolution genetic map based on 280 lines of the intermated B73 × Mo17 (IBM) Syn10 doubled haploid (DH) population which comprised 6618 bin markers was used for the QTL analysis of deep-sowing germination related traits. The results showed significant differences in germination related traits under deep-sowing condition (12.5 cm) and standard-germination condition (2 cm) between two parental lines. In total, 8, 11, 13, 15, and 18 QTL for germination rate, seedling length, mesocotyl length, plumule length, and coleoptile length were detected for the two sowing conditions, respectively. These QTL explained 2.51–7.8% of the phenotypic variance with LOD scores ranging from 2.52 to 7.13. Additionally, 32 overlapping QTL formed 11 QTL clusters on all chromosomes except for chromosome 8, indicating the minor effect genes have a pleiotropic role in regulating various traits. Furthermore, we identified six candidate genes related to deep-sowing germination ability, which were co-located in the cluster regions. The results provide a basis for molecular marker assisted breeding and functional study in deep-sowing germination ability of maize. PMID:28588594
Joseph, Arun Antony; Merboldt, Klaus-Dietmar; Voit, Dirk; Dahm, Johannes; Frahm, Jens
2016-12-01
The accurate assessment of peripheral venous flow is important for the early diagnosis and treatment of disorders such as deep-vein thrombosis (DVT) which is a major cause of post-thrombotic syndrome or even death due to pulmonary embolism. The aim of this work is to quantitatively determine blood flow in deep veins during rest and muscular exercise using a novel real-time magnetic resonance imaging (MRI) method for velocity-encoded phase-contrast (PC) MRI at high spatiotemporal resolution. Real-time PC MRI of eight healthy volunteers and one patient was performed at 3 Tesla (Prisma fit, Siemens, Erlangen, Germany) using a flexible 16-channel receive coil (Variety, NORAS, Hoechberg, Germany). Acquisitions were based on a highly undersampled radial FLASH sequence with image reconstruction by regularized nonlinear inversion at 0.5×0.5×6 mm³ spatial resolution and 100 ms temporal resolution. Flow was assessed in two cross-sections of the lower leg at the level of the calf muscle and knee using a protocol of 10 s rest, 20 s flexion and extension of the foot, and 10 s rest. Quantitative analyses included through-plane flow in the right posterior tibial, right peroneal and popliteal vein (PC maps) as well as signal intensity changes due to flow and muscle movements (corresponding magnitude images). Real-time PC MRI successfully monitored the dynamics of venous flow at high spatiotemporal resolution and clearly demonstrated increased flow in deep veins in response to flexion and extension of the foot. In normal subjects, the maximum velocity (averaged across vessel lumen) during exercise was 9.4±5.7 cm·s⁻¹ for the right peroneal vein, 8.5±4.6 cm·s⁻¹ for the right posterior tibial vein and 17.8±5.8 cm·s⁻¹ for the popliteal vein. The integrated flow volume per exercise (20 s) was 1.9, 1.6 and 50 mL (mean across subjects) for right peroneal, right posterior tibial and popliteal vein, respectively. A patient with DVT presented with peak flow velocities of only about 2 cm·s⁻¹ during exercise and less than 1 cm·s⁻¹ during rest. Real-time PC MRI emerges as a new tool for quantifying the dynamics of muscle-induced flow in deep veins. The method provides both signal intensity changes and velocity information for the assessment of blood flow and muscle movements. It now warrants extended clinical trials to patients with suspected thrombosis.
Aasvang, E K; Werner, M U; Kehlet, H
2014-09-01
Deep pain complaints are more frequent than cutaneous in post-surgical patients, and a prevalent finding in quantitative sensory testing studies. However, the preferred assessment method - pressure algometry - is indirect and tissue unspecific, hindering advances in treatment and preventive strategies. Thus, there is a need for development of methods with direct stimulation of suspected hyperalgesic tissues to identify the peripheral origin of nociceptive input. We compared the reliability of an ultrasound-guided needle stimulation protocol of electrical detection and pain thresholds to pressure algometry, by performing identical test-retest sequences 10 days apart, in deep tissues in the groin region. Electrical stimulation was performed by five up-and-down staircase series of single impulses of 0.04 ms duration, starting from 0 mA in increments of 0.2 mA until a threshold was reached and descending until sensation was lost. Method reliability was assessed by Bland-Altman plots, descriptive statistics, coefficients of variance and intraclass correlation coefficients. The electrical stimulation method was comparable to pressure algometry regarding 10 days test-retest repeatability, but with superior same-day reliability for electrical stimulation (P < 0.05). Between-subject variance rather than within-subject variance was the main source for test variation. There were no systematic differences in electrical thresholds across tissues and locations (P > 0.05). The presented tissue-specific direct deep tissue electrical stimulation technique has equal or superior reliability compared with the indirect tissue-unspecific stimulation by pressure algometry. This method may facilitate advances in mechanism based preventive and treatment strategies in acute and chronic post-surgical pain states. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
USDA-ARS's Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost of large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...
Yildirim, Özal
2018-05-01
Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.
Zhang, Buzhong; Li, Linqing; Lü, Qiang
2018-05-25
Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
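The evaluation described in this abstract (continuous relative solvent accessibility scored by mean absolute error and Pearson's correlation, then converted to binary states by a threshold) can be sketched as follows. This is an illustration only: the 0.25 threshold, the state names, and the example values are assumptions, since the abstract does not state the predefined thresholds.

```python
import math


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def pearson_correlation(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def to_binary_state(rsa, threshold=0.25):
    """Map a relative solvent accessibility in [0, 1] to a two-state label.

    The threshold value is a hypothetical choice for illustration.
    """
    return "exposed" if rsa >= threshold else "buried"


# Hypothetical per-residue values, not data from the paper.
predicted_rsa = [0.10, 0.40, 0.80, 0.05]
observed_rsa = [0.15, 0.35, 0.70, 0.10]
mae = mean_absolute_error(predicted_rsa, observed_rsa)
r = pearson_correlation(predicted_rsa, observed_rsa)
states = [to_binary_state(v) for v in predicted_rsa]
```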
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mello, M.R.; Soldan, A.L.; Maxwell, J.R.
A geochemical and biological marker investigation of a variety of oils from offshore Brazil and west Africa, ranging in age from Lower Cretaceous to Tertiary, has been done, with the following aims: (1) assessing the depositional environment of source rocks, (2) correlating the reservoired oils, (3) comparing the Brazilian oils with their west African counterparts. The approach was based on stable isotope data; bulk, elemental, and hydrous pyrolysis results; and molecular studies involving quantitative geological marker investigations of alkanes using GC-MS and GC-MS-MS. The results reveal similarities between groups of oils from each side of the Atlantic and suggest an origin from source rocks deposited in five types of depositional environment: lacustrine fresh water, lacustrine saline water, marine evaporitic/carbonate, restricted marine anoxic, and marine deltaic. In west Africa, the Upper Cretaceous marine anoxic succession (Cenomanian-Santonian) appears to be a major oil producer, but in Brazil it is generally immature. The Brazilian offshore oils have arisen mainly from the pre-salt sequence, whereas the African oils show a balance between origins from the pre-salt and marine sequences. The integration of the geochemical and geological data indicates that new frontiers of hydrocarbon exploration in the west African basins must consider the Tertiary reservoirs in the offshore area of Niger Delta, the reservoirs of the rift sequences in the shallow-water areas of south Gabon, Congo, and Cuanza basins, and the reservoirs from the drift sequences (post-salt) in the deep-water areas of Gabon, Congo Cabinda, and Cuanza basins.
Analysis of miRNA expression profiles in melatonin-exposed GC-1 spg cell line.
Zhu, Xiaoling; Chen, Shuxiong; Jiang, Yanwen; Xu, Ying; Zhao, Yun; Chen, Lu; Li, Chunjin; Zhou, Xu
2018-02-05
Melatonin is an endocrine neurohormone secreted by pinealocytes in the pineal gland. It exerts diverse physiological effects, acting for example as a circadian rhythm regulator and an antioxidant. However, the functional importance of melatonin in spermatogenesis regulation remains unclear. The objectives of this study are to: (1) detect the effect of melatonin on miRNA expression profiles in GC-1 spg cells by miRNA deep sequencing (DeepSeq) and (2) define melatonin-affected miRNA-mRNA interactions and associated biological processes using bioinformatics analysis. GC-1 spg cells were cultured with melatonin (10⁻⁷ M) for 24 h. DeepSeq data were validated using quantitative real-time reverse transcription polymerase chain reaction analysis (qRT-PCR). A total of 176 miRNAs were found to be expressed at significantly different levels between the two groups (fold change of >2 or <0.5 and FDR<0.05). Of these, 171 were up-regulated, and 5 were down-regulated. Ontology analysis of biological processes of the predicted targets indicated a variety of biological functions. Pathway analysis indicated that the predicted targets were involved in cancers, apoptosis and signaling pathways, such as VEGF, TNF, Ras and Notch. The results imply that melatonin could regulate the expression of miRNA to perform its physiological effects in GC-1 spg cells. These results should be useful to investigate the biological function of miRNAs regulated by melatonin in spermatogenesis and testicular germ cell tumor. Copyright © 2017 Elsevier B.V. All rights reserved.
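The selection rule quoted in this abstract (fold change >2 or <0.5 together with FDR<0.05) can be written down as a short sketch. The function name, the record layout, and the miRNA names and values below are hypothetical and serve only to show how the two cut-offs combine.

```python
def classify_mirna(fold_change, fdr, fc_up=2.0, fc_down=0.5, fdr_cutoff=0.05):
    """Label one miRNA using the fold-change and FDR cut-offs from the abstract."""
    if fdr >= fdr_cutoff:
        return "not significant"
    if fold_change > fc_up:
        return "up"
    if fold_change < fc_down:
        return "down"
    return "not significant"


# Hypothetical records: (miRNA name, treated/control fold change, FDR).
records = [
    ("miR-example-1", 3.2, 0.01),   # passes both cut-offs, up-regulated
    ("miR-example-2", 0.3, 0.02),   # passes both cut-offs, down-regulated
    ("miR-example-3", 4.0, 0.20),   # large change but FDR too high
    ("miR-example-4", 1.4, 0.001),  # significant FDR but change too small
]
calls = {name: classify_mirna(fc, fdr) for name, fc, fdr in records}
```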
Bevan, Samantha J; Chan, Cecilia W L; Tanner, Julian A
2014-01-01
Although there is increasing evidence for a relationship between courses that emphasize student engagement and achievement of student deep learning, there is a paucity of quantitative comparative studies in a biochemistry and molecular biology context. Here, we present a pedagogical study in two contrasting parallel biochemistry introductory courses to compare student surface and deep learning. Surface and deep learning were measured quantitatively by a study process questionnaire at the start and end of the semester, and qualitatively by questionnaires and interviews with students. In the traditional lecture/examination based course, there was a dramatic shift to surface learning approaches through the semester. In the course that emphasized student engagement and adopted multiple forms of assessment, a preference for deep learning was sustained with only a small reduction through the semester. Such evidence for the benefits of implementing student engagement and more diverse non-examination based assessment has important implications for the design, delivery, and renewal of introductory courses in biochemistry and molecular biology. © 2014 The International Union of Biochemistry and Molecular Biology.
Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón
2014-01-01
We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³ 27 months, MRD 10⁻³ to 10⁻⁵ 48 months, and MRD <10⁻⁵ 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
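The three MRD strata named in this abstract can be expressed as a small binning function. This is a sketch under assumptions: the abstract does not say how values falling exactly on 10⁻³ or 10⁻⁵ are assigned, so the boundary handling below (lower bound inclusive for the middle stratum) and the function name are hypothetical.

```python
def mrd_stratum(mrd_level):
    """Assign an MRD level (tumor fraction) to one of three strata.

    Boundary handling at exactly 1e-3 and 1e-5 is an assumption.
    """
    if mrd_level < 0:
        raise ValueError("MRD level must be non-negative")
    if mrd_level >= 1e-3:
        return "MRD >= 1e-3"
    if mrd_level >= 1e-5:
        return "MRD 1e-3 to 1e-5"
    return "MRD < 1e-5"


# Hypothetical MRD levels for three patients.
strata = [mrd_stratum(level) for level in (5e-3, 2e-4, 3e-6)]
```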
Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S
2018-05-01
Oligoclonal expansion of CD8⁺CD28⁻ lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8⁺CD28⁻ cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8⁺CD28⁻CD57⁺ cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8⁺CD57⁺ cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8⁺ cells from aplastic anemia patients with CD8⁺CD57⁺ cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8⁺CD57⁺ cells, which also showed decreased diversity compared to total CD4⁺ and CD8⁺ cell pools. From analysis of complementarity-determining region 3 sequences in the CD8⁺ cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8⁺ T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064). Copyright © 2018 Ferrata Storti Foundation.
Deep-brain-stimulation does not impair deglutition in Parkinson's disease.
Lengerer, Sabrina; Kipping, Judy; Rommel, Natalie; Weiss, Daniel; Breit, Sorin; Gasser, Thomas; Plewnia, Christian; Krüger, Rejko; Wächter, Tobias
2012-08-01
A large proportion of patients with Parkinson's disease develop dysphagia during the course of the disease. Dysphagia in Parkinson's disease affects different phases of deglutition, has a strong impact on quality of life and may cause severe complications, i.e., aspirational pneumonia. So far, little is known on how deep-brain-stimulation of the subthalamic nucleus influences deglutition in PD. Videofluoroscopic swallowing studies on 18 patients with Parkinson's disease, which had been performed preoperatively, and postoperatively with deep-brain-stimulation-on and deep-brain-stimulation-off, were analyzed retrospectively. The patients were examined in each condition with three consistencies (viscous, fluid and solid). The 'New Zealand index for multidisciplinary evaluation of swallowing (NZIMES) Subscale One' for qualitative and 'Logemann-MBS-Parameters' for quantitative evaluation were assessed. Preoperatively, none of the patients presented with clinically relevant signs of dysphagia. While postoperatively, the mean daily levodopa equivalent dosage was reduced by 50% and deep-brain-stimulation led to a 50% improvement in motor symptoms measured by the UPDRS III, no clinically relevant influence of deep-brain-stimulation-on swallowing was observed using qualitative parameters (NZIMES). However quantitative parameters (Logemann scale) found significant changes of pharyngeal parameters with deep-brain-stimulation-on as compared to preoperative condition and deep-brain-stimulation-off mostly with fluid consistency. In Parkinson patients without dysphagia deep-brain-stimulation of the subthalamic nucleus modulates the pharyngeal deglutition phase but has no clinically relevant influence on deglutition. Further studies are needed to test if deep-brain-stimulation is a therapeutic option for patients with swallowing disorders. Copyright © 2012 Elsevier Ltd. All rights reserved.
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS's Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin
2016-04-01
Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.
2018-01-01
Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter, while the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process.
Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369
Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J
2009-06-29
It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. 
The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031
NASA Astrophysics Data System (ADS)
Simon, Dirk; Meijer, Paul
2016-04-01
Today, the Atlantic-Mediterranean gateway (the Strait of Gibraltar) and the strong evaporative loss in the east let the Mediterranean Sea attain a salinity 2-3 g/l higher than that of the Atlantic Ocean. During the winter months, strong cooling of surface waters in the north forms deep water, which mixes the Mediterranean, while during summer the water column is stratified. During the Messinian Salinity Crisis (MSC, 5.97-5.33 Ma) the salt concentration was high enough to reach the saturation of gypsum (~130-160 g/l) and halite (~350 g/l). This caused large deposits of these evaporites all over the basin, capturing 6% of the World Ocean salt within the Mediterranean at the time. Although several mechanisms have been proposed as to how the Mediterranean circulation might have functioned, these mechanisms have yet to be rooted in physics and tested quantitatively. Understanding circulation during the MSC becomes particularly important when comparing Mediterranean marginal to deep basins. On the one hand, many of the marginal basins in the Mediterranean are well studied, like the Sorbas basin (Spain) or the Vena del Gesso basin (Italy). On the other hand, the deep Mediterranean is less well studied, as no full record of the whole deep sequence exists. This makes it very complicated to correlate marginal and deep basin records. Here we present the first steps towards a physics-based understanding of the mixing and stratification behaviour of the Mediterranean Sea during the MSC. The final goal is to identify the physical mechanism needed to form such a salt brine and to understand how it differs from today's situation. We hope to compare our results to, and learn from, the much smaller but best available analog to the MSC, the Dead Sea, where recent overturning has been documented.
Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian
2013-01-01
Quantitative structure-activity relationship (QSAR), a regression modeling methodology that establishes statistical correlation between structure feature and apparent behavior for a series of congeneric molecules quantitatively, has been widely used to evaluate the activity, toxicity and property of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising to see that such useful technique has only very limited applications to biomacromolecules, albeit the solved 3D atom-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than that of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.
Graphical classification of DNA sequences of HLA alleles by deep learning.
Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi
2018-04-01
Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.
Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L
2014-01-05
Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.
3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.
Goldfarb, Katherine C; Cech, Thomas R
2013-09-21
Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.
Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun
2018-05-02
Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. 
The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections was similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.
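The two summary statistics reported in the abstract above (average MOI and the share of multiclonal infections) can be illustrated with a minimal sketch. It assumes that MOI for a sample is simply the number of distinct haplotypes called in that sample; the function name `moi_summary` and the example haplotype labels are hypothetical and are not taken from the study.

```python
def moi_summary(haplotypes_by_sample):
    """Return (mean MOI, fraction of multiclonal samples).

    haplotypes_by_sample maps a sample ID to the list of haplotype labels
    called in that sample. Samples with no calls are ignored.
    """
    # MOI per sample = number of distinct haplotypes (assumption of this sketch).
    counts = [len(set(h)) for h in haplotypes_by_sample.values() if h]
    if not counts:
        raise ValueError("no samples with haplotype calls")
    mean_moi = sum(counts) / len(counts)
    # A sample is multiclonal when it carries more than one haplotype.
    multiclonal_fraction = sum(c > 1 for c in counts) / len(counts)
    return mean_moi, multiclonal_fraction


# Hypothetical calls for four samples; S4 lists the same haplotype twice.
example = {
    "S1": ["h1", "h2"],
    "S2": ["h3"],
    "S3": ["h1", "h2", "h4"],
    "S4": ["h2", "h2"],
}
mean_moi, multiclonal_fraction = moi_summary(example)
```

Counting distinct labels rather than raw calls keeps a haplotype that was reported twice in one sample from inflating the estimate.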
TU-H-CAMPUS-IeP2-01: Quantitative Evaluation of PROPELLER DWI Using QIBA Diffusion Phantom
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yung, J; Ai, H; Liu, H
Purpose: The purpose of this study is to determine the quantitative variability of apparent diffusion coefficient (ADC) values when varying imaging parameters in a diffusion-weighted (DW) fast spin echo (FSE) sequence with Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction (PROPELLER) k-space trajectory. Methods: Using a 3T MRI scanner, a NIST traceable, quantitative magnetic resonance imaging (MRI) diffusion phantom (High Precision Devices, Inc, Boulder, Colorado) consisting of 13 vials filled with various concentrations of polymer polyvinylpyrrolidone (PVP) in aqueous solution was imaged with a standard Quantitative Imaging Biomarkers Alliance (QIBA) DWI spin echo, echo planar imaging (SE EPI) acquisition. The same phantom was then imaged with a DWI PROPELLER sequence at varying echo train lengths (ETL) of 8, 20, and 32, as well as b-values of 400, 900, and 2000. QIBA DWI phantom analysis software was used to generate ADC maps and create regions of interest (ROIs) for quantitative measurements of each vial. Mean and standard deviations of the ROIs were compared. Results: The SE EPI sequence generated ADC values that showed very good agreement with the known ADC values of the phantom (r² = 0.9995, slope = 1.0061). The ADC values measured from the PROPELLER sequences were inflated, but were highly correlated with an r² range from 0.8754 to 0.9880. The PROPELLER sequence with an ETL=20 and b-value of 0 and 2000 showed the closest agreement (r² = 0.9034, slope = 0.9880). Conclusion: The DW PROPELLER sequence is promising for quantitative evaluation of ADC values. A drawback of the PROPELLER sequence is the longer acquisition time. The 180° refocusing pulses may also cause the observed increase in ADC values compared to the standard SE EPI DW sequence. However, the FSE sequence offers an advantage with in-plane motion and geometric distortion which will be investigated in future studies.
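For readers unfamiliar with how ADC values and the reported slope and r² figures arise, the following is a minimal sketch rather than the QIBA analysis software itself. It assumes the standard mono-exponential signal model S(b) = S0 · exp(-b · ADC) and an ordinary least-squares comparison of measured against reference ADC values; the function names and the numbers in the usage lines are illustrative assumptions.

```python
import math


def adc_two_point(s_low, s_high, b_low, b_high):
    """Two-point ADC estimate under the mono-exponential model.

    s_low and s_high are ROI mean signals at b-values b_low < b_high (s/mm^2);
    the result is in mm^2/s.
    """
    if b_high <= b_low or s_low <= 0 or s_high <= 0:
        raise ValueError("need b_high > b_low and positive signals")
    return math.log(s_low / s_high) / (b_high - b_low)


def slope_and_r2(reference, measured):
    """Least-squares slope and r^2 of measured versus reference values."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, measured))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Illustrative use: a synthetic vial with ADC = 1.1e-3 mm^2/s at b = 0 and 2000.
signal_b0 = 1000.0
signal_b2000 = signal_b0 * math.exp(-2000 * 1.1e-3)
adc = adc_two_point(signal_b0, signal_b2000, 0, 2000)
```

A slope near 1 with high r² indicates agreement with the reference values, whereas high r² with a slope away from 1 indicates a systematic bias, which is the distinction the abstract draws between correlation and inflation.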
Tian, Hui; Sun, Yuanyuan; Liu, Chenghui; Duan, Xinrui; Tang, Wei; Li, Zhengping
2016-12-06
MicroRNA (miRNA) analysis in a single cell is extremely important because it allows deep understanding of the exact correlation between the miRNAs and cell functions. Herein, we wish to report a highly sensitive and precisely quantitative assay for miRNA detection based on ligation-based droplet digital polymerase chain reaction (ddPCR), which permits the quantitation of miRNA in a single cell. In this ligation-based ddPCR assay, two target-specific oligonucleotide probes can be simply designed to be complementary to the half-sequence of the target miRNA, respectively, which avoids the sophisticated design of reverse transcription and provides high specificity to discriminate a single-base difference among miRNAs with simple operations. After the miRNA-templated ligation, the ddPCR partitions individual ligated products into a water-in-oil droplet and digitally counts the fluorescence-positive and negative droplets after PCR amplification for quantification of the target molecules, which possesses the power of precise quantitation and robustness to variation in PCR efficiency. By integrating the advantages of the precise quantification of ddPCR and the simplicity of the ligation-based PCR, the proposed method can sensitively measure let-7a miRNA with a detection limit of 20 aM (12 copies per microliter), and even a single-base difference can be discriminated in let-7 family members. More importantly, due to its high selectivity and sensitivity, the proposed method can achieve precise quantitation of miRNAs in single-cell lysate. Therefore, the ligation-based ddPCR assay may serve as a useful tool to exactly reveal the miRNAs' actions in a single cell, which is of great importance for the study of miRNAs' biofunction as well as for the related biomedical studies.
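The stated detection limit (20 aM, about 12 copies per microliter) can be checked with Avogadro's number, and the digital counting step can be sketched with the usual Poisson correction for droplet partitioning. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the droplet volume in the usage line is an assumed placeholder, not a value given in the abstract.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molar_to_copies_per_ul(molar):
    """Convert a molar concentration (mol/L) to copies per microliter."""
    return molar * AVOGADRO / 1e6  # 1 L = 1e6 microliters


def ddpcr_copies_per_ul(positive, total, droplet_volume_nl):
    """Poisson-corrected target concentration from droplet counts.

    positive / total is the fraction of fluorescence-positive droplets;
    droplet_volume_nl is the droplet volume in nanoliters (an assumption
    supplied by the user, not taken from the source).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    # Mean copies per droplet, correcting for droplets holding more than one copy.
    copies_per_droplet = -math.log(1 - positive / total)
    return copies_per_droplet / (droplet_volume_nl * 1e-3)  # nL -> microliters


limit = molar_to_copies_per_ul(20e-18)           # roughly 12 copies per microliter
example = ddpcr_copies_per_ul(1000, 20000, 0.85)  # 0.85 nL is an assumed volume
```

Because the estimate depends only on the fraction of positive droplets, it is insensitive to modest variation in amplification efficiency, which is the robustness the abstract refers to.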
Complete genome sequence of a novel genotype of squash mosaic virus
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...
USDA-ARS's Scientific Manuscript database
The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...
Glasner, Heidelinde; Riml, Christian; Micura, Ronald; Breuker, Kathrin
2017-07-27
Nucleobase methylations are ubiquitous posttranscriptional modifications of ribonucleic acids (RNA) that can substantially increase the structural diversity of RNA in a highly dynamic fashion with implications for gene expression and human disease. However, high throughput, deep sequencing does not generally provide information on posttranscriptional modifications (PTMs). A promising alternative approach for the characterization of PTMs, i.e. their identification, localization, and relative quantitation, is top-down mass spectrometry (MS). In this study, we have investigated how specific nucleobase methylations affect RNA ionization in electrospray ionization (ESI), and backbone cleavage in collisionally activated dissociation (CAD) and electron detachment dissociation (EDD). For this purpose, we have developed two new approaches for the characterization of RNA methylations in mixtures of either isomers of RNA or nonisomeric RNA forms. Fragment ions from dissociation experiments were analyzed to identify the modification type, to localize the modification sites, and to reveal the site-specific, relative extent of modification for each site. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Croft, Nathan P.; de Verteuil, Danielle A.; Smith, Stewart A.; Wong, Yik Chun; Schittenhelm, Ralf B.; Tscharke, David C.; Purcell, Anthony W.
2015-01-01
The generation of antigen-specific reagents is a significant bottleneck in the study of complex pathogens that express many hundreds to thousands of different proteins or to emerging or new strains of viruses that display potential pandemic qualities and therefore require rapid investigation. In these instances the development of antibodies for example can be prohibitively expensive to cover the full pathogen proteome, or the lead time may be unacceptably long in urgent cases where new highly pathogenic viral strains may emerge. Because genomic information on such pathogens can be rapidly acquired this opens up avenues using mass spectrometric approaches to study pathogen antigen expression, host responses and for screening the utility of therapeutics. In particular, data-independent acquisition (DIA) modalities on high-resolution mass spectrometers generate spectral information on all components of a complex sample providing depth of coverage hitherto only seen in genomic deep sequencing. The spectral information generated by DIA can be iteratively interrogated for potentially any protein of interest providing both evidence of protein expression and quantitation. Here we apply a solely DIA mass spectrometry based methodology to profile the viral antigen expression in cells infected with vaccinia virus up to 9 h post infection without the need for antigen specific antibodies or other reagents. We demonstrate deep coverage of the vaccinia virus proteome using a SWATH-MS acquisition approach, extracting quantitative kinetics of 100 virus proteins within a single experiment. The results highlight the complexity of vaccinia protein expression, complementing what is known at the transcriptomic level, and provide a valuable resource and technique for future studies of viral infection and replication kinetics. Furthermore, they highlight the utility of DIA and mass spectrometry in the dissection of host-pathogen interactions. PMID:25755296
2013-01-01
Background Although Candida albicans and Candida dubliniensis are most closely related, the two species behave significantly differently with respect to morphogenesis and virulence. In order to gain further insight into the divergent routes for morphogenetic adaptation in both species, we investigated qualitative along with quantitative differences in the transcriptomes of both organisms by cDNA deep sequencing. Results Following genome-associated assembly of sequence reads we were able to generate experimentally verified databases containing 6016 and 5972 genes for C. albicans and C. dubliniensis, respectively. About 95% of the transcriptionally active regions (TARs) contain open reading frames while the remaining TARs most likely represent non-coding RNAs. Comparison of our annotations with publicly available gene models for C. albicans and C. dubliniensis confirmed approximately 95% of already predicted genes, but also revealed previously unknown TARs in both species. Qualitative cross-species analysis of these databases revealed, in addition to 5802 orthologs, 399 and 49 species-specific protein coding genes for C. albicans and C. dubliniensis, respectively. Furthermore, quantitative transcriptional profiling using RNA-Seq revealed significant differences in the expression of orthologs across both species. We defined a core subset of 84 hyphal-specific genes required for both species, as well as a set of 42 genes that seem to be specifically induced during hyphal morphogenesis in C. albicans. Conclusions Species-specific adaptation in C. albicans and C. dubliniensis is governed by individual genetic repertoires but also by altered regulation of conserved orthologs on the transcriptional level. PMID:23547856
Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.
2010-01-01
Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. 
Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613
Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.
Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A
2017-11-01
To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We discovered previously unannotated small RNAs by deep sequencing a library prepared with a broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.
Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.
2016-01-01
Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640
Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network
NASA Astrophysics Data System (ADS)
Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke
2018-06-01
Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. To address this, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, the DRNN is constructed by stacking recurrent hidden layers to automatically extract features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.
Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi
2014-07-01
When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.
Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie
2018-04-19
The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.
GenomeGems: evaluation of genetic variability from deep sequencing data
2012-01-01
Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the great amount of sequences generated so that mutations that might be biologically relevant are easily identified is a difficult task. Yet only a limited number of automated, accessible tools exist for this task. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151
Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.
Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M
2012-06-01
Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women with a transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or with infertility underwent an MR 3.0T examination with the following protocol: T2 weighted FRFSE HR sequences, T2 weighted FRFSE HR CUBE 3D sequences, T1 w FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, presence of adhesions, fluid effusion in the Douglas pouch, associated uterine and kidney pathologies or anomalies, and sacral nerve roots were assessed by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI diagnosed deep endometriosis in 22/46 patients and endometriomas not associated with deep implants in 9/46 patients; 15/46 patients were negative for endometriosis. Eleven of the 22 patients with deep endometriosis also had an ovarian endometriotic cyst. We obtained high sensitivity (96.97%), specificity (100.00%), positive predictive value (PPV, 100.00%) and negative predictive value (NPV, 92.86%). Pelvic MRI performed with a 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with good pre-surgery mapping of the lesions involving both the bowel and bladder surfaces and the recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
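The four diagnostic percentages in this abstract follow from a standard 2x2 comparison against the gold standard. The sketch below shows the arithmetic only. The counts are hypothetical: they were picked because they reproduce the reported percentages, and the abstract does not state the actual confusion table or whether it was counted per patient or per lesion.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # true positives among all diseased
        "specificity": 100.0 * tn / (tn + fp),  # true negatives among all healthy
        "ppv": 100.0 * tp / (tp + fp),          # true positives among all test-positives
        "npv": 100.0 * tn / (tn + fn),          # true negatives among all test-negatives
    }

# Hypothetical counts (assumption, not taken from the study).
metrics = diagnostic_metrics(tp=32, fp=0, fn=1, tn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

With a single false negative and no false positives, specificity and PPV are exactly 100% while sensitivity and NPV fall just below it, which is the pattern the abstract reports.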
Duan, Naibin; Bai, Yang; Sun, Honghe; Wang, Nan; Ma, Yumin; Li, Mingjun; Wang, Xin; Jiao, Chen; Legall, Noah; Mao, Linyong; Wan, Sibao; Wang, Kun; He, Tianming; Feng, Shouqian; Zhang, Zongying; Mao, Zhiquan; Shen, Xiang; Chen, Xiaoliu; Jiang, Yuanmao; Wu, Shujing; Yin, Chengmiao; Ge, Shunfeng; Yang, Long; Jiang, Shenghui; Xu, Haifeng; Liu, Jingxuan; Wang, Deyun; Qu, Changzhi; Wang, Yicheng; Zuo, Weifang; Xiang, Li; Liu, Chang; Zhang, Daoyuan; Gao, Yuan; Xu, Yimin; Xu, Kenong; Chao, Thomas; Fazio, Gennaro; Shu, Huairui; Zhong, Gan-Yuan; Cheng, Lailiang; Fei, Zhangjun; Chen, Xuesen
2017-08-15
Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road is proposed based on evidence from diverse genomic analyses. Cultivated apples likely originate from Malus sieversii in Kazakhstan, followed by intensive introgressions from M. sylvestris. M. sieversii in Xinjiang of China turns out to be an "ancient" isolated ecotype not directly contributing to apple domestication. We have identified selective sweeps underlying quantitative trait loci/genes of important fruit quality traits including fruit texture and flavor, and provide evidence supporting a model of apple fruit size evolution comprising two major events with one occurring prior to domestication and the other during domestication. This study outlines the genetic basis of apple domestication and evolution, and provides valuable information for facilitating marker-assisted breeding and apple improvement. Apple is one of the most important fruit crops. Here, the authors perform deep genome resequencing of 117 diverse accessions and reveal comprehensive models of apple origin, speciation, domestication, and fruit size evolution as well as candidate genes associated with important agronomic traits.
Engineered Luciferase Reporter from a Deep Sea Shrimp Utilizing a Novel Imidazopyrazinone Substrate
2012-01-01
Bioluminescence methodologies have been extraordinarily useful due to their high sensitivity, broad dynamic range, and operational simplicity. These capabilities have been realized largely through incremental adaptations of native enzymes and substrates, originating from luminous organisms of diverse evolutionary lineages. We engineered both an enzyme and substrate in combination to create a novel bioluminescence system capable of more efficient light emission with superior biochemical and physical characteristics. Using a small luciferase subunit (19 kDa) from the deep sea shrimp Oplophorus gracilirostris, we have improved luminescence expression in mammalian cells ∼2.5 million-fold by merging optimization of protein structure with development of a novel imidazopyrazinone substrate (furimazine). The new luciferase, NanoLuc, produces glow-type luminescence (signal half-life >2 h) with a specific activity ∼150-fold greater than that of either firefly (Photinus pyralis) or Renilla luciferases similarly configured for glow-type assays. In mammalian cells, NanoLuc shows no evidence of post-translational modifications or subcellular partitioning. The enzyme exhibits high physical stability, retaining activity with incubation up to 55 °C or in culture medium for >15 h at 37 °C. As a genetic reporter, NanoLuc may be configured for high sensitivity or for response dynamics by appending a degradation sequence to reduce intracellular accumulation. Appending a signal sequence allows NanoLuc to be exported to the culture medium, where reporter expression can be measured without cell lysis. Fusion onto other proteins allows luminescent assays of their metabolism or localization within cells. Reporter quantitation is achievable even at very low expression levels to facilitate more reliable coupling with endogenous cellular processes. PMID:22894855
Identifying Preserved Storm Events on Beaches from Trenches and Cores
NASA Astrophysics Data System (ADS)
Wadman, H. M.; Gallagher, E. L.; McNinch, J.; Reniers, A.; Koktas, M.
2014-12-01
Recent research suggests that even small scale variations in grain size in the shallow stratigraphy of sandy beaches can significantly influence large-scale morphology change. However, few quantitative studies of variations in shallow stratigraphic layers, as differentiated by variations in mean grain size, have been conducted, in no small part due to the difficulty of collecting undisturbed sediment cores in the energetic lower beach and swash zone. Due to this lack of quantitative stratigraphic grain size data, most coastal morphology models assume that uniform grain sizes dominate sandy beaches, allowing for little to no temporal or spatial variations in grain size heterogeneity. In a first-order attempt to quantify small-scale, temporal and spatial variations in beach stratigraphy, thirty-five vibracores were collected at the USACE Field Research Facility (FRF), Duck, NC, in March-April of 2014 using the FRF's Coastal Research and Amphibious Buggy (CRAB). Vibracores were collected at set locations along a cross-shore profile from the toe of the dune to a water depth of ~1m in the surf zone. Vibracores were repeatedly collected from the same locations throughout a tidal cycle, as well as before and after a nor'easter event. In addition, two ~1.5m deep trenches were dug in the cross-shore and along-shore directions (each ~14m in length) after coring was completed to allow better interpretation of the stratigraphic sequences observed in the vibracores. The elevations of coherent stratigraphic layers, as revealed in vibracore-based fence diagrams and trench data, are used to relate specific observed stratigraphic sequences to individual storm events observed at the FRF. These data provide a first-order, quantitative examination of the small-scale temporal and spatial variability of shallow grain size along an open, sandy coastline. The data will be used to refine morphological model predictions to include variations in grain size and associated shallow stratigraphy.
ComplexContact: a web server for inter-protein contact prediction using deep learning.
Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo
2018-05-22
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
Cavalier-Smith, Thomas
2015-04-01
Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). In contrast, three phylogenetic analyses of 18S rRNA alone did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.
Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.
Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry
2018-03-01
We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.
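Convolutional networks that score nucleotide sequences usually take a one-hot encoded matrix as input, with one row per position and one column per base. The sketch below illustrates that general encoding step only. It is not the authors' code, and the column order A, C, G, T and the example sequence are assumptions made for illustration.

```python
BASES = "ACGT"  # assumed column order; the abstract does not specify one

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows

# Hypothetical short sequence, used only to show the shape of the output.
encoded = one_hot("TTTAGC")
for base, row in zip("TTTAGC", encoded):
    print(base, row)
```

Each row contains exactly one 1, so a convolution filter sliding along the rows sees position-specific base identities rather than arbitrary integer codes.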
DSAP: deep-sequencing small RNA analysis pipeline.
Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus
2010-07-01
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
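Steps (i) and (ii) of the workflow described above can be sketched on the tab-delimited tag/count input as follows. This is a simplified illustration, not DSAP's implementation: the adaptor string is hypothetical, and tags made of a single repeated nucleotide are dropped entirely, whereas the service's actual trimming rules are not given in the abstract.

```python
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor prefix, for illustration only

def clean_tag(tag, adaptor=ADAPTOR):
    """Cut the tag at the first adaptor occurrence; reject homopolymer tags."""
    idx = tag.find(adaptor)
    if idx != -1:
        tag = tag[:idx]
    if not tag or len(set(tag)) == 1:  # empty, or poly-A/T/C/G/N
        return None
    return tag

def cluster_tags(lines):
    """Sum the copy numbers of tags that are identical after cleanup."""
    clusters = Counter()
    for line in lines:
        tag, count = line.rstrip("\n").split("\t")
        cleaned = clean_tag(tag)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters

# Made-up input records in "sequence<TAB>copies" form.
example = [
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t120",
    "TGAGGTAGTAGGTTGTATAGTT\t30",
    "AAAAAAAAAAAAAAAAAAAA\t5",
]
clusters = cluster_tags(example)
print(clusters)
```

The first two records collapse into one cluster once the adaptor is removed, and the poly-A record is discarded, so downstream matching against Rfam or miRBase would run once per unique cleaned sequence.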
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Unified Deep Learning Architecture for Modeling Biology Sequence.
Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang
2017-10-09
Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
Quaternary paleoceanography of the deep Arctic Ocean based on quantitative analysis of Ostracoda
Cronin, T. M.; Holtz, T.R.; Whatley, R.C.
1994-01-01
Ostracodes were studied from deep Arctic Ocean cores obtained during the Arctic 91 expedition of the Polarstern to the Nansen, Amundsen and Makarov Basins, the Lomonosov Ridge, Morris Jesup Rise and Yermak Plateau, in order to investigate their distribution in Arctic Ocean deep water (AODW) and apply these data to paleoceanographic reconstruction of bottom water masses during the Quaternary. Analyses of coretop assemblages from Arctic 91 boxcores indicate the following: ostracodes are common at all depths between 1000 and 4500 m, and species distribution is strongly influenced by water mass characteristics and bathymetry; quantitative analyses comparing Eurasian and Canada Basin assemblages indicate that distinct assemblages inhabit regions east and west of the Lomonosov Ridge, a barrier especially important to species living in lower AODW; deep Eurasian Basin assemblages are more similar to those living in Greenland Sea deep water (GSDW) than those in Canada Basin deep water; two upper AODW assemblages were recognized throughout the Arctic Ocean, one living between 1000 and 1500 m, and the other, having high species diversity, at 1500-3000 m. Downcore quantitative analyses of species' abundances and the squared chord distance coefficient of similarity reveals a distinct series of abundance peaks in key indicator taxa interpreted to signify the following late Quaternary deep water history of the Eurasian Basin. During the Last Glacial Maximum (LGM), a GSDW/AODW assemblage, characteristic of cold, well oxygenated deep water > 3000 m today, inhabited the Lomonosov Ridge to depths as shallow as 1000 m, perhaps indicating the influence of GSDW at mid-depths in the central Arctic Ocean. During Termination 1, a period of high organic productivity associated with a strong inflowing warm North Atlantic layer occurred. During the mid-Holocene, several key faunal events indicate a period of warming and/or enhanced flow between the Canada and Eurasian Basins. 
A long-term record of ostracode assemblages from kastenlot core PS2200-5 (1073 m water depth) from the Morris Jesup Rise indicates a quasi-cyclic pattern of water mass changes during the last 300 kyr. Interglacial ostracode assemblages corresponding to oxygen isotope stages 1, 5, and 7 indicate rapid changes in dissolved oxygen and productivity during glacial-interglacial transitions. © 1994.
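The squared chord distance used in the downcore comparison is commonly defined on relative abundances as the sum over taxa of (sqrt(p_i) - sqrt(q_i))^2, which ranges from 0 for identical assemblages to 2 for assemblages with no taxa in common. A minimal sketch of that standard definition follows. The taxon labels and counts are hypothetical, and the abstract does not say which variant or scaling the authors used.

```python
import math

def squared_chord_distance(counts_a, counts_b):
    """Squared chord distance between two assemblages given as {taxon: count}."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    taxa = set(counts_a) | set(counts_b)
    return sum(
        (math.sqrt(counts_a.get(t, 0) / total_a)
         - math.sqrt(counts_b.get(t, 0) / total_b)) ** 2
        for t in taxa
    )

# Hypothetical specimen counts for a core-top and a downcore sample.
coretop = {"taxon_A": 40, "taxon_B": 35, "taxon_C": 25}
downcore = {"taxon_A": 10, "taxon_B": 60, "taxon_D": 30}
print(squared_chord_distance(coretop, downcore))
```

Taking square roots before differencing reduces the weight of dominant taxa, which is one reason the measure is popular for comparing fossil and modern assemblages.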
Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method
NASA Astrophysics Data System (ADS)
Yuan, Zhe; Zhang, Yiming; Zheng, Qijia
2018-02-01
An electromagnetic method with a transmitted waveform coded by an m-sequence achieved better anti-noise performance compared to the conventional manner with a square-wave. The anti-noise performance of the m-sequence varied with multiple coding parameters; hence, a quantitative analysis of the anti-noise performance for m-sequences with different coding parameters was required to optimize them. This paper proposes the concept of an identification system, with the identified Earth impulse response obtained by measuring the system output with the input of the voltage response. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized by numerical simulation, and their optimization is further discussed in our conclusions; the validity of the conclusions is further verified by field experiment. The quantitative analysis method proposed in this paper provides a new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
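An m-sequence (maximal-length sequence) can be generated with a linear feedback shift register, and its nearly flat periodic autocorrelation is the property that makes it attractive for identifying an impulse response. The sketch below shows the generic construction only. The register length of 4 and the recurrence a[k+4] = a[k] XOR a[k+1] (polynomial x^4 + x + 1) are assumptions for illustration, not the coding parameters studied in the paper.

```python
def m_sequence(n_bits=4, taps=(0, 1), seed=None):
    """One period of the binary recurrence a[k+n_bits] = XOR of a[k+t], t in taps.

    The period is maximal (2**n_bits - 1) only if the taps correspond to a
    primitive polynomial; the defaults implement x^4 + x + 1.
    """
    state = list(seed) if seed is not None else [1] + [0] * (n_bits - 1)
    if len(state) != n_bits or not any(state):
        raise ValueError("seed must have n_bits entries and not be all zero")
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

def circular_autocorrelation(bits, lag):
    """Periodic autocorrelation after mapping bits 0/1 to levels +1/-1."""
    levels = [1 - 2 * b for b in bits]
    n = len(levels)
    return sum(levels[i] * levels[(i + lag) % n] for i in range(n))

seq = m_sequence()
print(seq)
print([circular_autocorrelation(seq, lag) for lag in range(len(seq))])
```

The autocorrelation equals the sequence length at zero lag and -1 at every other lag, so cross-correlating a measured response with the transmitted code concentrates the signal while spreading uncorrelated noise.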
Genome-wide discovery of novel and conserved microRNAs in white shrimp (Litopenaeus vannamei).
Xi, Qian-Yun; Xiong, Yuan-Yan; Wang, Yuan-Mei; Cheng, Xiao; Qi, Qi-En; Shu, Gang; Wang, Song-Bo; Wang, Li-Na; Gao, Ping; Zhu, Xiao-Tong; Jiang, Qing-Yan; Zhang, Yong-Liang; Liu, Li
2015-01-01
In recent years, a large number of conserved and species-specific microRNAs (miRNAs) have been identified in species that are economically important but lack a full genome sequence. In this study, Solexa deep sequencing and a cross-species miRNA microarray were used to detect miRNAs in white shrimp. We identified 239 conserved miRNAs, 14 miRNA* sequences and 20 novel miRNAs by bioinformatics analysis from 7,561,406 high-quality reads representing 325,370 distinct sequences. All 20 novel miRNAs were specific to white shrimp and had no homologs in other species. Using the conserved miRNAs from the miRBase database as a query set to search for homologs among shrimp expressed sequence tags (ESTs), 32 conserved, computationally predicted miRNAs were discovered in shrimp. In addition, microarray analysis of shrimp fed with Panax ginseng polysaccharide complex identified 151 conserved miRNAs, of which 18 were significantly up-regulated and 49 were significantly down-regulated. qRT-PCR analysis was also performed for nine miRNAs in three shrimp tissues: muscle, gill and hepatopancreas. The results showed that the expression of these miRNAs is tissue specific. Combining the results of the three methods, we detected 20 novel and 394 conserved miRNAs. Verification with quantitative reverse transcription PCR (qRT-PCR) and Northern blot showed that the data are highly reliable. The study provides the first comprehensive miRNA profile of white shrimp, which includes useful information for future investigations into the function of miRNAs in the regulation of shrimp development and immunology.
DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.
Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan
2016-12-23
With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, issues such as high data sparsity, small sample sizes, and the application of simple linear classifiers are major obstacles to improving the classification performance. To address these obstacles, we propose DeepGene, an advanced deep neural network (DNN) based classifier that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
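The indexed sparsity reduction (ISR) step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the 1-based indexing, the zero padding, and the fixed output length `n_keep` are all assumptions made so that every sample yields a vector of equal size.

```python
import numpy as np

def indexed_sparsity_reduction(x, n_keep):
    """Replace a sparse vector by the indexes of its non-zero entries.

    Assumptions (not from the abstract): indexes are 1-based so that 0 can
    serve as padding, and the output is padded or truncated to n_keep.
    """
    idx = np.flatnonzero(x) + 1           # positions of mutated genes, 1-based
    out = np.zeros(n_keep, dtype=int)     # 0 means "no further mutation"
    k = min(n_keep, idx.size)
    out[:k] = idx[:k]
    return out

# Hypothetical binary mutation vector for one sample (10 genes, 3 mutated).
sample = np.array([0, 0, 1, 0, 0, 0, 1, 0, 1, 0])
reduced = indexed_sparsity_reduction(sample, n_keep=5)
print(reduced)  # [3 7 9 0 0]
```

The dense index vector is much shorter than the original gene vector, which is the sense in which the impact of sparsity is reduced before the data reach the classifier.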
The genetic landscape of a physical interaction
Diss, Guillaume
2018-01-01
A key question in human genetics and evolutionary biology is how mutations in different genes combine to alter phenotypes. Efforts to systematically map genetic interactions have mostly made use of gene deletions. However, most genetic variation consists of point mutations of diverse and difficult to predict effects. Here, by developing a new sequencing-based protein interaction assay – deepPCA – we quantified the effects of >120,000 pairs of point mutations on the formation of the AP-1 transcription factor complex between the products of the FOS and JUN proto-oncogenes. Genetic interactions are abundant both in cis (within one protein) and trans (between the two molecules) and consist of two classes – interactions driven by thermodynamics that can be predicted using a three-parameter global model, and structural interactions between proximally located residues. These results reveal how physical interactions generate quantitatively predictable genetic interactions. PMID:29638215
Teaching Real-World Applications of Business Statistics Using Communication to Scaffold Learning
ERIC Educational Resources Information Center
Green, Gareth P.; Jones, Stacey; Bean, John C.
2015-01-01
Our assessment research suggests that quantitative business courses that rely primarily on algorithmic problem solving may not produce the deep learning required for addressing real-world business problems. This article illustrates a strategy, supported by recent learning theory, for promoting deep learning by moving students gradually from…
Viral activities and life cycles in deep subseafloor sediments.
Engelhardt, Tim; Orsi, William D; Jørgensen, Bo Barker
2015-12-01
Viruses are highly abundant in marine subsurface sediments and can even exceed the number of prokaryotes. However, their activity and quantitative impact on microbial populations are still poorly understood. Here, we use gene expression data from published continental margin subseafloor metatranscriptomes to qualitatively assess viral diversity and activity in sediments up to 159 metres below seafloor (mbsf). Mining of the metatranscriptomic data revealed 4651 representative viral homologues (RVHs), representing 2.2% of all metatranscriptome sequence reads, which have close translated homology (average 77%, range 60-97% amino acid identity) to viral proteins. Archaea-infecting RVHs are exclusively detected in the upper 30 mbsf, whereas RVHs for filamentous inoviruses predominate in the deepest sediment layers. RVHs indicative of lysogenic phage-host interactions and lytic activity, notably cell lysis, are detected at all analysed depths and suggest a dynamic virus-host association in the marine deep biosphere studied here. Ongoing lytic viral activity is further indicated by the expression of clustered, regularly interspaced, short palindromic repeat-associated cascade genes involved in cellular defence against viral attacks. The data indicate the activity of viruses in subsurface sediment of the Peruvian margin and suggest that viruses indeed cause cell mortality and may play an important role in the turnover of subseafloor microbial biomass. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.
Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina
2014-01-01
Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348
Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard
2017-11-01
The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Pasquale, V.; Chiozzi, P.; Verdoya, M.
2013-05-01
Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m⁻¹), while it is larger (53 mK m⁻¹) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m⁻¹). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 × 10⁻¹⁵ m². The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.
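A Rayleigh number analysis of this kind can be sketched as below. The abstract does not give the formula or the parameter values, so this example assumes the classical porous-medium Rayleigh number for a horizontal layer heated from below, with the textbook critical value of 4π² for impermeable, isothermal boundaries. Every numeric default (fluid density, expansivity, viscosity, conductivity, heat capacity) and the aquifer thickness used in the example are placeholders, not values from the study.

```python
import math

# Classical onset threshold for convection in a horizontal porous layer
# (assumed boundary conditions: impermeable and isothermal).
RA_CRITICAL = 4 * math.pi ** 2

def rayleigh_number(k, thickness, delta_t, rho=1000.0, g=9.81, beta=5e-4,
                    mu=3e-4, lam=2.5, c_f=4200.0):
    """Porous-medium Rayleigh number.

    k         permeability (m^2)
    thickness aquifer thickness (m)
    delta_t   temperature difference across the aquifer (K)
    rho, beta, mu, c_f  fluid density, expansivity, viscosity, heat capacity
    lam       bulk thermal conductivity (W m^-1 K^-1)
    All default values are illustrative placeholders.
    """
    kappa = lam / (rho * c_f)  # effective thermal diffusivity (m^2/s)
    return rho * g * beta * k * thickness * delta_t / (mu * kappa)

def minimum_permeability(thickness, delta_t, **params):
    """Permeability at which the Rayleigh number reaches the critical value."""
    ra_per_unit_k = rayleigh_number(1.0, thickness, delta_t, **params)
    return RA_CRITICAL / ra_per_unit_k  # Ra is linear in k

# Hypothetical aquifer: 2 km thick with a 14 mK/m gradient, i.e. 28 K across it.
k_min = minimum_permeability(thickness=2000.0, delta_t=28.0)
print(f"minimum permeability ~ {k_min:.1e} m^2")
```

Because the Rayleigh number is linear in permeability, the threshold permeability follows directly from one evaluation; the result depends strongly on the assumed thickness and fluid properties.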
Danish, Shabbar F; Baltuch, Gordon H; Jaggi, Jurg L; Wong, Stephen
2008-04-01
Microelectrode recording during deep brain stimulation surgery is a useful adjunct for subthalamic nucleus (STN) localization. We hypothesize that information in the nonspike background activity can help identify STN boundaries. We present results from a novel quantitative analysis that accomplishes this goal. Thirteen consecutive microelectrode recordings were retrospectively analyzed. Spikes were removed from the recordings with an automated algorithm. The remaining "despiked" signals were converted via root mean square amplitude and curve length calculations into "feature profile" time series. Subthalamic nucleus boundaries determined by inspection, based on sustained deviations from baseline for each feature profile, were compared against those determined intraoperatively by the clinical neurophysiologist. Feature profile activity within STN exhibited a sustained rise in 10 of 13 tracks (77%). The sensitivity of STN entry was 60% and 90% for curve length and root mean square amplitude, respectively, when agreement within 0.5 mm of the neurophysiologist's prediction was used. Sensitivities were 70% and 100% for 1 mm accuracy. Exit point sensitivities were 80% and 90% for both features within 0.5 mm and 1.0 mm, respectively. Reproducible activity patterns in deep brain stimulation microelectrode recordings can allow accurate identification of STN boundaries. Quantitative analyses of this type may provide useful adjunctive information for electrode placement in deep brain stimulation surgery.
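The two "feature profile" measures named above, root mean square amplitude and curve length, can be computed along a recording as in the following sketch. The window length, the use of non-overlapping windows, and the function name are assumptions; the abstract does not state how the despiked signal was segmented.

```python
import numpy as np

def feature_profiles(signal, window):
    """Return (rms, curve_length) over consecutive non-overlapping windows.

    rms          square root of the mean squared amplitude in each window
    curve_length sum of absolute sample-to-sample differences in each window
    """
    n = (len(signal) // window) * window           # drop the incomplete tail
    segs = np.asarray(signal[:n], dtype=float).reshape(-1, window)
    rms = np.sqrt(np.mean(segs ** 2, axis=1))
    curve_length = np.sum(np.abs(np.diff(segs, axis=1)), axis=1)
    return rms, curve_length

# Toy despiked signal: the second window has twice the amplitude of the first.
toy = [0, 1, 0, -1, 0, 2, 0, -2]
rms, cl = feature_profiles(toy, window=4)
print(rms)  # approximately [0.707 1.414]
print(cl)   # [3. 6.]
```

A sustained rise of either profile above its baseline is the kind of deviation the study used to mark entry into the STN.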
Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K
2016-01-01
The study of the transcriptome, the entire pool of transcripts in an organism or in single cells at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulty in comparing results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. The adoption of NGS has profoundly enhanced the efficiency and outcomes of efforts to elucidate the genes responsible for producing active compounds in medicinal plants. The whole process runs from plant material sampling, to cDNA library preparation, to deep sequencing, after which bioinformatics takes over to assemble enormous, yet fragmentary, data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices that selecting a suitable methodology for a specific purpose can be confusing. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.
Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter
2015-01-01
Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi
2015-01-01
Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. To ensure correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores for STN visualization were significantly higher on SWAN images than on the other images (p < 0.01). The kappa value on SWAN images (0.74) was the highest among the three sequences for visualizing the STN. SWAN is the sequence best suited for identifying the STN at the present time.
Deep sequencing methods for protein engineering and design.
Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A
2017-08-01
The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with a discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.
Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K
2011-01-20
Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
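The polymorphism information content (PIC) values reported for the genic-SSR markers can be illustrated with a short calculation. The abstract does not say which PIC formula was used; this sketch assumes the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are allele frequencies. The allele counts in the example are invented.

```python
from itertools import combinations

def pic(allele_counts):
    """Polymorphism information content from allele counts or frequencies.

    Assumed formula: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]        # normalise to frequencies
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - cross_term

# Hypothetical marker with four equally frequent alleles.
print(pic([1, 1, 1, 1]))  # 0.703125
# Hypothetical marker with two equally frequent alleles.
print(pic([5, 5]))        # 0.375
```

Under this definition, a marker with four evenly distributed alleles already reaches about 0.70, which is in line with the upper end of the 0.46 to 0.72 range quoted above for loci with 4 to 10 alleles.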
Tao, Yi-Fan; Qiang, Jun; Yin, Guo-Jun; Xu, Pao; Shi, Qiong; Bao, Jing-Wen
2017-10-01
MicroRNAs (miRNAs) play vital roles in modulating diverse metabolic processes in the liver, including lipid metabolism. Genetically improved farmed tilapia (GIFT, Oreochromis niloticus), an important aquaculture species in China, is susceptible to hepatic steatosis when reared in intensive culture systems. To investigate the miRNAs involved in GIFT lipid metabolism, two hepatic small RNA libraries from high-fat diet-fed and normal-fat diet-fed GIFT were constructed and sequenced using high-throughput sequencing technology. A total of 204 known and 56 novel miRNAs were identified by aligning the sequencing data with known Danio rerio miRNAs listed in miRBase 21.0. Six known miRNAs (miR-30a-5p, miR-34a, miR-145-5p, miR-29a, miR-205-5p, and miR-23a-3p) that were differentially expressed between the high-fat diet and normal-fat diet groups were validated by quantitative real-time PCR. Bioinformatics tools were used to predict the potential target genes of these differentially expressed miRNAs, and Gene Ontology enrichment analysis indicated that these miRNAs may play important roles in diet-induced hepatic steatosis in GIFT. Our results provide a foundation for further studies of the role of miRNAs in tilapia lipid homeostasis regulation, and may help to identify novel targets for therapeutic interventions to reduce the occurrence of fatty liver disease in farmed tilapia. Copyright © 2017. Published by Elsevier Ltd.
Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik
2009-01-01
Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787
USDA-ARS's Scientific Manuscript database
Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing
2013-01-01
Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Geith, Tobias; Schmidt, Gerwin; Biffar, Andreas; Dietrich, Olaf; Dürr, Hans Roland; Reiser, Maximilian; Baur-Melnyk, Andrea
2012-11-01
The objective of our study was to compare the diagnostic value of qualitative diffusion-weighted imaging (DWI), quantitative DWI, and chemical-shift imaging in a single prospective cohort of patients with acute osteoporotic and malignant vertebral fractures. The study group was composed of patients with 26 osteoporotic vertebral fractures (18 women, eight men; mean age, 69 years; age range, 31 years 6 months to 86 years 2 months) and 20 malignant vertebral fractures (nine women, 11 men; mean age, 63.4 years; age range, 24 years 8 months to 86 years 4 months). T1-weighted, STIR, and T2-weighted sequences were acquired at 1.5 T. A DW reverse fast imaging with steady-state free precession (PSIF) sequence at different delta values was evaluated qualitatively. A DW echo-planar imaging (EPI) sequence and a DW single-shot turbo spin-echo (TSE) sequence at different b values were evaluated qualitatively and quantitatively using the apparent diffusion coefficient. Opposed-phase sequences were used to assess signal intensity qualitatively. The signal loss between in- and opposed-phase images was determined quantitatively. Two-tailed Fisher exact test, Mann-Whitney test, and receiver operating characteristic analysis were performed. Sensitivities, specificities, and accuracies were determined. Qualitative DW-PSIF imaging (delta = 3 ms) showed the best performance for distinguishing between benign and malignant fractures (sensitivity, 100%; specificity, 88.5%; accuracy, 93.5%). Qualitative DW-EPI (b = 50 s/mm² [p = 1.00]; b = 250 s/mm² [p = 0.50]) and DW single-shot TSE imaging (b = 100 s/mm² [p = 1.00]; b = 250 s/mm² [p = 0.18]; b = 400 s/mm² [p = 0.18]; b = 600 s/mm² [p = 0.39]) did not indicate significant differences between benign and malignant fractures. DW-EPI using a b value of 500 s/mm² (p = 0.01) indicated significant differences between benign and malignant vertebral fractures.
Quantitative DW-EPI (p = 0.09) and qualitative opposed-phase imaging (p = 0.06) did not exhibit significant differences, whereas quantitative DW single-shot TSE imaging (p = 0.002) and quantitative chemical-shift imaging (p = 0.01) showed significant differences between benign and malignant fractures. The DW-PSIF sequence (delta = 3 ms) had the highest accuracy in differentiating benign from malignant vertebral fractures. Quantitative chemical-shift imaging and quantitative DW single-shot TSE imaging had a lower accuracy than DW-PSIF imaging because of a large overlap. Qualitative assessment of opposed-phase, DW-EPI, and DW single-shot TSE sequences and quantitative assessment of the DW-EPI sequence were not suitable for distinguishing between benign and malignant vertebral fractures.
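The sensitivity, specificity, and accuracy reported for DW-PSIF imaging can be checked with a small worked example. The confusion-matrix counts below are not given in the abstract; they are an assumption, back-computed from the reported percentages and the cohort sizes (20 malignant and 26 osteoporotic fractures), with malignant taken as the positive class.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # malignant correctly identified
    specificity = tn / (tn + fp)                 # benign correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return sensitivity, specificity, accuracy

# Assumed counts: all 20 malignant fractures detected, 23 of 26 osteoporotic
# fractures correctly classified as benign.
sens, spec, acc = diagnostic_metrics(tp=20, fn=0, tn=23, fp=3)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # 100.0% 88.5% 93.5%
```

These counts reproduce the reported 100%, 88.5%, and 93.5%, which suggests that three osteoporotic fractures were misread as malignant, although the abstract does not state this explicitly.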
Optical Communications Channel Combiner
NASA Technical Reports Server (NTRS)
Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy
2012-01-01
NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the log-likelihood ratio (LLR) sequences of up to three receivers and then combine them into a single LLR sequence for information decoding. The channel combiner has three channel inputs, each of which takes a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlations between the channels' LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword, also delivered via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.
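The synchronize-then-combine idea can be sketched in software as follows. This is only an illustration of cross-correlation alignment, not the hardware design: it assumes floating-point LLRs rather than four-bit values, noise-free channels, and combination by simple summation (appropriate when the receivers' observations are independent). The function names and test data are hypothetical.

```python
import numpy as np

def estimate_lag(ref, other):
    """Number of slots by which `other` trails `ref`, from the correlation peak."""
    xcorr = np.correlate(other, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def combine_llrs(channels):
    """Align every channel to the first one and sum the overlapping LLRs."""
    ref = channels[0]
    lags = [estimate_lag(ref, ch) for ch in channels]
    # Range of reference indexes that exists in every channel after shifting.
    start = max(-lag for lag in lags)
    stop = min(len(ch) - lag for ch, lag in zip(channels, lags))
    aligned = [ch[start + lag: stop + lag] for ch, lag in zip(channels, lags)]
    return np.sum(aligned, axis=0), lags

# Three receivers see the same underlying LLR sequence with different offsets.
rng = np.random.default_rng(0)
truth = rng.normal(size=120)
chans = [truth[5:105], truth[0:100], truth[8:108]]
combined, lags = combine_llrs(chans)
print(lags)  # expected [0, 5, -3] for these offsets
```

After alignment, each slot's combined LLR is the sum of the per-receiver LLRs, which is how the separate apertures act as one larger effective aperture before decoding.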
The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region, California
NASA Astrophysics Data System (ADS)
Zhang, Q.; Wdowinski, S.; Lin, G.
2011-12-01
The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depths of 10-17 km. Over the past decade, seismic activity in the Anza region has increased and included three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in the Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) and used an accurate 1-D velocity model derived from the 3-D model of Lin et al. (2007). In order to separate the aftershock sequences from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depths of 10-17 km and extended over a 70 km long segment of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical patterns, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. References: 1. Lin, G. and Shearer, P.M., 2006, The COMPLOC earthquake location package, Seism. Res. Lett. 77, pp. 440-444. 2. Lin, G., Shearer, P.M., Hauksson, E., and Thurber, C.H., 2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method, J. Geophys. Res. 112, B12306, doi:10.1029/2007JB004977. 3. Wdowinski, S., 2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci., doi:10.1038/NGEO684.
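As a rough illustration of how Omori's law can separate an aftershock sequence from background seismicity, the sketch below uses the Omori-Utsu form n(t) = K / (t + c)^p, which reduces to the original Omori law for p = 1. The abstract does not state which form or parameter values were used, so K, c, p, and the helper names here are purely illustrative assumptions.

```python
import math


def omori_rate(t, K, c, p=1.0):
    """Aftershock rate (events per day) at time t (days) after the mainshock."""
    return K / (t + c) ** p


def expected_count(t1, t2, K, c, p=1.0):
    """Expected number of aftershocks between t1 and t2 (integral of the rate)."""
    if abs(p - 1.0) < 1e-12:
        return K * math.log((t2 + c) / (t1 + c))
    return K * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)


def time_to_background(K, c, p, background_rate):
    """Time at which the aftershock rate has decayed to the background rate."""
    return (K / background_rate) ** (1.0 / p) - c
```

With the illustrative values K = 100, c = 0.5 days, p = 1 and a background of four events per day, the modeled aftershock rate falls to the background level 24.5 days after the mainshock; events after that time could no longer be attributed to the sequence on rate alone.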
Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.
Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges
2015-11-01
Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. The diversity of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitive ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J
2015-09-15
Since the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
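The 'slice, decode independently, merge' strategy described above might be sketched as follows. This is not the authors' published script: the `decode` callback, the record layout, and the majority-vote rule for resolving conflicts between slices are all assumptions made here for illustration, and `toy_decode` is only a stand-in for a real decoder.

```python
from collections import Counter, defaultdict


def slice_reads(reads, slice_size):
    """Split the full dataset into consecutive slices of at most slice_size records."""
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def decode_in_slices(reads, slice_size, decode):
    """Decode each slice on its own, then merge by majority vote per key.

    `decode` is a hypothetical callback returning {key: clone_label} for one slice.
    """
    votes = defaultdict(Counter)
    for chunk in slice_reads(reads, slice_size):
        for key, clone in decode(chunk).items():
            votes[key][clone] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def toy_decode(chunk):
    """Stand-in decoder: each record is (key, observed_label); keep the commonest label."""
    per_key = defaultdict(Counter)
    for key, label in chunk:
        per_key[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in per_key.items()}
```

The slice size is left as a parameter because the abstract only says that slices should be of "optimal size" without giving a value.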
Stability of deep features across CT scanners and field of view using a physical phantom
NASA Astrophysics Data System (ADS)
Paul, Rahul; Shafiq-ul-Hassan, Muhammad; Moros, Eduardo G.; Gillies, Robert J.; Hall, Lawrence O.; Goldgof, Dmitry B.
2018-02-01
Radiomics is the process of analyzing radiological images by extracting quantitative features for monitoring and diagnosis of various cancers. Analyzing images acquired from different medical centers is confounded by many choices in acquisition, reconstruction parameters and differences among device manufacturers. Consequently, scanning the same patient or phantom using various acquisition/reconstruction parameters as well as different scanners may result in different feature values. To further evaluate this issue, in this study, CT images from a physical radiomic phantom were used. Recent studies showed that some quantitative features were dependent on voxel size and that this dependency could be reduced or removed by the appropriate normalization factor. Deep features extracted from a convolutional neural network may also provide additional features for image analysis. Using a transfer learning approach, we obtained deep features from three convolutional neural networks pre-trained on color camera images. We examined the dependency of deep features on image pixel size. We found that some deep features were pixel size dependent, and to remove this dependency we proposed two effective normalization approaches. For analyzing the effects of normalization, a threshold has been used based on the calculated standard deviation and average distance from a best fit horizontal line among the features' underlying pixel size before and after normalization. The inter- and intra-scanner dependency of deep features has also been evaluated.
Covell, Christine L; Sidani, Souraya; Ritchie, Judith A
2012-06-01
The sequence used for collecting quantitative and qualitative data in concurrent mixed-methods research may influence participants' responses. Empirical evidence is needed to determine if the order of data collection in concurrent mixed methods research biases participants' responses to closed and open-ended questions. To examine the influence of the quantitative-qualitative sequence on responses to closed and open-ended questions when assessing the same variables or aspects of a phenomenon simultaneously within the same study phase. A descriptive cross-sectional, concurrent mixed-methods design was used to collect quantitative (survey) and qualitative (interview) data. The setting was a large multi-site health care centre in Canada. A convenience sample of 50 registered nurses was selected and participated in the study. Participants were randomly assigned to one of two sequences for data collection, quantitative-qualitative or qualitative-quantitative. Independent t-tests were performed to compare the two groups' responses to the survey items. Directed content analysis was used to compare the participants' responses to the interview questions. The sequence of data collection did not greatly affect the participants' responses to the closed-ended questions (survey items) or the open-ended questions (interview questions). The sequencing of data collection, when using both survey and semi-structured interviews, may not bias participants' responses to closed or open-ended questions. Additional research is required to confirm these findings. Copyright © 2011 Elsevier Ltd. All rights reserved.
Unique microbial community in drilling fluids from Chinese continental scientific drilling
Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.
2006-01-01
Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
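The "top L/5 long-range" precision quoted above can be made concrete with a short sketch. The function name and input layout are assumptions for illustration; the default separation threshold of 24 residues follows the usual CASP convention for long-range contacts, which the abstract itself does not spell out.

```python
def top_l5_long_range_precision(predictions, true_contacts, seq_len, min_sep=24):
    """Precision of the L/5 highest-scoring long-range predicted contacts.

    predictions:   iterable of (i, j, score) residue pairs
    true_contacts: set of (i, j) pairs with i < j
    seq_len:       sequence (domain) length L
    min_sep:       minimum |i - j| for a pair to count as long-range (assumed 24)
    """
    long_range = [(score, min(i, j), max(i, j))
                  for i, j, score in predictions if abs(i - j) >= min_sep]
    long_range = sorted(long_range, key=lambda item: item[0], reverse=True)
    top = long_range[:max(1, seq_len // 5)]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for _, i, j in top)
    return hits / len(top)
```

Averaging this value over all evaluated domains gives a figure comparable in kind to the percentages reported in the abstract.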
Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O
2010-10-01
Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.
HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.
Seelow, Dominik; Schuelke, Markus
2012-07-01
Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
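The core idea of detecting stretches of homozygosity shared by affected individuals can be sketched in a few lines. This is a deliberately simplified stand-in, not HomozygosityMapper's actual scoring algorithm: the genotype encoding ('AA', 'AB', 'BB'), the function name, and the `min_len` cutoff are assumptions made for illustration.

```python
def shared_homozygous_runs(genotypes, min_len=2):
    """Find runs of consecutive markers where all individuals share one homozygous call.

    genotypes: list with one entry per affected individual, each a list of calls
               such as 'AA', 'AB', 'BB' in marker order.
    Returns (first_marker_index, last_marker_index) tuples for each run.
    """
    n_markers = len(genotypes[0])

    def shared(k):
        calls = {individual[k] for individual in genotypes}
        if len(calls) != 1:
            return False  # individuals disagree at this marker
        call = next(iter(calls))
        return len(call) == 2 and call[0] == call[1]  # homozygous call

    runs, start = [], None
    for k in range(n_markers + 1):  # the extra step closes a run at the end
        if k < n_markers and shared(k):
            if start is None:
                start = k
        else:
            if start is not None and k - start >= min_len:
                runs.append((start, k - 1))
            start = None
    return runs
```

A real analysis would work on genomic positions rather than marker indices and would tolerate occasional genotyping errors inside a run.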
Maximum entropy methods for extracting the learned features of deep neural networks.
Finnegan, Alex; Song, Jun S
2017-10-01
New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.
Osca, David; Templado, José; Zardoya, Rafael
2014-09-01
The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double-stranded circular molecule is 15,664 bp in length and encodes the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to the genome organization of caenogastropods and differs only in the relative position of the trnW gene. The deduced amino acid sequences of the mt protein coding genes of the Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea in the Late Jurassic-Early Cretaceous, indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.
An Alu-based, MGB Eclipse real-time PCR method for quantitation of human DNA in forensic samples.
Nicklas, Janice A; Buel, Eric
2005-09-01
The forensic community needs quick, reliable methods to quantitate human DNA in crime scene samples to replace the laborious and imprecise slot blot method. A real-time PCR based method has the possibility of allowing development of a faster and more quantitative assay. Alu sequences are primate-specific and are found in many copies in the human genome, making these sequences an excellent target or marker for human DNA. This paper describes the development of a real-time Alu sequence-based assay using MGB Eclipse primers and probes. The advantages of this assay are simplicity, speed, less hands-on-time and automated quantitation, as well as a large dynamic range (128 ng/microL to 0.5 pg/microL).
High Resolution Qualitative and Quantitative MR Evaluation of the Glenoid Labrum
Iwasaki, Kenyu; Tafur, Monica; Chang, Eric Y.; Statum, Sheronda; Biswas, Reni; Tran, Betty; Bae, Won C.; Du, Jiang; Bydder, Graeme M.; Chung, Christine B.
2015-01-01
Objective: To implement qualitative and quantitative MR sequences for the evaluation of labral pathology. Methods: Six glenoid labra were dissected and the anterior and posterior portions were divided into normal, mildly degenerated, or severely degenerated groups using gross and MR findings. Qualitative evaluation was performed using T1-weighted, proton density-weighted (PD), spoiled gradient echo (SPGR) and ultra-short echo time (UTE) sequences. Quantitative evaluation included T2 and T1rho measurements as well as T1, T2*, and T1rho measurements acquired with UTE techniques. Results: SPGR and UTE sequences best demonstrated labral fiber structure. Degenerated labra had a tendency towards decreased T1 values, increased T2/T2* values and increased T1rho values. T2* values obtained with the UTE sequence allowed for delineation between normal, mildly degenerated and severely degenerated groups (p<0.001). Conclusion: Quantitative T2* measurements acquired with the UTE technique are useful for distinguishing between normal, mildly degenerated and severely degenerated labra. PMID:26359581
Jensen, Corey T; Chahin, Antoun; Amin, Veral D; Khalaf, Ahmed M; Elsayes, Khaled M; Wagner-Bartak, Nicolaus; Zhao, Bo; Zhou, Shouhao; Bedi, Deepak G
2017-09-01
To determine whether the qualitative sonographic appearance of slow deep venous flow in the lower extremities correlates with quantitative slow flow and an increased risk of deep venous thrombosis (DVT) in oncology patients. In this Institutional Review Board-approved retrospective study, we reviewed lower extremity venous Doppler sonographic examinations of 975 consecutive patients: 482 with slow flow and 493 with normal flow. The subjective slow venous flow and absence of initial DVT were confirmed by 2 radiologists. Peak velocities were recorded at 3 levels. Each patient was followed for DVT development. The associations between DVT and the presence of slow venous flow were examined by the Fisher exact test; a 2-sample t test was used for peak velocity and DVT group comparisons. The optimal cutoff peak velocity for correlation with the radiologists' perceived slow flow was determined by the Youden index. Deep venous thrombosis development in the slow-flow group (21 of 482 [4.36%]) was almost doubled compared with patients who had normal flow (11 of 493 [2.23%]; P = .0456). Measured peak venous velocities were lower in the slow-venous flow group (P < .001). Patients with subsequent DVT did not have a significant difference in venous velocities compared with their respective patient groups. The sum of 3 venous level velocities resulted in the best cutoff for dichotomizing groups into normal versus slow venous flow. Qualitative slow venous flow in the lower extremities on Doppler sonography accurately correlates with quantitatively slower flow, and this preliminary evaluation suggests an associated mildly increased rate of subsequent DVT development in oncology patients. © 2017 by the American Institute of Ultrasound in Medicine.
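For readers who want to see how a Fisher exact test operates on a 2x2 table such as the one reported above (DVT versus no DVT, slow versus normal flow), here is a minimal standard-library sketch. It uses one common two-sided convention (summing the probabilities of all tables no more likely than the observed one); the study does not say which convention or software was used, so this sketch is not expected to reproduce the reported P value exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins whose
    probability does not exceed that of the observed table (assumed convention).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

With the counts from the abstract the call would be `fisher_exact_two_sided(21, 461, 11, 482)`, where 461 and 482 are the patients without subsequent DVT in the slow-flow and normal-flow groups, respectively.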
A Statistical Guide to the Design of Deep Mutational Scanning Experiments
Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia
2016-01-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710
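One simple way to turn time-sampled read counts from a bulk competition into a fitness estimate is a log-ratio regression: the selection coefficient is taken as the slope of ln(mutant count / reference count) against time. The sketch below shows that estimator only as an assumed illustration; it is not claimed to be the estimator analyzed in the paper, and the function name and inputs are hypothetical.

```python
import math


def selection_coefficient(times, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant/reference) versus time.

    All counts must be positive; times are in the unit chosen for the
    selection coefficient (for example generations).
    """
    y = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    return sxy / sxx
```

Because the slope depends on the spread of the sampled time points (the `sxx` term), adding time points or lengthening the experiment tightens the estimate, which is consistent with the design guidance summarized in the abstract.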
Rudnick, Paul A.; Markey, Sanford P.; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V.; Edwards, Nathan J.; Thangudu, Ratna R.; Ketchum, Karen A.; Kinsinger, Christopher R.; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E.
2016-01-01
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics datasets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and non-reference markers of cancer. The CPTAC labs have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these datasets were produced from 2D LC-MS/MS analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data according to the following: (1) Peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false discovery rate (FDR)-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the datasets is specific to the sample processing, with PSM and protein reports containing the spectrum-level or gene-level (“rolled-up”) precursor peak areas and spectral counts for label-free or reporter ion log-ratios for 4plex iTRAQ™. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data, enabling comparisons between different samples and cancer types as well as across the major ‘omics fields. PMID:26860878
Rudnick, Paul A; Markey, Sanford P; Roth, Jeri; Mirokhin, Yuri; Yan, Xinjian; Tchekhovskoi, Dmitrii V; Edwards, Nathan J; Thangudu, Ratna R; Ketchum, Karen A; Kinsinger, Christopher R; Mesri, Mehdi; Rodriguez, Henry; Stein, Stephen E
2016-03-04
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics data sets from the mass spectrometric interrogation of tumor samples previously analyzed by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling proteogenomic study for both reference (i.e., contained in major sequence databases) and nonreference markers of cancer. The CPTAC laboratories have focused on colon, breast, and ovarian tissues in the first round of analyses; spectra from these data sets were produced from 2D liquid chromatography-tandem mass spectrometry analyses and represent deep coverage. To reduce the variability introduced by disparate data analysis platforms (e.g., software packages, versions, parameters, sequence databases, etc.), the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM) reports and gene-level reports. The pipeline processes raw mass spectrometry data according to the following: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) false-discovery rate-based filtering. The pipeline also produces localization scores for the phosphopeptide enrichment studies using the PhosphoRS program. Quantitative information for each of the data sets is specific to the sample processing, with PSM and protein reports containing the spectrum-level or gene-level ("rolled-up") precursor peak areas and spectral counts for label-free or reporter ion log-ratios for 4plex iTRAQ. The reports are available in simple tab-delimited formats and, for the PSM-reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data to enable comparisons between different samples and cancer types as well as across the major omics fields.
Quantitative phase microscopy using deep neural networks
NASA Astrophysics Data System (ADS)
Li, Shuai; Sinha, Ayan; Lee, Justin; Barbastathis, George
2018-02-01
Deep learning has been proven to achieve ground-breaking accuracy in various tasks. In this paper, we implemented a deep neural network (DNN) to achieve phase retrieval in a wide-field microscope. Our DNN utilized the residual neural network (ResNet) architecture and was trained using the data generated by a phase SLM. The results showed that our DNN was able to reconstruct the profile of the phase target qualitatively. However, large errors remained, which indicates that our approach still needs to be improved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan
2012-06-19
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
Dissecting genetic and environmental mutation signatures with model organisms.
Segovia, Romulo; Tam, Annie S; Stirling, Peter C
2015-08-01
Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.
Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A
2014-01-01
Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.
Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong
2016-06-14
Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. 
Elucidation of these changes is important in clarifying the molecular mechanisms associated with the development of SUS in sea cucumber.
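The assembly statistic quoted in the abstract above (N50 = 1,575 bp) can be illustrated with a minimal sketch. It assumes the common definition of N50: the length L such that sequences of length L or longer account for at least half of the total assembled bases. The function name `n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return the
    length at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical unigene lengths in bp (illustration only, not data from the paper).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))
```

Because N50 weights long sequences more heavily than the arithmetic mean does, it is usually reported alongside the average length, as in the abstract.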
DeepSig: deep learning improves signal peptide detection in proteins.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita
2018-05-15
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.
Are Deep Strategic Learners Better Suited to PBL? A Preliminary Study
ERIC Educational Resources Information Center
Papinczak, Tracey
2009-01-01
The aim of this study was to determine if medical students categorised as having deep and strategic approaches to their learning find problem-based learning (PBL) enjoyable and supportive of their learning, and achieve well in the first-year course. Quantitative and qualitative data were gathered from first-year medical students (N = 213). All…
ERIC Educational Resources Information Center
Bevan, Samantha J.; Chan, Cecilia W. L.; Tanner, Julian A.
2014-01-01
Although there is increasing evidence for a relationship between courses that emphasize student engagement and achievement of student deep learning, there is a paucity of quantitative comparative studies in a biochemistry and molecular biology context. Here, we present a pedagogical study in two contrasting parallel biochemistry introductory…
Zhu, Wenhui; Liu, Shanshan; Liu, Jia; Zhou, Yan; Lin, Huancai
2018-05-01
Adherence capacity is one of the principal virulence factors of Streptococcus mutans, and adhesion virulence factors are controlled by small RNAs (sRNAs) at the post-transcriptional level in various bacteria. Here, we aimed to identify and decipher putative adhesion-related sRNAs in clinical strains of S. mutans. RNA deep-sequencing was performed to identify potential sRNAs under different adhesion conditions. The expression of sRNAs was analysed by quantitative real-time PCR (qRT-PCR), and bioinformatic methods were used to predict the functional characteristics of sRNAs. A total of 736 differentially expressed candidate sRNAs were predicted, and these included 352 sRNAs located on the antisense to mRNA (AM) and 384 sRNAs in intergenic regions (IGRs). The top 7 differentially expressed sRNAs were successfully validated by qRT-PCR in UA159, and 2 of these were further confirmed in 100 clinical isolates. Moreover, the sequences of two sRNAs were conserved in other Streptococcus species, indicating a conserved role in such closely related species. A good correlation between the expression of sRNAs and the adhesion of 100 clinical strains was observed, which, combined with GO and KEGG, provides a perspective for the comprehension of sRNA function annotation. This study revealed a multitude of novel putative adhesion-related sRNAs in S. mutans and contributed to a better understanding of information concerning the transcriptional regulation of adhesion in S. mutans.
Falk, Kristin; Falk, Hanna; Jakobsson Ung, Eva
2016-01-01
A key area for consideration is determining how optimal conditions for learning can be created. Higher education in nursing aims to prepare students to develop their capabilities to become independent professionals. The aim of this study was to evaluate the effects of sequencing clinical practice prior to theoretical studies on students' experiences of self-directed learning readiness and students' approach to learning in the second year of a three-year undergraduate study program in nursing. A total of 123 nursing students were included in the study and divided into two groups. In group A (n = 60) clinical practice preceded theoretical studies. In group B (n = 63) theoretical studies preceded clinical practice. Learning readiness was measured using the Self-Directed Learning Readiness Scale for Nursing Education (SDLRSNE), and learning process was measured using the revised two-factor version of the Study Process Questionnaire (R-SPQ-2F). Students were also asked to write down their personal reflections throughout the course. By using a mixed method design, the qualitative component focused on the students' personal experiences in relation to the sequencing of theoretical studies and clinical practice. The quantitative component provided information about learning readiness before and after the intervention. Our findings confirm that students are sensitive and adaptable to their learning contexts, and that the sequencing of courses is subordinate to a pedagogical style enhancing students' deep learning approaches, which needs to be incorporated in the development of undergraduate nursing programs.
Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor
2015-01-01
Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.
Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H
2017-01-01
The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNA prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we will show that our approach has a lower training time and allows for a better graphical navigability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.
Towards precision medicine: from quantitative imaging to radiomics
Acharya, U. Rajendra; Hagiwara, Yuki; Sudarshan, Vidya K.; Chan, Wai Yee; Ng, Kwan Hoong
2018-01-01
Radiology (imaging) and imaging-guided interventions, which provide multi-parametric morphologic and functional information, are playing an increasingly significant role in precision medicine. Radiologists are trained to understand the imaging phenotypes, transcribe those observations (phenotypes) to correlate with underlying diseases and to characterize the images. However, in order to understand and characterize the molecular phenotype (to obtain genomic information) of solid heterogeneous tumours, the advanced sequencing of those tissues using biopsy is required. Thus, radiologists image the tissues from various views and angles in order to have the complete image phenotypes, thereby acquiring a huge amount of data. Deriving meaningful details from all these radiological data becomes challenging and raises the big data issues. Therefore, interest in the application of radiomics has been growing in recent years as it has the potential to provide significant interpretive and predictive information for decision support. Radiomics is a combination of conventional computer-aided diagnosis, deep learning methods, and human skills, and thus can be used for quantitative characterization of tumour phenotypes. This paper discusses the overview of radiomics workflow, the results of various radiomics-based studies conducted using various radiological images such as computed tomography (CT), magnetic resonance imaging (MRI), and positron-emission tomography (PET), the challenges we are facing, and the potential contribution of radiomics towards precision medicine. PMID:29308604
Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis.
Wen, Chengping; Zheng, Zhijun; Shao, Tiejuan; Liu, Lin; Xie, Zhijun; Le Chatelier, Emmanuelle; He, Zhixing; Zhong, Wendi; Fan, Yongsheng; Zhang, Linshuang; Li, Haichang; Wu, Chunyan; Hu, Changfeng; Xu, Qian; Zhou, Jia; Cai, Shunfeng; Wang, Dawei; Huang, Yun; Breban, Maxime; Qin, Nan; Ehrlich, Stanislav Dusko
2017-07-27
The assessment and characterization of the gut microbiome has become a focus of research in the area of human autoimmune diseases. Ankylosing spondylitis is an inflammatory autoimmune disease and evidence showed that ankylosing spondylitis may be a microbiome-driven disease. To investigate the relationship between the gut microbiome and ankylosing spondylitis, a quantitative metagenomics study based on deep shotgun sequencing was performed, using gut microbial DNA from 211 Chinese individuals. A total of 23,709 genes and 12 metagenomic species were shown to be differentially abundant between ankylosing spondylitis patients and healthy controls. Patients were characterized by a form of gut microbial dysbiosis that is more prominent than previously reported cases with inflammatory bowel disease. Specifically, the ankylosing spondylitis patients demonstrated increases in the abundance of Prevotella melaninogenica, Prevotella copri, and Prevotella sp. C561 and decreases in Bacteroides spp. It is noteworthy that the Bifidobacterium genus, which is commonly used in probiotics, accumulated in the ankylosing spondylitis patients. Diagnostic algorithms were established using a subset of these gut microbial biomarkers. Alterations of the gut microbiome are associated with development of ankylosing spondylitis. Our data suggest biomarkers identified in this study might participate in the pathogenesis or development process of ankylosing spondylitis, providing new leads for the development of new diagnostic tools and potential treatments.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Xu, Kuidong
2016-10-01
In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. 
Statistical analyses showed that: (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitats and thus are potentially more sensitive to varying environmental conditions.
Amexis, Georgios; Oeth, Paul; Abel, Kenneth; Ivshina, Anna; Pelloquin, Francois; Cantor, Charles R.; Braun, Andreas; Chumakov, Konstantin
2001-01-01
RNA viruses exist as quasispecies, heterogeneous and dynamic mixtures of mutants having one or more consensus sequences. An adequate description of the genomic structure of such viral populations must include the consensus sequence(s) plus a quantitative assessment of sequence heterogeneities. For example, in quality control of live attenuated viral vaccines, the presence of even small quantities of mutants or revertants may indicate incomplete or unstable attenuation that may influence vaccine safety. Previously, we demonstrated the monitoring of oral poliovirus vaccine with the use of mutant analysis by PCR and restriction enzyme cleavage (MAPREC). In this report, we investigate genetic variation in live attenuated mumps virus vaccine by using both MAPREC and a platform (DNA MassArray) based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Mumps vaccines prepared from the Jeryl Lynn strain typically contain at least two distinct viral substrains, JL1 and JL2, which have been characterized by full length sequencing. We report the development of assays for characterizing sequence variants in these substrains and demonstrate their use in quantitative analysis of substrains and sequence variations in mixed virus cultures and mumps vaccines. The results obtained from both the MAPREC and MALDI-TOF methods showed excellent correlation. This suggests the potential utility of MALDI-TOF for routine quality control of live viral vaccines and for assessment of genetic stability and quantitative monitoring of genetic changes in other RNA viruses of clinical interest. PMID:11593021
Reproducibility and quantitation of amplicon sequencing-based detection
Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng
2011-01-01
To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates, and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but less on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach could not be quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates, and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is not reproducible and quantitative. 
However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
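The OTU overlap figures quoted in the abstract above (for example, 17.2% between two technical replicates) can be made concrete with a small sketch. The abstract does not state how overlap was computed, so this assumes a Jaccard-style definition: OTUs detected in both replicates divided by OTUs detected in either. The function name `otu_overlap` and the toy count tables are hypothetical.

```python
def otu_overlap(replicate_a, replicate_b):
    """Percent of OTUs shared between two replicates (Jaccard-style, assumed).

    Each argument maps an OTU identifier to its read count; an OTU counts as
    detected only when its read count is greater than zero.
    """
    detected_a = {otu for otu, count in replicate_a.items() if count > 0}
    detected_b = {otu for otu, count in replicate_b.items() if count > 0}
    union = detected_a | detected_b
    if not union:
        return 0.0
    return 100.0 * len(detected_a & detected_b) / len(union)


# Hypothetical read counts for two technical replicates of one community.
rep1 = {"OTU1": 12, "OTU2": 3, "OTU3": 1, "OTU4": 0}
rep2 = {"OTU1": 9, "OTU3": 2, "OTU5": 1}
print(otu_overlap(rep1, rep2))
```

Under this definition, low-abundance OTUs that appear in only one replicate by chance enlarge the union without enlarging the intersection, which is one way random sampling can depress the overlap, consistent with the explanation offered in the abstract.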
Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum
2016-11-01
DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced. Mapping of DNA variants in our sequenced genomes to mitochondrial
Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi
Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain
2017-01-01
Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
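Pairwise nucleotide identity, the measure behind the 93.4% and 86.9 to 87.0% figures above, can be sketched as follows. The example assumes the two sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the comparison; the abstract does not say which convention the authors used. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned nucleotide sequences.

    Assumption: alignment columns with a gap in either sequence are skipped,
    so identity is matches divided by the number of ungapped columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (not real UCBSV sequence).
aligned_a = "ATGCGTAC-A"
aligned_b = "ATGAGTACGA"
print(round(percent_identity(aligned_a, aligned_b), 1))
```

Other conventions, such as counting gaps as mismatches, give lower values for the same alignment, so identity percentages are only comparable when computed the same way.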
USDA-ARS's Scientific Manuscript database
The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach
NASA Astrophysics Data System (ADS)
Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan
2013-02-01
Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediment samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of the foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of the foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in the cDNA dataset indicates the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompts its use in future high-throughput sequencing-based environmental surveys.
NASA Astrophysics Data System (ADS)
Okay, Aral I.; Altiner, Demir
2016-10-01
The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized. Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. The carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which result in submarine erosion, carbonate breccias and the lateral discontinuity of the sequences, and differs from blanket-like carbonate deposition in the passive margins.
Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado▿
Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.
2008-01-01
A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950
2014-01-01
Background Hypervariable region 1 (HVR1) contained within envelope protein 2 (E2) gene is the most variable part of HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders, when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at frequency of 0.5%. Conclusions While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390
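As an illustration of one of the quasispecies parameters named in the abstract above, the following is a minimal sketch of Shannon entropy computed over variant read counts after applying a minimal frequency cut-off (1%, 2% or 5% in the study). The function name, the input format (a mapping from variant to read count), and the choice to renormalize frequencies after filtering are assumptions made for illustration; the abstract does not specify how the authors computed the statistic.

```python
import math


def shannon_entropy(variant_counts, min_freq=0.01):
    """Shannon entropy (in nats) of a viral population.

    variant_counts: dict mapping a variant identifier to its read count
                    (hypothetical input format, not taken from the study).
    min_freq:       variants below this fraction of all reads are discarded,
                    and the remaining frequencies are renormalized
                    (an assumption, not the authors' documented procedure).
    """
    total = sum(variant_counts.values())
    if total == 0:
        return 0.0
    # Keep only variants at or above the frequency cut-off.
    kept = [c for c in variant_counts.values() if c / total >= min_freq]
    kept_total = sum(kept)
    if kept_total == 0:
        return 0.0
    # H = -sum(p_i * ln p_i) over the retained variants.
    return -sum((c / kept_total) * math.log(c / kept_total) for c in kept)


if __name__ == "__main__":
    counts = {"variant_a": 900, "variant_b": 80, "variant_c": 15, "variant_d": 5}
    for cutoff in (0.01, 0.02, 0.05):
        print(cutoff, round(shannon_entropy(counts, min_freq=cutoff), 4))
```

Raising the cut-off removes rare variants, so the entropy estimate can only reflect the more abundant part of the population; this is one reason the study reports results at several thresholds.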
Téllez-Sosa, Juan; Rodríguez, Mario Henry; Gómez-Barreto, Rosa E.; Valdovinos-Torres, Humberto; Hidalgo, Ana Cecilia; Cruz-Hervert, Pablo; Luna, René Santos; Carrillo-Valenzo, Erik; Ramos, Celso; García-García, Lourdes; Martínez-Barnetche, Jesús
2013-01-01
Background Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The “deep sequencing” approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. Methodology and Principal Findings We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November, 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. Conclusions NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights to viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave. 
As sequencing throughput rises and library multiplexing and automation improve, we foresee that the approach presented here can be scaled up for global genetic surveillance of influenza and other infectious diseases. PMID:23843978
Genome-wide proteomics analysis on longissimus muscles in Qinchuan beef cattle.
He, Hua; Chen, Si; Liang, Wei; Liu, Xiaolin
2017-04-01
To gain further insight into the molecular mechanism of bovine muscle development, we combined mass spectrometry characterization of proteins with Illumina deep sequencing of RNAs obtained from bovine longissimus muscle (LD) at prenatal and postnatal stages. For the proteomic study, each group of LD proteins was extracted and labeled using isobaric tags for relative and absolute quantitation (iTRAQ) method. Among the 1321 proteins identified from six samples, 390 proteins were differentially expressed in embryos at day 135 post-fertilization (Emb135d) vs. 30-month-old adult cattle (Emb135d vs. 30M) samples. Gene Ontology, Cluster of Orthologous Groups and Kyoto Encyclopedia of Genes and Genomes analyses were further conducted to better understand the different functions. Furthermore, we analyzed the relationship between transcript and protein regulation between samples by direct comparison of expression levels from transcriptomic and iTRAQ-based proteomics. Association results indicated that 1295 of 1321 proteins could be mapped to transcriptome sequencing data. This study provides the most comprehensive, targeted survey of bovine LD proteins to date and has shown the power of combining transcriptomic and proteomic approaches to provide molecular insights for understanding the developmental characteristics in bovine muscle, and even in other mammals. © 2016 Stichting International Foundation for Animal Genetics.
GENOMIC BASIS OF AGING AND LIFE HISTORY EVOLUTION IN DROSOPHILA MELANOGASTER
Remolina, Silvia C.; Chang, Peter L.; Leips, Jeff; Nuzhdin, Sergey V.; Hughes, Kimberly A.
2015-01-01
Natural diversity in aging and other life history patterns is a hallmark of organismal variation. Related species, populations, and individuals within populations show genetically based variation in life span and other aspects of age-related performance. Population differences are especially informative because these differences can be large relative to within-population variation and because they occur in organisms with otherwise similar genomes. We used experimental evolution to produce populations divergent for life span and late-age fertility and then used deep genome sequencing to detect sequence variants with nucleotide-level resolution. Several genes and genome regions showed strong signatures of selection, and the same regions were implicated in independent comparisons, suggesting that the same alleles were selected in replicate lines. Genes related to oogenesis, immunity, and protein degradation were implicated as important modifiers of late-life performance. Expression profiling and functional annotation narrowed the list of strong candidate genes to 38, most of which are novel candidates for regulating aging. Life span and early-age fecundity were negatively correlated among populations; therefore the alleles we identified also are candidate regulators of a major life-history trade-off. More generally, we argue that hitchhiking mapping can be a powerful tool for uncovering the molecular bases of quantitative genetic variation. PMID:23106705
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.
Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie
2017-07-01
Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.
Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan
2017-05-01
Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from less than 50 m deep, being most abundant near the sea floor where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database, which indicates the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analyses of rhizarians were performed and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.
An Inquiry-Based Quantitative Reasoning Course for Business Students
ERIC Educational Resources Information Center
Piercey, Victor; Militzer, Erin
2017-01-01
Quantitative Reasoning for Business is a two-semester sequence that serves as an alternative to elementary and intermediate algebra for first-year business students with weak mathematical preparation. Students who take the sequence have been retained at a higher rate and demonstrated a larger reduction in math anxiety than those who take the…
Vernick, Kenneth D.
2017-01-01
Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932
Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja
2012-01-01
Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595
Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.
Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie
2017-01-01
In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method targets gliomas (both low and high grade) in multi-sequence magnetic resonance images (MRIs). First, a novel deep architecture is proposed that combines multi-sequence intensity feature extraction with classification to obtain the classification probabilities of each voxel. Then, graph-cut-based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. Finally, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated on the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The segmentation results demonstrate that our proposal provides a solution competitive with state-of-the-art methods. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Lim, Issel Anne L; Faria, Andreia V; Li, Xu; Hsu, Johnny T C; Airan, Raag D; Mori, Susumu; van Zijl, Peter C M
2013-11-15
The purpose of this paper is to extend the single-subject Eve atlas from Johns Hopkins University, which currently contains diffusion tensor and T1-weighted anatomical maps, by including contrast based on quantitative susceptibility mapping. The new atlas combines a "deep gray matter parcellation map" (DGMPM) derived from a single-subject quantitative susceptibility map with the previously established "white matter parcellation map" (WMPM) from the same subject's T1-weighted and diffusion tensor imaging data into an MNI coordinate map named the "Everything Parcellation Map in Eve Space," also known as the "EvePM." It allows automated segmentation of gray matter and white matter structures. Quantitative susceptibility maps from five healthy male volunteers (30 to 33 years of age) were coregistered to the Eve Atlas with AIR and Large Deformation Diffeomorphic Metric Mapping (LDDMM), and the transformation matrices were applied to the EvePM to produce automated parcellation in subject space. Parcellation accuracy was measured with a kappa analysis for the left and right structures of six deep gray matter regions. For multi-orientation QSM images, the Kappa statistic was 0.85 between automated and manual segmentation, with the inter-rater reproducibility Kappa being 0.89 for the human raters, suggesting "almost perfect" agreement between all segmentation methods. Segmentation seemed slightly more difficult for human raters on single-orientation QSM images, with the Kappa statistic being 0.88 between automated and manual segmentation, and 0.85 and 0.86 between human raters. Overall, this atlas provides a time-efficient tool for automated coregistration and segmentation of quantitative susceptibility data to analyze many regions of interest. These data were used to establish a baseline for normal magnetic susceptibility measurements for over 60 brain structures of 30- to 33-year-old males. 
Correlating the average susceptibility with age-based iron concentrations in gray matter structures measured by Hallgren and Sourander (1958) allowed interpolation of the average iron concentration of several deep gray matter regions delineated in the EvePM. Copyright © 2013 Elsevier Inc. All rights reserved.
Lim, Issel Anne L.; Faria, Andreia V.; Li, Xu; Hsu, Johnny T.C.; Airan, Raag D.; Mori, Susumu; van Zijl, Peter C. M.
2013-01-01
The purpose of this paper is to extend the single-subject Eve atlas from Johns Hopkins University, which currently contains diffusion tensor and T1-weighted anatomical maps, by including contrast based on quantitative susceptibility mapping. The new atlas combines a “deep gray matter parcellation map” (DGMPM) derived from a single-subject quantitative susceptibility map with the previously established “white matter parcellation map” (WMPM) from the same subject’s T1-weighted and diffusion tensor imaging data into an MNI coordinate map named the “Everything Parcellation Map in Eve Space,” also known as the “EvePM.” It allows automated segmentation of gray matter and white matter structures. Quantitative susceptibility maps from five healthy male volunteers (30 to 33 years of age) were coregistered to the Eve Atlas with AIR and Large Deformation Diffeomorphic Metric Mapping (LDDMM), and the transformation matrices were applied to the EvePM to produce automated parcellation in subject space. Parcellation accuracy was measured with a kappa analysis for the left and right structures of six deep gray matter regions. For multi-orientation QSM images, the Kappa statistic was 0.85 between automated and manual segmentation, with the inter-rater reproducibility Kappa being 0.89 for the human raters, suggesting “almost perfect” agreement between all segmentation methods. Segmentation seemed slightly more difficult for human raters on single-orientation QSM images, with the Kappa statistic being 0.88 between automated and manual segmentation, and 0.85 and 0.86 between human raters. Overall, this atlas provides a time-efficient tool for automated coregistration and segmentation of quantitative susceptibility data to analyze many regions of interest. These data were used to establish a baseline for normal magnetic susceptibility measurements for over 60 brain structures of 30- to 33-year-old males. 
Correlating the average susceptibility with age-based iron concentrations in gray matter structures measured by Hallgren and Sourander (1958) allowed interpolation of the average iron concentration of several deep gray matter regions delineated in the EvePM. PMID:23769915
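The kappa values reported in the abstract above measure agreement between two segmentations beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa for two label sequences (for example, per-voxel region labels from an automated and a manual segmentation). The function name and input format are assumptions for illustration, and the abstract does not state which kappa variant the authors used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed fraction of positions where both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    automated = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical voxel labels
    manual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(round(cohens_kappa(automated, manual), 4))
```

On the commonly cited Landis and Koch scale, values from 0.81 to 1.00 are described as "almost perfect" agreement, which matches the wording quoted in the abstract.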
NASA Astrophysics Data System (ADS)
Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.
1991-09-01
This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ~3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lenses and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where icebergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize, in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.
van den Broek, M; Bolat, I; Nijkamp, J F; Ramos, E; Luttik, M A H; Koopman, F; Geertman, J M; de Ridder, D; Pronk, J T; Daran, J-M
2015-09-01
Lager brewing strains of Saccharomyces pastorianus are natural interspecific hybrids originating from the spontaneous hybridization of Saccharomyces cerevisiae and Saccharomyces eubayanus. Over the past 500 years, S. pastorianus has been domesticated to become one of the most important industrial microorganisms. Production of lager-type beers requires a set of essential phenotypes, including the ability to ferment maltose and maltotriose at low temperature, the production of flavors and aromas, and the ability to flocculate. Understanding of the molecular basis of complex brewing-related phenotypic traits is a prerequisite for rational strain improvement. While genome sequences have been reported, the variability and dynamics of S. pastorianus genomes have not been investigated in detail. Here, using deep sequencing and chromosome copy number analysis, we showed that S. pastorianus strain CBS1483 exhibited extensive aneuploidy. This was confirmed by quantitative PCR and by flow cytometry. As a direct consequence of this aneuploidy, a massive number of sequence variants was identified, leading to at least 1,800 additional protein variants in S. pastorianus CBS1483. Analysis of eight additional S. pastorianus strains revealed that the previously defined group I strains showed comparable karyotypes, while group II strains showed large interstrain karyotypic variability. Comparison of three strains with nearly identical genome sequences revealed substantial chromosome copy number variation, which may contribute to strain-specific phenotypic traits. The observed variability of lager yeast genomes demonstrates that systematic linking of genotype to phenotype requires a three-dimensional genome analysis encompassing physical chromosomal structures, the copy number of individual chromosomes or chromosomal regions, and the allelic variation of copies of individual genes. Copyright © 2015, van den Broek et al.
2010-01-01
Background Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum. Results A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina® Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme. Conclusions This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations. PMID:20846421
Soler, Vincent José; Tran-Viet, Khanh-Nhat; Galiacy, Stéphane D; Limviphuvadh, Vachiranee; Klemm, Thomas Patrick; St Germain, Elizabeth; Fournié, Pierre R; Guillaud, Céline; Maurer-Stroh, Sebastian; Hawthorne, Felicia; Suarez, Cyrielle; Kantelip, Bernadette; Afshari, Natalie A; Creveaux, Isabelle; Luo, Xiaoyan; Meng, Weihua; Calvas, Patrick; Cassagne, Myriam; Arné, Jean-Louis; Rozen, Steven G; Malecaze, François; Young, Terri L
2014-01-01
Background Corneal intraepithelial dyskeratosis is an extremely rare condition. The classical form, affecting Native American Haliwa-Saponi tribe members, is called hereditary benign intraepithelial dyskeratosis (HBID). Herein, we present a new form of corneal intraepithelial dyskeratosis for which we identified the causative gene by using deep sequencing technology. Methods and results A seven-member Caucasian French family with two corneal intraepithelial dyskeratosis affected individuals (6-year-old proband and his mother) was ascertained. The proband presented with bilateral complete corneal opacification and dyskeratosis. Palmoplantar hyperkeratosis and laryngeal dyskeratosis were associated with the phenotype. Histopathology studies of cornea and vocal cord biopsies showed dyskeratotic keratinisation. Quantitative PCR ruled out 4q35 duplication, classically described in HBID cases. Next generation sequencing with mean coverage of 50× using the Illumina HiSeq and whole exome capture processing was performed. Sequence reads were aligned, and screened for single nucleotide variants and insertion/deletion calls. In-house pipeline filtering analyses and comparisons with available databases were performed. A novel missense mutation M77T was discovered for the gene NLRP1 which maps to chromosome 17p13.2. This was a de novo mutation in the proband’s mother, following segregation in the family, and not found in 738 control DNA samples. NLRP1 expression was determined in adult corneal epithelium. The amino acid change was found to significantly destabilise the protein structure. Conclusions We describe a new corneal intraepithelial dyskeratosis and how we identified its causative gene. The NLRP1 gene product is implicated in inflammation, autoimmune disorders, and caspase mediated apoptosis. NLRP1 polymorphisms are associated with various diseases. PMID:23349227
van den Broek, M.; Bolat, I.; Nijkamp, J. F.; Ramos, E.; Luttik, M. A. H.; Koopman, F.; Geertman, J. M.; de Ridder, D.; Pronk, J. T.
2015-01-01
Lager brewing strains of Saccharomyces pastorianus are natural interspecific hybrids originating from the spontaneous hybridization of Saccharomyces cerevisiae and Saccharomyces eubayanus. Over the past 500 years, S. pastorianus has been domesticated to become one of the most important industrial microorganisms. Production of lager-type beers requires a set of essential phenotypes, including the ability to ferment maltose and maltotriose at low temperature, the production of flavors and aromas, and the ability to flocculate. Understanding of the molecular basis of complex brewing-related phenotypic traits is a prerequisite for rational strain improvement. While genome sequences have been reported, the variability and dynamics of S. pastorianus genomes have not been investigated in detail. Here, using deep sequencing and chromosome copy number analysis, we showed that S. pastorianus strain CBS1483 exhibited extensive aneuploidy. This was confirmed by quantitative PCR and by flow cytometry. As a direct consequence of this aneuploidy, a massive number of sequence variants was identified, leading to at least 1,800 additional protein variants in S. pastorianus CBS1483. Analysis of eight additional S. pastorianus strains revealed that the previously defined group I strains showed comparable karyotypes, while group II strains showed large interstrain karyotypic variability. Comparison of three strains with nearly identical genome sequences revealed substantial chromosome copy number variation, which may contribute to strain-specific phenotypic traits. The observed variability of lager yeast genomes demonstrates that systematic linking of genotype to phenotype requires a three-dimensional genome analysis encompassing physical chromosomal structures, the copy number of individual chromosomes or chromosomal regions, and the allelic variation of copies of individual genes. PMID:26150454
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing
Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph
2011-01-01
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
Quantitative trait locus mapping of deep rooting by linkage and association analysis in rice
Lou, Qiaojun; Chen, Liang; Mei, Hanwei; Wei, Haibin; Feng, Fangjun; Wang, Pei; Xia, Hui; Li, Tiemei; Luo, Lijun
2015-01-01
Deep rooting is a very important trait for plants’ drought avoidance, and it is usually represented by the ratio of deep rooting (RDR). Three sets of rice populations were used to determine the genetic base for RDR. A linkage mapping population with 180 recombinant inbred lines and an association mapping population containing 237 rice varieties were used to identify genes linked to RDR. Six quantitative trait loci (QTLs) of RDR were identified as being located on chromosomes 1, 2, 4, 7, and 10. Using 1 019 883 single-nucleotide polymorphisms (SNPs), a genome-wide association study of the RDR was performed. Forty-eight significant SNPs of the RDR were identified and formed a clear peak on the short arm of chromosome 1 in a Manhattan plot. Compared with the shallow-rooting group and the whole collection, the deep-rooting group had selective sweep regions on chromosomes 1 and 2, especially in the major QTL region on chromosome 2. Seven of the nine candidate SNPs identified by association mapping were verified in two RDR extreme groups. The findings from this study will be beneficial to rice drought-resistance research and breeding. PMID:26022253
Nguyen, Thanh; Bui, Vy; Lam, Van; Raub, Christopher B; Chang, Lin-Ching; Nehmetallah, George
2017-06-26
We propose a fully automatic technique to obtain aberration-free quantitative phase imaging in digital holographic microscopy (DHM) based on deep learning. The traditional DHM solves the phase aberration compensation problem by manually detecting the background for quantitative measurement. This would be a drawback in real-time implementation and for dynamic processes such as cell migration phenomena. A recent automatic aberration compensation approach using principal component analysis (PCA) in DHM avoids human intervention regardless of the cells' motion. However, it corrects spherical/elliptical aberration only and disregards the higher order aberrations. Traditional image segmentation techniques can be employed to spatially detect cell locations. Ideally, automatic image segmentation techniques make real-time measurement possible. However, existing automatic unsupervised segmentation techniques have poor performance when applied to DHM phase images because of aberrations and speckle noise. In this paper, we propose a novel method that combines a supervised deep learning technique with convolutional neural network (CNN) and Zernike polynomial fitting (ZPF). The deep learning CNN is implemented to perform automatic background region detection that allows for ZPF to compute the self-conjugated phase to compensate for most aberrations.
Zhou, Zhichao; Zhang, Guo-Xia; Xu, Yan-Bin; Gu, Ji-Dong
2018-06-26
Thaumarchaeota and Bathyarchaeota (formerly named Miscellaneous Crenarchaeotal Group, MCG) are globally occurring archaea playing potential roles in nitrogen and carbon cycling, especially in marine benthic biogeochemical cycle. Information on their distributional and compositional patterns could provide critical clues to further delineate their physiological and biochemical characteristics. Profiles of thaumarchaeotal and the total archaeal community in the northern South China Sea surface sediments revealed a successively transitional pattern of Thaumarchaeota composition using MiSeq sequencing. Shallow-sea sediment enriched phylotypes decreased gradually along the slope from estuarine and coastal marine region to the deep-sea, while deep-sea sediment enriched phylotypes showed a trend of increasing. Proportion of Thaumarchaeota within the total archaea increased with seawater depth. Phylotypes enriched in shallow- and deep-sea sediments were affiliated to OTUs originated from similar niches, suggesting that physiological adaption not geographical distance shaped the distribution of Thaumarchaeota lineages. Quantitative PCR also depicted a successive decrease of thaumarchaeotal 16S rRNA gene abundance from the highest at shallow-sea sites E708S and E709S (2.57 × 10^6 and 2.73 × 10^6 gene copies/g of dry sediment) to the lowest at deep-sea sites E525S and E407S (1.97 × 10^6 and 2.14 × 10^6 gene copies/g of dry sediment). Both of the abundance fractions of Bathyarchaeota subgroups (including subgroups 1, 6, 8, 10, 13, 15, 17, and ungrouped Bathyarchaeota) and the total Bathyarchaeota in the total archaea showed a negative distribution to seawater depth. Partitioned distribution of Bathyarchaeota fraction in the total archaea is documented for the first time in this study, and the shallow- and deep-sea Bathyarchaeota could account for 17.8 and 0.8%, respectively, on average.
Subgroups 6 and 8, enriched subgroups in shallow-sea sediments, largely explained this partitioned distribution pattern according to seawater depth. Their prevalence in shallow-sea and suboxic estuarine sediments rather than deep-sea sediments hints that their metabolic properties of carbon metabolism are adapted to carbon substrates in these environments.
Joint deep shape and appearance learning: application to optic pathway glioma segmentation
NASA Astrophysics Data System (ADS)
Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George
2017-03-01
Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2.39% and 0.48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0.87 ± 0.13 (0.93 ± 0.06 for healthy tissue, 0.78 ± 0.18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.
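The Dice similarity coefficient quoted in this abstract is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted and a reference segmentation. A minimal sketch follows, assuming each segmentation is given as a collection of voxel indices; the function name and the convention for two empty masks are illustrative choices, not taken from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity 2|A & B| / (|A| + |B|) for two collections of voxel indices."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Toy example: two 3-voxel masks sharing 2 voxels.
score = dice_coefficient({1, 2, 3}, {2, 3, 4})
```

A value of 1 means the two masks coincide and 0 means they share no voxels.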
Ono, Atsushi; Murase, Kenya; Taniguchi, Toshitaka; Shibutani, Osamu; Takata, Satoru; Kobashi, Yasuyuki; Miyazaki, Mitsue
2009-04-01
Three noncontrast-enhanced MR venography techniques are presented for assessing deep vein thrombosis (DVT) at 0.5 T in patients with metallic implants. Two cardiac-gated 3D half-Fourier FSE fresh blood imaging sequences with flow-refocusing pulses (FR-FBI) in the read-out (RO) direction and without FR pulses (non-FR-FBI) were developed for slower-flowing blood. For faster flowing blood, a swap phase-encode arterial double-subtraction elimination (SPADE) technique was developed. The three techniques were assessed both quantitatively using signal-to-noise (SNR) and contrast-noise-ratio (CNR) measurements and qualitatively by subjective image analysis in 15 volunteers. SPADE was compared to FR-FBI in the pelvic veins and FR-FBI was compared to non-FR-FBI in the thigh and calf veins. Both SPADE and FR-FBI techniques produced significantly higher SNRs, CNRs, and image quality in each comparative study (P<0.001). Five patients with metallic implants and confirmed DVT underwent SPADE (pelvic veins) and FR-FBI (thigh and calf veins) examinations and the results were compared to conventional venography. The SPADE and FR-FBI images showed all DVTs from all five patients without interference from implant susceptibility artifacts. The excellent image quality produced by both SPADE and FR-FBI throughout peripheral vasculature demonstrates their promise for detecting DVT in postsurgery patients.
Assessing the link between recent supernovae near Earth and the iron-60 anomaly in a deep-sea crust
NASA Astrophysics Data System (ADS)
Schulreich, Michael M.; Breitschwerdt, Dieter
2016-06-01
Some time ago, an enhanced concentration of the radionuclide 60Fe was discovered in a deep-sea ferromanganese crust, isolated in layers dating from about 2.2 Myr ago. Since 60Fe (half-life of 2.6 Myr) is not naturally produced on Earth, such an excess can only be attributed to extraterrestrial sources, particularly one or several nearby supernovae in the recent past. It has been speculated that these supernovae might have been involved in the formation of the Local Superbubble, our Galactic habitat. The aim of this talk is to provide quantitative evidence for this scenario. For that purpose, I will present results from high-resolution hydrodynamical simulations of the Local Superbubble and its neighbour Loop I in different environments, including a self-consistently evolved supernova-driven interstellar medium. For the superbubble modelling, the time sequence and locations of the generating core-collapse supernova explosions are taken into account, which are derived from the mass spectrum of the perished members of certain, carefully preselected stellar moving groups. The release and turbulent mixing of 60Fe is followed via passive scalars, where the yields of the decaying radioisotope were adjusted according to recent stellar evolution calculations. The models are able to reproduce both the timing and the intensity of the 60Fe excess observed with rather high precision.
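To give a sense of why decay matters for the yields mentioned in this abstract, the following sketch applies the textbook exponential decay law to the two numbers the abstract states (a half-life of 2.6 Myr and a layer age of about 2.2 Myr). The function name is hypothetical, and this is plain arithmetic rather than the authors' simulation code.

```python
def surviving_fraction(elapsed_myr, half_life_myr):
    """Fraction of a radionuclide remaining after elapsed_myr, given its half-life."""
    return 0.5 ** (elapsed_myr / half_life_myr)


# 60Fe deposited about 2.2 Myr ago, with the 2.6 Myr half-life quoted in the abstract.
fraction_left = surviving_fraction(2.2, 2.6)
print(f"{fraction_left:.3f}")
```

Under these numbers a little over half of the originally deposited 60Fe would remain today, so decay corrections are of the same order as the signal itself.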
Phylogenetic analysis of Archaea in the deep-sea sediments of west Pacific Warm Pool.
Wang, Peng; Xiao, Xiang; Wang, Fengping
2005-06-01
Archaea are known to play important roles in carbon cycling in marine sediments. The main compositions of archaeal community in five deep-sea sediment samples collected from west Pacific Warm Pool area (WP-0, WP-1, WP-2, WP-3, WP-4), and in five sediment layers (1 cm-, 3 cm-, 6 cm-, 10 cm-, 12 cm- layer) of the 12 cm sediment core of WP-0 were checked and compared by denaturing gradient gel electrophoresis and 16S rRNA gene sequencing. It was revealed that all the deep-sea sediment samples checked contained members of non-thermophilic marine group I crenarchaeota as the predominant archaeal group. To further detect groups of archaea possibly relating with C1 metabolism, PCR amplification was carried out using primers targeting methane-oxidizing archaea. Although no methane-oxidizing archaea was detected, a group of novel archaea (named as WPA) was instead identified from all these five WP samples by clone analysis. They could be placed in the euryarchaeota kingdom and separated into two distinct groups: the main group was peripherally related with methanogens, and the other group was related with Thermoplasma. The vertical distributions of WPA, archaea and bacteria along the WP-0 sediment column were determined by quantitative-PCR. It was found that bacteria dominated at all depths; the numbers of bacteria were 10 to 10^4 times more than those of archaea. The proportion of archaea versus bacteria tended to increase with depth: it was lowest at the first layer (0.01%) and highest at the 12 cm- layer (10%). WPA only constituted a small proportion of the archaeal community (0.05% to 5%) of west Pacific Warm Pool sediment.
Succession in the petroleum reservoir microbiome through an oil field production lifecycle.
Vigneron, Adrien; Alsop, Eric B; Lomans, Bartholomeus P; Kyrpides, Nikos C; Head, Ian M; Tsesmetzis, Nicolas
2017-09-01
Subsurface petroleum reservoirs are an important component of the deep biosphere where indigenous microorganisms live under extreme conditions and in isolation from the Earth's surface for millions of years. However, unlike the bulk of the deep biosphere, the petroleum reservoir deep biosphere is subject to extreme anthropogenic perturbation, with the introduction of new electron acceptors, donors and exogenous microbes during oil exploration and production. Despite the fundamental and practical significance of this perturbation, there has never been a systematic evaluation of the ecological changes that occur over the production lifetime of an active offshore petroleum production system. Analysis of the entire Halfdan oil field in the North Sea (32 producing wells in production for 1-15 years) using quantitative PCR, multigenic sequencing, comparative metagenomic and genomic bins reconstruction revealed systematic shifts in microbial community composition and metabolic potential, as well as changing ecological strategies in response to anthropogenic perturbation of the oil field ecosystem, related to length of time in production. The microbial communities were initially dominated by slow growing anaerobes such as members of the Thermotogales and Clostridiales adapted to living on hydrocarbons and complex refractory organic matter. However, as seawater and nitrate injection (used for secondary oil production) delivered oxidants, the microbial community composition progressively changed to fast growing opportunists such as members of the Deferribacteres, Delta-, Epsilon- and Gammaproteobacteria, with energetically more favorable metabolism (for example, nitrate reduction, H2S, sulfide and sulfur oxidation). This perturbation has profound consequences for understanding the microbial ecology of the system and is of considerable practical importance as it promotes detrimental processes such as reservoir souring and metal corrosion.
These findings provide a new conceptual framework for understanding the petroleum reservoir biosphere and have consequences for developing strategies to manage microbiological problems in the oil industry.
Krauze, Patryk; Kämpf, Horst; Horn, Fabian; Liu, Qi; Voropaev, Andrey; Wagner, Dirk; Alawi, Mashal
2017-01-01
The Cheb Basin (NW Bohemia, Czech Republic) is a shallow, neogene intracontinental basin. It is a non-volcanic region which features frequent earthquake swarms and large-scale diffuse degassing of mantle-derived CO2 at the surface that occurs in the form of CO2-rich mineral springs and wet and dry mofettes. So far, the influence of CO2 degassing onto the microbial communities has been studied for soil environments, but not for aquatic systems. We hypothesized, that deep-trenching CO2 conduits interconnect the subsurface with the surface. This admixture of deep thermal fluids should be reflected in geochemical parameters and in the microbial community compositions. In the present study four mineral water springs and two wet mofettes were investigated through an interdisciplinary survey. The waters were acidic and differed in terms of organic carbon and anion/cation concentrations. Element geochemical and isotope analyses of fluid components were used to verify the origin of the fluids. Prokaryotic communities were characterized through quantitative PCR and Illumina 16S rRNA gene sequencing. Putative chemolithotrophic, anaerobic and microaerophilic organisms connected to sulfur (e.g., Sulfuricurvum, Sulfurimonas) and iron (e.g., Gallionella, Sideroxydans) cycling shaped the core community. Additionally, CO2-influenced waters form an ecosystem containing many taxa that are usually found in marine or terrestrial subsurface ecosystems. Multivariate statistics highlighted the influence of environmental parameters such as pH, Fe2+ concentration and conductivity on species distribution. The hydrochemical and microbiological survey introduces a new perspective on mofettes. Our results support that mofettes are either analogs or rather windows into the deep biosphere and furthermore enable access to deeply buried paleo-sediments. PMID:29321765
Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.
Williams, Philip H; Eyles, Rod; Weiller, Georg
2012-01-01
MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
2009-07-01
Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
Detection of microRNAs in color space.
Marco, Antonio; Griffiths-Jones, Sam
2012-02-01
Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
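The color-space encoding described in this abstract can be illustrated with the widely documented di-base scheme, in which each color is the XOR of 2-bit codes for two adjacent nucleotides (A=0, C=1, G=2, T=3). The sketch below is a simplification and not the SeqTrimMap procedure: the function names are hypothetical, and the read is assumed to begin with a known leading base, which in real SOLiD data belongs to the adapter rather than the insert.

```python
# 2-bit nucleotide codes; a color is the XOR of the codes of two adjacent bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colorspace(seq):
    """Return the leading base followed by one color digit per adjacent base pair."""
    colors = [str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


def decode_colorspace(read):
    """Decode a color-space read whose first character is a known base."""
    bases = [read[0]]
    for color in read[1:]:
        # Each new base follows from the previous base and the current color.
        bases.append(BASE[CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


encoded = encode_colorspace("ATGGC")      # "A3103"
decoded = decode_colorspace(encoded)      # "ATGGC"
corrupted = decode_colorspace("A3203")    # one wrong color changes every later base
```

Because each base is decoded from its predecessor, a single miscalled color alters all downstream bases when translated naively, which is one reason reads are usually mapped in color space and why the first nucleotide needs special handling.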
A Statistical Guide to the Design of Deep Mutational Scanning Experiments.
Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia
2016-09-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.
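One common way to turn time-sampled bulk-competition read counts into a fitness estimate is to regress the log ratio of mutant to reference counts on time; the slope is then the selection coefficient per unit time. The sketch below assumes that estimator and uses hypothetical names; it is not necessarily the exact model analyzed in the paper, and it ignores sampling noise and zero counts.

```python
import math


def selection_coefficient(mutant_counts, reference_counts, times):
    """Least-squares slope of ln(mutant / reference) against sampling time."""
    log_ratios = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_ratios) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_ratios))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


# Noise-free toy data: the mutant declines at s = -0.1 per generation.
times = [0, 2, 4, 6, 8]
reference = [1000.0] * len(times)
mutant = [1000.0 * math.exp(-0.1 * t) for t in times]
s_hat = selection_coefficient(mutant, reference, times)
```

In this estimator the variance of the slope shrinks as the spread of sampling times grows, which is consistent with the abstract's point that extending the experiment and adding time points can help more than deeper sequencing.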
An introduction to deep learning on biological sequence data: examples and solutions.
Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae
2017-11-15
Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact: skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Saghatelyan, Ani; Poghosyan, Lianna
2015-01-01
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055
Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E
2013-01-01
Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. 
As such, these data are likely inappropriate for investigating such ancient relationships.
Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi
2014-01-01
Background: Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results: We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions: To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps.
The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34
Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...
2016-09-10
Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Iain J.; DasSarma, Priya; Lucas, Susan
Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage
Smith, Eric E.; Nandigam, Kaveer R.N.; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M.
2014-01-01
Background MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds (MB). Methods Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semi-automated method. WMH and MB were compared according to site of symptomatic hematoma origin (lobar vs. deep) or by pattern of hemorrhages, including both hematomas and MB, on MRI GRE sequence (grouped as lobar only--probable CAA, lobar only--possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Results Lobar and deep hemispheric hematoma patients had similar median nWMH volumes (19.5 cm3 vs. 19.9 cm3, p=0.74) and prevalence of ≥1 MB (54% vs. 52%, p=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brainstem T2 hyperintensity was lower in lobar hematoma vs. deep hematoma (54% vs. 70%, p=0.004). Mixed ICH was common (23%). Mixed ICH patients had large nWMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. Conclusions WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI GRE in up to one quarter of patients; in these patients both hypertension and CAA may be contributing to the burden of WMH. PMID:20689084
Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun
2017-01-01
Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625
Zhang, Zhen; Shang, Haihong; Shi, Yuzhen; Huang, Long; Li, Junwen; Ge, Qun; Gong, Juwu; Liu, Aiying; Chen, Tingting; Wang, Dan; Wang, Yanling; Palanga, Koffi Kibalou; Muhammad, Jamshed; Li, Weijie; Lu, Quanwei; Deng, Xiaoying; Tan, Yunna; Song, Weiwu; Cai, Juan; Li, Pengtao; Rashid, Harun or; Gong, Wankui; Yuan, Youlu
2016-04-11
Upland Cotton (Gossypium hirsutum) is one of the most important crops worldwide; it provides natural high-quality fiber for industrial production and everyday use. Next-generation sequencing is a powerful method to identify single nucleotide polymorphism markers on a large scale for the construction of a high-density genetic map for quantitative trait loci mapping. In this research, a recombinant inbred line population developed from two upland cotton cultivars, 0-153 and sGK9708, was used to construct a high-density genetic map through the specific locus amplified fragment sequencing method. The high-density genetic map harbored 5521 single nucleotide polymorphism markers, which covered a total distance of 3259.37 cM with an average marker interval of 0.78 cM and no gaps larger than 10 cM. In total, 18 quantitative trait loci for boll weight were identified as stable; they were detected in at least three out of 11 environments and explained 4.15-16.70% of the observed phenotypic variation. In total, 344 candidate genes were identified within the confidence intervals of these stable quantitative trait loci based on the cotton genome sequence. These genes were categorized based on their function through gene ontology analysis, Kyoto Encyclopedia of Genes and Genomes analysis and eukaryotic orthologous groups analysis. This research reported the first high-density genetic map for Upland Cotton (Gossypium hirsutum) with a recombinant inbred line population using single nucleotide polymorphism markers developed by specific locus amplified fragment sequencing. We also identified quantitative trait loci of boll weight across 11 environments and identified candidate genes within the quantitative trait loci confidence intervals. The results of this research should provide useful information for next-step work, including fine mapping, gene functional analysis, pyramiding breeding of functional genes, and marker-assisted selection.
USDA-ARS's Scientific Manuscript database
Quantitative PCR (Q-PCR) utilizing specific primer sequences and a fluorogenic, 5’-exonuclease linear hydrolysis probe is well established as a detection and identification method for Phakopsora pachyrhizi, the soybean rust pathogen. Because of the extreme sensitivity of Q-PCR, the DNA of a single u...
USDA-ARS's Scientific Manuscript database
Wheat quality is defined by culinary end-uses and processing characteristics. Wheat breeders are interested in identifying quantitative trait loci for grain, milling, and end-use quality traits because it is imperative to understand the genetic complexity underlying quantitatively inherited traits to ...
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo
2017-01-01
Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Li, Zhen; Zhang, Renyu
2017-01-01
Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
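The "top L/10 long-range accuracy" figures quoted above can be illustrated with a minimal sketch. It assumes the convention commonly used in CASP-style evaluation, which the abstract itself does not spell out: a pair of residues counts as long-range when their sequence separation is at least 24, and accuracy is the fraction of the L/k highest-scoring long-range predictions that are true contacts. The function name, the input layout (a square score matrix plus a set of true contact pairs) and the `min_sep` default are illustrative assumptions, not the authors' code.

```python
def top_l_over_k_accuracy(pred, native, k=1, min_sep=24):
    """Fraction of the top L/k long-range predictions that are true contacts.

    pred    -- L x L list of lists with predicted contact scores
    native  -- set of (i, j) tuples with i < j marking true contacts
    k       -- evaluate the top L/k predictions (k=1 gives "top L")
    min_sep -- minimum sequence separation for "long-range" (assumed 24)
    """
    length = len(pred)
    # Collect every long-range pair (i < j, j - i >= min_sep) with its score.
    pairs = [
        (pred[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    if not pairs:
        raise ValueError("protein too short to contain long-range pairs")
    # Highest scores first; keep at most L/k of them (at least one).
    pairs.sort(key=lambda item: -item[0])
    top = pairs[: max(1, length // k)]
    correct = sum(1 for _, i, j in top if (i, j) in native)
    return correct / len(top)
```

For a 30-residue toy protein, `top_l_over_k_accuracy(pred, native, k=10)` scores the three highest-ranked long-range pairs. Averaging this value over a test set gives numbers comparable in form to those reported in the abstract.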
Buschmann, Tilo; Zhang, Rong; Brash, Douglas E; Bystrykh, Leonid V
2014-08-07
DNA barcodes are short unique sequences used to label DNA or RNA-derived samples in multiplexed deep sequencing experiments. During the demultiplexing step, barcodes must be detected and their position identified. In some cases (e.g., with PacBio SMRT), the position of the barcode and DNA context is not well defined. Many reads start inside the genomic insert so that adjacent primers might be missed. The matter is further complicated by coincidental similarities between barcode sequences and reference DNA. Therefore, a robust strategy is required in order to detect barcoded reads and avoid a large number of false positives or negatives. For mass inference problems such as this one, false discovery rate (FDR) methods are powerful and balanced solutions. Since existing FDR methods cannot be applied to this particular problem, we present an adapted FDR method that is suitable for the detection of barcoded reads as well as suggest possible improvements. In our analysis, barcode sequences showed high rates of coincidental similarities with the Mus musculus reference DNA. This problem became more acute when the length of the barcode sequence decreased and the number of barcodes in the set increased. The method presented in this paper controls the tail area-based false discovery rate to distinguish between barcoded and unbarcoded reads. This method helps to establish the highest acceptable minimal distance between reads and barcode sequences. In a proof of concept experiment we correctly detected barcodes in 83% of the reads with a precision of 89%. Sensitivity improved to 99% at 99% precision when the adjacent primer sequence was incorporated in the analysis. The analysis was further improved using a paired end strategy. Following an analysis of the data for sequence variants induced in the Atp1a1 gene of C57BL/6 murine melanocytes by ultraviolet light and conferring resistance to ouabain, we found no evidence of cross-contamination of DNA material between samples.
Our method offers a proper quantitative treatment of the problem of detecting barcoded reads in a noisy sequencing environment. It is based on the false discovery rate statistics that allows a proper trade-off between sensitivity and precision to be chosen.
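The idea of choosing a distance cutoff by controlling a tail-area-based false discovery rate can be sketched as follows. This is not the adapted method from the paper, whose details the abstract does not give. It is a generic illustration that assumes Hamming distance on the read prefix, a set of null distances obtained from sequences known to carry no barcode (for example, reference DNA), and the textbook estimate Fdr(t) = pi0 * F0(t) / F(t), with pi0 conservatively set to 1. All function and parameter names are hypothetical.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_barcode_distance(read: str, barcodes: Sequence[str]) -> int:
    """Smallest Hamming distance between the read prefix and any barcode.

    Assumes the barcode, if present, sits at the start of the read and that
    the read is at least as long as every barcode.
    """
    return min(hamming(read[: len(bc)], bc) for bc in barcodes)


def tail_area_fdr(observed: Sequence[int], null: Sequence[int],
                  threshold: int, pi0: float = 1.0) -> float:
    """Estimate Fdr(t) = pi0 * F0(t) / F(t) for calls with distance <= t.

    observed -- minimal distances for the reads being demultiplexed
    null     -- minimal distances for sequences known to be unbarcoded
    pi0      -- assumed proportion of unbarcoded reads (1.0 is conservative)
    """
    f_obs = sum(d <= threshold for d in observed) / len(observed)
    f_null = sum(d <= threshold for d in null) / len(null)
    if f_obs == 0:
        return 0.0  # no read is called at this threshold
    return min(1.0, pi0 * f_null / f_obs)
```

In use, one would compute `tail_area_fdr` for each candidate threshold and accept the largest minimal distance whose estimated rate stays below a chosen level. That mirrors the abstract's goal of finding the highest acceptable minimal distance between reads and barcodes.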
Deep level transient spectroscopy (DLTS) on colloidal-synthesized nanocrystal solids.
Bozyigit, Deniz; Jakob, Michael; Yarema, Olesya; Wood, Vanessa
2013-04-24
We demonstrate current-based, deep level transient spectroscopy (DLTS) on semiconductor nanocrystal solids to obtain quantitative information on deep-lying trap states, which play an important role in the electronic transport properties of these novel solids and impact optoelectronic device performance. Here, we apply this purely electrical measurement to an ethanedithiol-treated, PbS nanocrystal solid and find a deep trap with an activation energy of 0.40 eV and a density of N_T = 1.7 × 10^17 cm^-3. We use these findings to draw and interpret band structure models to gain insight into charge transport in PbS nanocrystal solids and the operation of PbS nanocrystal-based solar cells.
Draft Genome Sequence of Aldehyde-Degrading Strain Halomonas axialensis ACH-L-8
Ye, Jun; Ren, Chong; Shan, Xiexie
2016-01-01
Halomonas axialensis ACH-L-8, a deep-sea strain isolated from the South China Sea, has the ability to degrade aldehydes. Here, we present an annotated draft genome sequence of this species, which could provide fundamental molecular information on the aldehydes-degrading mechanism. PMID:27081145
USDA-ARS's Scientific Manuscript database
Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...
RNA splicing regulated by RBFOX1 is essential for cardiac function in zebrafish.
Frese, Karen S; Meder, Benjamin; Keller, Andreas; Just, Steffen; Haas, Jan; Vogel, Britta; Fischer, Simon; Backes, Christina; Matzas, Mark; Köhler, Doreen; Benes, Vladimir; Katus, Hugo A; Rottbauer, Wolfgang
2015-08-15
Alternative splicing is one of the major mechanisms through which the proteomic and functional diversity of eukaryotes is achieved. However, the complex nature of the splicing machinery, its associated splicing regulators and the functional implications of alternatively spliced transcripts are only poorly understood. Here, we investigated the functional role of the splicing regulator rbfox1 in vivo using the zebrafish as a model system. We found that loss of rbfox1 led to progressive cardiac contractile dysfunction and heart failure. By using deep-transcriptome sequencing and quantitative real-time PCR, we show that depletion of rbfox1 in zebrafish results in an altered isoform expression of several crucial target genes, such as actn3a and hug. This study underlines that tightly regulated splicing is necessary for unconstrained cardiac function and renders the splicing regulator rbfox1 an interesting target for investigation in human heart failure and cardiomyopathy. © 2015. Published by The Company of Biologists Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Xusheng; Pandey, Ashutosh K.; Mulligan, Megan K.
Phenome-wide association is a novel reverse genetic strategy to analyze genome-to-phenome relations in human clinical cohorts. Here we test this approach using a large murine population segregating for ~5 million sequence variants, and we compare our results to those extracted from a matched analysis of gene variants in a large human cohort. For the mouse cohort, we amassed a deep and broad open-access phenome consisting of ~4,500 metabolic, physiological, pharmacological and behavioural traits, and more than 90 independent expression quantitative trait locus (QTL), transcriptome, proteome, metagenome and metabolome data sets, by far the largest coherent phenome for any experimental cohort (www.genenetwork.org). Here, we tested downstream effects of subsets of variants and discovered several novel associations, including a missense mutation in fumarate hydratase that controls variation in the mitochondrial unfolded protein response in both mouse and Caenorhabditis elegans, and missense mutations in Col6a5 that underlie variation in bone mineral density in both mouse and human.
Wang, Mengmeng; Ong, Lee-Ling Sharon; Dauwels, Justin; Asada, H Harry
2018-04-01
Cell migration is a key feature for living organisms. Image analysis tools are useful in studying cell migration in three-dimensional (3-D) in vitro environments. We consider angiogenic vessels formed in 3-D microfluidic devices (MFDs) and develop an image analysis system to extract cell behaviors from experimental phase-contrast microscopy image sequences. The proposed system initializes tracks with the end-point confocal nuclei coordinates. We apply convolutional neural networks to detect cell candidates and combine backward Kalman filtering with multiple hypothesis tracking to link the cell candidates at each time step. These hypotheses incorporate prior knowledge on vessel formation and cell proliferation rates. The association accuracy reaches 86.4% for the proposed algorithm, indicating that the proposed system is able to associate cells more accurately than existing approaches. Cell culture experiments in 3-D MFDs have shown considerable promise for improving biology research. The proposed system is expected to be a useful quantitative tool for potential microscopy problems of MFDs.
Joint mouse–human phenome-wide association to test gene function and disease risk
Wang, Xusheng; Pandey, Ashutosh K.; Mulligan, Megan K.; ...
2016-02-02
Phenome-wide association is a novel reverse genetic strategy to analyze genome-to-phenome relations in human clinical cohorts. Here we test this approach using a large murine population segregating for ~5 million sequence variants, and we compare our results to those extracted from a matched analysis of gene variants in a large human cohort. For the mouse cohort, we amassed a deep and broad open-access phenome consisting of ~4,500 metabolic, physiological, pharmacological and behavioural traits, and more than 90 independent expression quantitative trait locus (QTL), transcriptome, proteome, metagenome and metabolome data sets, by far the largest coherent phenome for any experimental cohort (www.genenetwork.org). Here, we tested downstream effects of subsets of variants and discovered several novel associations, including a missense mutation in fumarate hydratase that controls variation in the mitochondrial unfolded protein response in both mouse and Caenorhabditis elegans, and missense mutations in Col6a5 that underlie variation in bone mineral density in both mouse and human.
Slack, J.F.; Grenne, Tor; Bekker, A.; Rouxel, O.J.; Lindberg, P.A.
2007-01-01
A current model for the evolution of Proterozoic deep seawater composition involves a change from anoxic sulfide-free to sulfidic conditions 1.8 Ga. In an earlier model the deep ocean became oxic at that time. Both models are based on the secular distribution of banded iron formation (BIF) in shallow marine sequences. We here present a new model based on rare earth elements, especially redox-sensitive Ce, in hydrothermal silica-iron oxide sediments from deeper-water, open-marine settings related to volcanogenic massive sulfide (VMS) deposits. In contrast to Archean, Paleozoic, and modern hydrothermal iron oxide sediments, 1.74 to 1.71 Ga hematitic chert (jasper) and iron formation in central Arizona, USA, show moderate positive to small negative Ce anomalies, suggesting that the redox state of the deep ocean then was at a transitional, suboxic state with low concentrations of dissolved O2 but no H2S. The presence of jasper and/or iron formation related to VMS deposits in other volcanosedimentary sequences ca. 1.79-1.69 Ga, 1.40 Ga, and 1.24 Ga also reflects oxygenated and not sulfidic deep ocean waters during these time periods. Suboxic conditions in the deep ocean are consistent with the lack of shallow-marine BIF ~1.8 to 0.8 Ga, and likely limited nutrient concentrations in seawater and, consequently, may have constrained biological evolution. © 2006 Elsevier B.V. All rights reserved.
Larval transport modeling of deep-sea invertebrates can aid the search for undiscovered populations.
Yearsley, Jon M; Sigwart, Julia D
2011-01-01
Many deep-sea benthic animals occur in patchy distributions separated by thousands of kilometres, yet because deep-sea habitats are remote, little is known about their larval dispersal. Our novel method simulates dispersal by combining data from the Argo array of autonomous oceanographic probes, deep-sea ecological surveys, and comparative invertebrate physiology. The predicted particle tracks allow quantitative, testable predictions about the dispersal of benthic invertebrate larvae in the south-west Pacific. In a test case presented here, using non-feeding, non-swimming (lecithotrophic trochophore) larvae of polyplacophoran molluscs (chitons), we show that the likely dispersal pathways in a single generation are significantly shorter than the distances between the three known population centres in our study region. The large-scale density of chiton populations throughout our study region is potentially much greater than present survey data suggest, with intermediate 'stepping stone' populations yet to be discovered. We present a new method that is broadly applicable to studies of the dispersal of deep-sea organisms. This test case demonstrates the power and potential applications of our new method, in generating quantitative, testable hypotheses at multiple levels to solve the mismatch between observed and expected distributions: probabilistic predictions of locations of intermediate populations, potential alternative dispersal mechanisms, and expected population genetic structure. The global Argo data have never previously been used to address benthic biology, and our method can be applied to any non-swimming larvae of the deep-sea, giving information upon dispersal corridors and population densities in habitats that remain intrinsically difficult to assess.
Larval Transport Modeling of Deep-Sea Invertebrates Can Aid the Search for Undiscovered Populations
Yearsley, Jon M.; Sigwart, Julia D.
2011-01-01
Background Many deep-sea benthic animals occur in patchy distributions separated by thousands of kilometres, yet because deep-sea habitats are remote, little is known about their larval dispersal. Our novel method simulates dispersal by combining data from the Argo array of autonomous oceanographic probes, deep-sea ecological surveys, and comparative invertebrate physiology. The predicted particle tracks allow quantitative, testable predictions about the dispersal of benthic invertebrate larvae in the south-west Pacific. Principal Findings In a test case presented here, using non-feeding, non-swimming (lecithotrophic trochophore) larvae of polyplacophoran molluscs (chitons), we show that the likely dispersal pathways in a single generation are significantly shorter than the distances between the three known population centres in our study region. The large-scale density of chiton populations throughout our study region is potentially much greater than present survey data suggest, with intermediate ‘stepping stone’ populations yet to be discovered. Conclusions/Significance We present a new method that is broadly applicable to studies of the dispersal of deep-sea organisms. This test case demonstrates the power and potential applications of our new method, in generating quantitative, testable hypotheses at multiple levels to solve the mismatch between observed and expected distributions: probabilistic predictions of locations of intermediate populations, potential alternative dispersal mechanisms, and expected population genetic structure. The global Argo data have never previously been used to address benthic biology, and our method can be applied to any non-swimming larvae of the deep-sea, giving information upon dispersal corridors and population densities in habitats that remain intrinsically difficult to assess. PMID:21857992
Geng, Huili; Sui, Zhenghong; Zhang, Shu; Du, Qingwei; Ren, Yuanyuan; Liu, Yuan; Kong, Fanna; Zhong, Jie; Ma, Qingxia
2015-01-01
Micro-ribonucleic acids (miRNAs) are a large group of endogenous, tiny, non-coding RNAs consisting of 19–25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs). To date, identification and function analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR) was performed to confirm the expression of two differentially expressed miRNAs: one up-regulated novel miRNA (aca-miR-3p-456915) and one down-regulated conserved miRNA (tae-miR159a). The expression trend of the qPCR assay was generally consistent with the deep sequencing result. Target predictions of the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG) annotations revealed that some miRNAs were associated with growth and developmental processes of the alga.
These results provide insights into the roles that miRNAs play in the growth of A. catenella, and they provide the basis for further studies of the molecular mechanisms that underlie bloom growth in red tide species. PMID:26398216
The dynamics of genome replication using deep sequencing
Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.
2014-01-01
Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma
2013-01-04
One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.
Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis.
Morfopoulou, Sofia; Mee, Edward T; Connaughton, Sarah M; Brown, Julianne R; Gilmour, Kimberly; Chong, W K 'Kling'; Duprex, W Paul; Ferguson, Deborah; Hubank, Mike; Hutchinson, Ciaran; Kaliakatsos, Marios; McQuaid, Stephen; Paine, Simon; Plagnol, Vincent; Ruis, Christopher; Virasami, Alex; Zhan, Hong; Jacques, Thomas S; Schepelmann, Silke; Qasim, Waseem; Breuer, Judith
2017-01-01
Routine childhood vaccination against measles, mumps and rubella has virtually abolished virus-related morbidity and mortality. Notwithstanding this, we describe here devastating neurological complications associated with the detection of live-attenuated mumps virus Jeryl Lynn (MuV JL5) in the brain of a child who had undergone successful allogeneic transplantation for severe combined immunodeficiency (SCID). This is the first confirmed report of MuV JL5 associated with chronic encephalitis and highlights the need to exclude immunodeficient individuals from immunisation with live-attenuated vaccines. The diagnosis was only possible by deep sequencing of the brain biopsy. Sequence comparison of the vaccine batch to the MuV JL5 isolated from brain identified biased hypermutation, particularly in the matrix gene, similar to those found in measles from cases of SSPE. The findings provide unique insights into the pathogenesis of paramyxovirus brain infections.
Saghatelyan, Ani; Poghosyan, Lianna; Panosyan, Hovik; Birkeland, Nils-Kåre
2015-11-12
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. Copyright © 2015 Saghatelyan et al.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing
Matochko, Wadim L.; Derda, Ratmir
2013-01-01
Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
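The vector and operator description in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a four-sequence toy library, a censorship matrix whose diagonal entries are simply 0 or 1, and sampling without replacement; the names `sampling_operator`, `apply_diagonal`, `depth`, and the example numbers are all hypothetical.

```python
import random

def sampling_operator(n, depth, rng):
    """Return the diagonal of a random sampling operator for library n.

    n     : list of copy numbers n_i (the N x 1 frequency vector)
    depth : number of clones drawn from the library (hypothetical parameter)
    Each diagonal entry is (sampled copies of i) / n_i, so multiplying the
    diagonal by n gives the sampled copy numbers.
    """
    # Expand the library into one entry per physical clone, then draw.
    population = [i for i, count in enumerate(n) for _ in range(count)]
    drawn = rng.sample(population, depth)  # sampling without replacement
    sampled = [0] * len(n)
    for i in drawn:
        sampled[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sampled, n)]

def apply_diagonal(diag, n):
    """Multiply a diagonal matrix (given by its diagonal) with vector n."""
    return [d * c for d, c in zip(diag, n)]

# Toy library with N = 4 possible sequences (illustrative numbers only).
n = [50, 30, 20, 0]
# Assumed censorship diagonal: sequence 2 is eliminated during sequencing.
cen = [1, 1, 0, 1]
censored = apply_diagonal(cen, n)

rng = random.Random(0)
sa = sampling_operator(censored, 40, rng)
reads = apply_diagonal(sa, censored)  # observed copy numbers after sequencing
```

Setting every entry of `cen` to 1 corresponds to the unbiased case, in which only the sampling step changes the library.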
Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya
2015-08-01
Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. In the case of massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population due to the global amplification step during the template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations has been absolutely quantitated. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
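The idea of counting individual molecules through their barcodes can be sketched as follows. This is a generic majority-vote consensus, not the system described in the abstract (which relies on linear amplification and on monitoring erroneous barcode tags); the function name, the `min_reads` filter, and the example reads are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads, min_reads=2):
    """Collapse reads into one consensus base per barcode (one per molecule).

    reads     : iterable of (barcode, base_at_site) pairs
    min_reads : barcodes seen fewer times are dropped (assumed filter)
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in by_barcode.items():
        if len(bases) < min_reads:
            continue  # too few reads to trust this barcode
        base, count = Counter(bases).most_common(1)[0]
        if count * 2 > len(bases):  # require a strict majority
            consensus[barcode] = base
    return consensus

# Illustrative reads at a single site where "T" is the mutant base.
reads = [
    ("AAC", "T"), ("AAC", "T"), ("AAC", "C"),  # one read error in molecule AAC
    ("GGT", "C"), ("GGT", "C"),
    ("TTA", "T"),                              # singleton barcode, discarded
]
molecules = consensus_by_barcode(reads)
mutant_molecules = sum(1 for base in molecules.values() if base == "T")
```

Counting consensus molecules rather than raw reads is what makes the mutation count absolute: here three reads carry "T", but only one molecule is counted as mutant.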
Poly(A)-tag deep sequencing data processing to extract poly(A) sites.
Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn
2015-01-01
Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
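The grouping and clustering steps described above can be sketched in a few lines. The pipeline in the abstract is a set of Perl scripts; the Python below is only an illustration that assumes all tags lie on one chromosome and strand and that clusters are built by chaining neighbouring sites no more than 24 nt apart. The exact clustering rule, the function name, and the coordinates are assumptions.

```python
from collections import Counter

def cluster_cleavage_sites(positions, max_gap=24):
    """Chain sorted cleavage sites into clusters.

    A site joins the current cluster when it lies within max_gap
    nucleotides of the previous site; otherwise it starts a new cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

# Poly(A) tags (PATs) given by genome coordinate (illustrative values).
pats = [1000, 1000, 1010, 1030, 1100, 1120, 1120, 1120, 1500]

# PATs at the same coordinate are grouped into one cleavage site.
site_counts = Counter(pats)

# Nearby cleavage sites are then merged into poly(A) clusters.
clusters = cluster_cleavage_sites(site_counts)
```

With these values, `clusters` holds three poly(A) clusters, and `site_counts` keeps the number of supporting tags per cleavage site.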
Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos
2016-01-01
High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.
Long, Rui-Cai; Li, Ming-Na; Kang, Jun-Mei; Zhang, Tie-Jun; Sun, Yan; Yang, Qing-Chuan
2015-05-01
Small 21- to 24-nucleotide (nt) ribonucleic acids (RNAs), notably the microRNA (miRNA), are emerging as a posttranscriptional regulation mechanism. Salt stress is one of the primary abiotic stresses that cause the crop losses worldwide. In saline lands, root growth and function of plant are determined by the action of environmental salt stress through specific genes that adapt root development to the restrictive condition. To elucidate the role of miRNAs in salt stress regulation in Medicago, we used a high-throughput sequencing approach to analyze four small RNA libraries from roots of Zhongmu-1 (Medicago sativa) and Jemalong A17 (Medicago truncatula), which were treated with 300 mM NaCl for 0 and 8 h. Each library generated about 20 million short sequences and contained predominantly small RNAs of 24-nt length, followed by 21-nt and 22-nt small RNAs. Using sequence analysis, we identified 385 conserved miRNAs from 96 families, along with 68 novel candidate miRNAs. Of all the 68 predicted novel miRNAs, 15 miRNAs were identified to have miRNA*. Statistical analysis on abundance of sequencing read revealed specific miRNA showing contrasting expression patterns between M. sativa and M. truncatula roots, as well as between roots treated for 0 and 8 h. The expression of 10 conserved and novel miRNAs was also quantified by quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). The miRNA precursor and target genes were predicted by bioinformatics analysis. We concluded that the salt stress related conserved and novel miRNAs may have a large variety of target mRNAs, some of which might play key roles in salt stress regulation of Medicago. © 2014 Scandinavian Plant Physiology Society.
Automated analysis of high-content microscopy data with deep learning.
Kraus, Oren Z; Grys, Ben T; Ba, Jimmy; Chong, Yolanda; Frey, Brendan J; Boone, Charles; Andrews, Brenda J
2017-04-18
Existing computational pipelines for quantitative analysis of high-content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone-arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open-source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high-content microscopy data. © 2017 The Authors. Published under the terms of the CC BY 4.0 license.
Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.
2009-01-01
The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ±0.5 Ma for early Miocene sequences and approximately ±1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene-Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fine-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fine-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the filling of a deep impact-generated basin where the timing of sequence boundaries is heavily influenced by eustasy. © 2009 The Geological Society of America.
Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu
2014-01-01
Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved HAV IRES sequence even if we performed analysis of UDPSs. UDPS analysis of HAV 5'UTR gave us no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287
2010-01-01
Background: Nematodes represent the most abundant benthic metazoa in one of the largest habitats on earth, the deep sea. Characterizing major patterns of biodiversity within this dominant group is a critical step towards understanding evolutionary patterns across this vast ecosystem. The present study has aimed to place deep-sea nematode species into a phylogenetic framework, investigate relationships between shallow water and deep-sea taxa, and elucidate phylogeographic patterns amongst the deep-sea fauna. Results: Molecular data (18S and 28S rRNA) confirm a high diversity amongst deep-sea Enoplids. There is no evidence for endemic deep-sea lineages in Maximum Likelihood or Bayesian phylogenies, and Enoplids do not cluster according to depth or geographic location. Tree topologies suggest frequent interchanges between deep-sea and shallow water habitats, as well as a mixture of early radiations and more recently derived lineages amongst deep-sea taxa. This study also provides convincing evidence of cosmopolitan marine species, recovering a subset of Oncholaimid nematodes with identical gene sequences (18S, 28S and cox1) at trans-Atlantic sample sites. Conclusions: The complex clade structures recovered within the Enoplida support a high global species richness for marine nematodes, with phylogeographic patterns suggesting the existence of closely related, globally distributed species complexes in the deep sea. True cosmopolitan species may additionally exist within this group, potentially driven by specific life history traits of Enoplids. Although this investigation aimed to intensively sample nematodes from the order Enoplida, specimens were only identified down to genus (at best) and our sampling regime focused on an infinitesimally small fraction of the deep-sea floor. Future nematode studies should incorporate an extended sample set covering a wide depth range (shelf, bathyal, and abyssal sites), utilize additional genetic loci (e.g. mtDNA) that are informative at the species level, and apply high-throughput sequencing methods to fully assay community diversity. Finally, further molecular studies are needed to determine whether phylogeographic patterns observed in Enoplids are common across other ubiquitous marine groups (e.g. Chromadorida, Monhysterida). PMID:21167065
Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.
2015-01-01
The genome of strain GS3372 is the first publicly available strain of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637
Bacterial community diversity of the deep-sea octocoral Paramuricea placomus.
Kellogg, Christina A; Ross, Steve W; Brooke, Sandra D
2016-01-01
Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379-382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68-90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.
Bacterial community diversity of the deep-sea octocoral Paramuricea placomus
Kellogg, Christina A.; Ross, Steve W.; Brooke, Sandra D.
2016-01-01
Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379–382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68–90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.
Edgar, Robyn; Veerapaneni, Ram S.; D’Elia, Tom; Morris, Paul F.; Rogers, Scott O.
2013-01-01
Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes. PMID:23843994
Shtarkman, Yury M; Koçer, Zeynep A; Edgar, Robyn; Veerapaneni, Ram S; D'Elia, Tom; Morris, Paul F; Rogers, Scott O
2013-01-01
Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes.
Transcriptome sequencing of rhizome tissue of Sinopodophyllum hexandrum at two temperatures.
Kumari, Anita; Singh, Heikham Russiachand; Jha, Ashwani; Swarnkar, Mohit Kumar; Shankar, Ravi; Kumar, Sanjay
2014-10-07
Sinopodophyllum hexandrum is an endangered medicinal herb, which is commonly present in elevations ranging between 2,400-4,500 m and is sensitive to temperature. Medicinal property of the species is attributed to the presence of podophyllotoxin in the rhizome tissue. The present work analyzed transcriptome of rhizome tissue of S. hexandrum exposed to 15°C and 25°C to understand the temperature mediated molecular responses including those associated with podophyllotoxin biosynthesis. Deep sequencing of transcriptome with an average coverage of 88.34X yielded 60,089 assembled transcript sequences representing 20,387 unique genes having homology to known genes. Fragments per kilobase of exon per million fragments mapped (FPKM) based expression analysis revealed genes related to growth and development were over-expressed at 15°C, whereas genes involved in stress response were over-expressed at 25°C. There was a decreasing trend of podophyllotoxin accumulation at 25°C; data was well supported by the expression of corresponding genes of the pathway. FPKM data was validated by quantitative real-time polymerase chain reaction data using a total of thirty four genes and a positive correlation between the two platforms of gene expression was obtained. Also, detailed analyses yielded cytochrome P450s, methyltransferases and glycosyltransferases which could be the potential candidate hitherto unidentified genes of podophyllotoxin biosynthesis pathway. The present work revealed temperature responsive transcriptome of S. hexandrum on Illumina platform. Data suggested expression of genes for growth and development and podophyllotoxin biosynthesis at 15°C, and prevalence of those associated with stress response at 25°C.
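The FPKM measure named in the abstract above follows directly from its definition: fragments mapped to a gene, divided by the exon length in kilobases and by the total mapped fragments in millions. The short sketch below applies that standard definition; the function name and the example numbers are illustrative and are not taken from the study.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped.

    fragments              : fragments mapped to the gene
    exon_length_bp         : summed exon length of the gene, in base pairs
    total_mapped_fragments : all mapped fragments in the library
    """
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Illustrative values: 500 fragments on a 2,000-bp gene in a library
# of 10 million mapped fragments.
value = fpkm(500, 2_000, 10_000_000)
```

Because both gene length and library size are normalized out, values computed this way can be compared between the 15°C and 25°C libraries even when sequencing depth differs.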
Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui
2012-03-29
MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. 
Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress.
Application of Deep Learning in Automated Analysis of Molecular Images in Cancer: A Survey
Xue, Yong; Chen, Shihui; Liu, Yong
2017-01-01
Molecular imaging enables the visualization and quantitative analysis of the alterations of biological procedures at molecular and/or cellular level, which is of great significance for early detection of cancer. In recent years, deep learning has been widely used in medical imaging analysis, as it overcomes the limitations of visual assessment and traditional machine learning techniques by extracting hierarchical features with powerful representation capability. Research on cancer molecular images using deep learning techniques is also increasing dynamically. Hence, in this paper, we review the applications of deep learning in molecular imaging in terms of tumor lesion segmentation, tumor classification, and survival prediction. We also outline some future directions in which researchers may develop more powerful deep learning models for better performance in the applications in cancer molecular imaging. PMID:29114182
Takai, Ken; Oida, Hanako; Suzuki, Yohey; Hirayama, Hisako; Nakagawa, Satoshi; Nunoura, Takuro; Inagaki, Fumio; Nealson, Kenneth H; Horikoshi, Koki
2004-04-01
Distribution profiles of marine crenarchaeota group I in the vicinity of deep-sea hydrothermal systems were mapped with culture-independent molecular techniques. Planktonic samples were obtained from the waters surrounding two geographically and geologically distinct hydrothermal systems, and the abundance of marine crenarchaeota group I was examined by 16S ribosomal DNA clone analysis, quantitative PCR, and whole-cell fluorescence in situ hybridization. A much higher proportion of marine crenarchaeota group I within the microbial community was detected in deep-sea hydrothermal environments than in normal deep and surface seawaters. The highest proportion was always obtained from the ambient seawater adjacent to hydrothermal emissions and chimneys but not from the hydrothermal plumes. These profiles were markedly different from the profiles of epsilon-Proteobacteria, which are abundant in the low temperatures of deep-sea hydrothermal environments.
Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >10^7 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants
USDA-ARS's Scientific Manuscript database
Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...
Fatal Metacestode Infection in Bornean Orangutan Caused by Unknown Versteria Species
Gendron-Fitzpatrick, Annette; Deering, Kathleen M.; Wallace, Roberta S.; Clyde, Victoria L.; Lauck, Michael; Rosen, Gail E.; Bennett, Andrew J.; Greiner, Ellis C.; O’Connor, David H.
2014-01-01
A captive juvenile Bornean orangutan (Pongo pygmaeus) died from an unknown disseminated parasitic infection. Deep sequencing of DNA from infected tissues, followed by gene-specific PCR and sequencing, revealed a divergent species within the newly proposed genus Versteria (Cestoda: Taeniidae). Versteria may represent a previously unrecognized risk to primate health. PMID:24377497
USDA-ARS's Scientific Manuscript database
The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...
The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...
Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing
Hoque, Mainul; Ji, Zhe; Zheng, Dinghai; Luo, Wenting; Li, Wencheng; You, Bei; Park, Ji Yeon; Yehia, Ghassan; Tian, Bin
2012-01-01
Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation. PMID:23241633
Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics
NASA Astrophysics Data System (ADS)
Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.
2016-12-01
Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margin, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall, samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome-assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.
Li, Chenghua; Feng, Weida; Qiu, Lihua; Xia, Changge; Su, Xiurong; Jin, Chunhua; Zhou, Tingting; Zeng, Yuan; Li, Taiwu
2012-08-01
MicroRNAs (miRNAs) constitute a family of small RNA species that have been shown to be key effectors in mediating host-pathogen interactions. In this study, two haemocyte miRNA libraries were constructed from healthy (L1) and skin ulceration syndrome-affected (L2) Apostichopus japonicus and deep sequenced on the Illumina HiSeq 2000 platform. The high-throughput Solexa sequencing yielded 9,579,038 and 7,742,558 clean reads from L1 and L2, respectively. Sequence analysis revealed 40 conserved miRNAs in both libraries, among which let-7 and mir-125 were speculated to be clustered together and expressed accordingly. Eighty-six miRNA candidates were also identified by reference genome search and stem-loop structure prediction. Importantly, mir-31 and mir-2008 displayed significant differential expression between the two libraries according to the FPKM model, and might be considered promising targets for elucidating the intrinsic mechanism of skin ulceration syndrome outbreaks in this species. Copyright © 2012 Elsevier Ltd. All rights reserved.
Analysis of deep learning methods for blind protein contact prediction in CASP12.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2018-03-01
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
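The "top L/5" accuracy quoted in the abstract above can be illustrated with a short sketch. It assumes the usual CASP convention (not spelled out in the abstract) that long-range pairs have a sequence separation of at least 24 residues and medium-range pairs a separation of 12 to 23. The function name `top_contact_accuracy` and its arguments are hypothetical, and this is not the RaptorX-Contact evaluation code.

```python
import numpy as np

def top_contact_accuracy(prob, truth, min_sep, max_sep=None, frac=5):
    """Fraction of true contacts among the top L/frac predicted residue pairs.

    prob  : (L, L) array of predicted contact probabilities
    truth : (L, L) array of 0/1 native contacts
    min_sep, max_sep : inclusive sequence-separation range |i - j|
    """
    L = prob.shape[0]
    i, j = np.triu_indices(L, k=min_sep)      # pairs with j - i >= min_sep
    if max_sep is not None:
        keep = (j - i) <= max_sep
        i, j = i[keep], j[keep]
    k = max(1, L // frac)                      # number of pairs to score
    order = np.argsort(-prob[i, j], kind="stable")[:k]
    return float(truth[i[order], j[order]].mean())

# Toy example with made-up values: L = 10, so the top L/5 = 2 pairs are scored.
prob = np.zeros((10, 10))
truth = np.zeros((10, 10), dtype=int)
prob[0, 5], prob[1, 8] = 0.9, 0.8              # two highest-ranked predictions
truth[0, 5] = 1                                # only the first one is a real contact
toy_accuracy = top_contact_accuracy(prob, truth, min_sep=3)
```

Under the assumed convention, long-range accuracy would use `min_sep=24` and medium-range accuracy `min_sep=12, max_sep=23`.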
Diversity and Biogeography of Bathyal and Abyssal Seafloor Bacteria
Bienhold, Christina; Zinger, Lucie; Boetius, Antje; Ramette, Alban
2016-01-01
The deep ocean floor covers more than 60% of the Earth’s surface, and hosts diverse bacterial communities with important functions in carbon and nutrient cycles. The identification of key bacterial members remains a challenge and their patterns of distribution in seafloor sediment remain poorly described. Previous studies were either regionally restricted or included few deep-sea sediments, and did not specifically test biogeographic patterns across the vast oligotrophic bathyal and abyssal seafloor. Here we define the composition of this deep seafloor microbiome by describing those bacterial operational taxonomic units (OTU) that are specifically associated with deep-sea surface sediments at water depths ranging from 1000–5300 m. We show that the microbiome of the surface seafloor is distinct from the subsurface seafloor. The cosmopolitan bacterial OTU were affiliated with the clades JTB255 (class Gammaproteobacteria, order Xanthomonadales) and OM1 (Actinobacteria, order Acidimicrobiales), comprising 21% and 7% of their respective clades, and about 1% of all sequences in the study. Overall, few sequence-abundant bacterial types were globally dispersed and displayed positive range-abundance relationships. Most bacterial populations were rare and exhibited a high degree of endemism, explaining the substantial differences in community composition observed over large spatial scales. Despite the relative physicochemical uniformity of deep-sea sediments, we identified indicators of productivity regimes, especially sediment organic matter content, as factors significantly associated with changes in bacterial community structure across the globe. PMID:26814838
Diverse deep-sea fungi from the South China Sea and their antimicrobial activity.
Zhang, Xiao-Yong; Zhang, Yun; Xu, Xin-Ya; Qi, Shu-Hua
2013-11-01
We investigated the diversity of fungal communities in nine different deep-sea sediment samples of the South China Sea by culture-dependent methods followed by analysis of fungal internal transcribed spacer (ITS) sequences. Although 14 out of 27 identified species were reported in a previous study, 13 species were isolated from sediments of deep-sea environments for the first time. Moreover, the ITS sequences of six isolates shared 84-92% similarity with their closest matches in GenBank, which suggested that they might be novel phylotypes of genera Ajellomyces, Podosordaria, Torula, and Xylaria. The antimicrobial activities of these fungal isolates were explored using a double-layer technique. A relatively high proportion (56%) of fungal isolates exhibited antimicrobial activity against at least one pathogenic bacterium or fungus among four marine pathogenic microbes (Micrococcus luteus, Pseudoalteromonas piscicida, Aspergillus versicolor, and A. sydowii). Out of these antimicrobial fungi, the genera Arthrinium, Aspergillus, and Penicillium exhibited antibacterial and antifungal activities, while genus Aureobasidium displayed only antibacterial activity, and genera Acremonium, Cladosporium, Geomyces, and Phaeosphaeriopsis displayed only antifungal activity. To our knowledge, this is the first report to investigate the diversity and antimicrobial activity of culturable deep-sea-derived fungi in the South China Sea. These results suggest that diverse deep-sea fungi from the South China Sea are a potential source for antibiotic discovery and further increase the pool of fungi available for natural bioactive product screening.
High fungal diversity and abundance recovered in the deep-sea sediments of the Pacific Ocean.
Xu, Wei; Pang, Ka-Lai; Luo, Zhu-Hua
2014-11-01
The presence and ecological significance of bacteria and archaea in deep-sea environments are well recognized, but eukaryotic microorganisms, such as fungi, have rarely been reported. The present study investigated the composition and abundance of the fungal community in the deep-sea sediments of the Pacific Ocean. A total of 1,947 internal transcribed spacer (ITS) regions of fungal rRNA gene clones were recovered from five sediment samples from the Pacific Ocean (water depths ranging from 5,017 to 6,986 m) using three different PCR primer sets. There were 16, 17, and 15 different operational taxonomic units (OTUs) identified from fungal-universal, Ascomycota-, and Basidiomycota-specific clone libraries, respectively. The majority of the recovered sequences belonged to diverse phylotypes of Ascomycota (25 phylotypes) and Basidiomycota (18 phylotypes). The multiple primer approach recovered a total of 27 phylotypes which showed low similarities (≤97%) with available fungal sequences in GenBank, suggesting possible new fungal taxa occurring in deep-sea environments or belonging to taxa not represented in GenBank. Our results also recovered high fungal LSU rRNA gene copy numbers (3.52 × 10^6 to 5.23 × 10^7 copies/g wet sediment) from the Pacific Ocean sediment samples, suggesting that the fungi might be involved in important ecological functions in deep-sea environments.
Zhang, Yuxin; Holmes, James; Rabanillo, Iñaki; Guidon, Arnaud; Wells, Shane; Hernando, Diego
2018-09-01
To evaluate the reproducibility of quantitative diffusion measurements obtained with reduced Field of View (rFOV) and Multi-shot EPI (msEPI) acquisitions, using single-shot EPI (ssEPI) as a reference. Diffusion phantom experiments, and prostate diffusion-weighted imaging in healthy volunteers and patients with known or suspected prostate cancer were performed across the three different sequences. Quantitative diffusion measurements of apparent diffusion coefficient, and diffusion kurtosis parameters (healthy volunteers), were obtained and compared across diffusion sequences (rFOV, msEPI, and ssEPI). Other possible confounding factors like b-value combinations and acquisition parameters were also investigated. Both msEPI and rFOV have shown reproducible quantitative diffusion measurements relative to ssEPI; no significant difference in ADC was observed across pulse sequences in the standard diffusion phantom (p = 0.156), healthy volunteers (p ≥ 0.12) or patients (p ≥ 0.26). The ADC values within the non-cancerous central gland and peripheral zone of patients were 1.29 ± 0.17 × 10^-3 mm^2/s and 1.74 ± 0.23 × 10^-3 mm^2/s, respectively. However, differences in quantitative diffusion parameters were observed across different numbers of averages for rFOV, and across b-value groups and diffusion models for all three sequences. Both rFOV and msEPI have the potential to provide high image quality with reproducible quantitative diffusion measurements in prostate diffusion MRI. Copyright © 2018 Elsevier Inc. All rights reserved.
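For readers unfamiliar with how an apparent diffusion coefficient is obtained, the following is a minimal sketch of the standard mono-exponential model S(b) = S0 · exp(−b · ADC), fitted by log-linear least squares. The b-values and signal values are made up for illustration, the function name `fit_adc` is hypothetical, and the abstract does not state which fitting procedure the authors used.

```python
import numpy as np

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    b_values in s/mm^2, signals in arbitrary units (must be > 0).
    Returns (ADC in mm^2/s, S0).
    """
    b = np.asarray(b_values, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    slope, intercept = np.polyfit(b, log_s, 1)   # ln S = ln S0 - b * ADC
    return float(-slope), float(np.exp(intercept))

# Synthetic, noise-free example using hypothetical b-values.
b = np.array([0.0, 400.0, 800.0])
true_adc, true_s0 = 1.29e-3, 1000.0
signals = true_s0 * np.exp(-b * true_adc)
adc_hat, s0_hat = fit_adc(b, signals)
```

With noisy data and more than two b-values, the choice of b-values affects the estimate, which is consistent with the abstract's remark that differences were observed across b-value groups.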
Quantitative trait locus mapping of deep rooting by linkage and association analysis in rice.
Lou, Qiaojun; Chen, Liang; Mei, Hanwei; Wei, Haibin; Feng, Fangjun; Wang, Pei; Xia, Hui; Li, Tiemei; Luo, Lijun
2015-08-01
Deep rooting is a very important trait for plants' drought avoidance, and it is usually represented by the ratio of deep rooting (RDR). Three sets of rice populations were used to determine the genetic base for RDR. A linkage mapping population with 180 recombinant inbred lines and an association mapping population containing 237 rice varieties were used to identify genes linked to RDR. Six quantitative trait loci (QTLs) of RDR were identified as being located on chromosomes 1, 2, 4, 7, and 10. Using 1 019 883 single-nucleotide polymorphisms (SNPs), a genome-wide association study of the RDR was performed. Forty-eight significant SNPs of the RDR were identified and formed a clear peak on the short arm of chromosome 1 in a Manhattan plot. Compared with the shallow-rooting group and the whole collection, the deep-rooting group had selective sweep regions on chromosomes 1 and 2, especially in the major QTL region on chromosome 2. Seven of the nine candidate SNPs identified by association mapping were verified in two RDR extreme groups. The findings from this study will be beneficial to rice drought-resistance research and breeding. © The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Single myelin fiber imaging in living rodents without labeling by deep optical coherence microscopy.
Ben Arous, Juliette; Binding, Jonas; Léger, Jean-François; Casado, Mariano; Topilko, Piotr; Gigan, Sylvain; Boccara, A Claude; Bourdieu, Laurent
2011-11-01
Myelin sheath disruption is responsible for multiple neuropathies in the central and peripheral nervous system. Myelin imaging has thus become an important diagnosis tool. However, in vivo imaging has been limited to either low-resolution techniques unable to resolve individual fibers or to low-penetration imaging of single fibers, which cannot provide quantitative information about large volumes of tissue, as required for diagnostic purposes. Here, we perform myelin imaging without labeling and at micron-scale resolution with >300-μm penetration depth on living rodents. This was achieved with a prototype [termed deep optical coherence microscopy (deep-OCM)] of a high-numerical aperture infrared full-field optical coherence microscope, which includes aberration correction for the compensation of refractive index mismatch and high-frame-rate interferometric measurements. We were able to measure the density of individual myelinated fibers in the rat cortex over a large volume of gray matter. In the peripheral nervous system, deep-OCM allows, after minor surgery, in situ imaging of single myelinated fibers over a large fraction of the sciatic nerve. This allows quantitative comparison of normal and Krox20 mutant mice, in which myelination in the peripheral nervous system is impaired. This opens promising perspectives for myelin chronic imaging in demyelinating diseases and for minimally invasive medical diagnosis.
Single myelin fiber imaging in living rodents without labeling by deep optical coherence microscopy
NASA Astrophysics Data System (ADS)
Ben Arous, Juliette; Binding, Jonas; Léger, Jean-François; Casado, Mariano; Topilko, Piotr; Gigan, Sylvain; Claude Boccara, A.; Bourdieu, Laurent
2011-11-01
Myelin sheath disruption is responsible for multiple neuropathies in the central and peripheral nervous system. Myelin imaging has thus become an important diagnosis tool. However, in vivo imaging has been limited to either low-resolution techniques unable to resolve individual fibers or to low-penetration imaging of single fibers, which cannot provide quantitative information about large volumes of tissue, as required for diagnostic purposes. Here, we perform myelin imaging without labeling and at micron-scale resolution with >300-μm penetration depth on living rodents. This was achieved with a prototype [termed deep optical coherence microscopy (deep-OCM)] of a high-numerical aperture infrared full-field optical coherence microscope, which includes aberration correction for the compensation of refractive index mismatch and high-frame-rate interferometric measurements. We were able to measure the density of individual myelinated fibers in the rat cortex over a large volume of gray matter. In the peripheral nervous system, deep-OCM allows, after minor surgery, in situ imaging of single myelinated fibers over a large fraction of the sciatic nerve. This allows quantitative comparison of normal and Krox20 mutant mice, in which myelination in the peripheral nervous system is impaired. This opens promising perspectives for myelin chronic imaging in demyelinating diseases and for minimally invasive medical diagnosis.
Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A
2008-06-01
DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.
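The idea of reading methylation from a sequencing trace can be sketched as follows. After bisulfite conversion, unmethylated cytosines are read as T while methylated cytosines remain C, so a common estimate at a CpG site is the C peak height divided by the sum of the C and T peak heights. This is an assumed simplification for illustration only: it is not claimed to be the Mquant algorithm, it ignores dye-specific signal differences, and the function name and peak values are hypothetical.

```python
def methylation_fraction(c_peak, t_peak):
    """Estimate the methylated fraction at one CpG from trace peak heights.

    c_peak: height of the C trace (methylated, unconverted) at the site
    t_peak: height of the T trace (unmethylated, converted) at the site
    """
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return c_peak / total

# Hypothetical peak heights read from an electropherogram.
example_fraction = methylation_fraction(300, 100)
```

For a reverse-strand read the same ratio would be formed from the G and A traces instead.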
Sobel Leonard, Ashley; McClain, Micah T; Smith, Gavin J D; Wentworth, David E; Halpin, Rebecca A; Lin, Xudong; Ransier, Amy; Stockwell, Timothy B; Das, Suman R; Gilbert, Anthony S; Lambkin-Williams, Robert; Ginsburg, Geoffrey S; Woods, Christopher W; Koelle, Katia
2016-12-15
Knowledge of influenza virus evolution at the point of transmission and at the intrahost level remains limited, particularly for human hosts. Here, we analyze a unique viral data set of next-generation sequencing (NGS) samples generated from a human influenza challenge study wherein 17 healthy subjects were inoculated with cell- and egg-passaged virus. Nasal wash samples collected from 7 of these subjects were successfully deep sequenced. From these, we characterized changes in the subjects' viral populations during infection and identified differences between the virus in these samples and the viral stock used to inoculate the subjects. We first calculated pairwise genetic distances between the subjects' nasal wash samples, the viral stock, and the influenza virus A/Wisconsin/67/2005 (H3N2) reference strain used to generate the stock virus. These distances revealed that considerable viral evolution occurred at various points in the human challenge study. Further quantitative analyses indicated that (i) the viral stock contained genetic variants that originated and likely were selected for during the passaging process, (ii) direct intranasal inoculation with the viral stock resulted in a selective bottleneck that reduced nonsynonymous genetic diversity in the viral hemagglutinin and nucleoprotein, and (iii) intrahost viral evolution continued over the course of infection. These intrahost evolutionary dynamics were dominated by purifying selection. Our findings indicate that rapid viral evolution can occur during acute influenza infection in otherwise healthy human hosts when the founding population size of the virus is large, as is the case with direct intranasal inoculation. Influenza viruses circulating among humans are known to rapidly evolve over time. However, little is known about how influenza virus evolves across single transmission events and over the course of a single infection. 
To address these issues, we analyze influenza virus sequences from a human challenge experiment that initiated infection with a cell- and egg-passaged viral stock, which appeared to have adapted during its preparation. We find that the subjects' viral populations differ genetically from the viral stock, with subjects' viral populations having lower representation of the amino-acid-changing variants that arose during viral preparation. We also find that most of the viral evolution occurring over single infections is characterized by further decreases in the frequencies of these amino-acid-changing variants and that only limited intrahost genetic diversification through new mutations is apparent. Our findings indicate that influenza virus populations can undergo rapid genetic changes during acute human infections. Copyright © 2016 Sobel Leonard et al.
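The pairwise genetic distances mentioned in the abstract above can be illustrated with a small sketch. The abstract does not specify the distance measure, so the one used here (half the L1 distance between per-site nucleotide frequency vectors, averaged over sites) is an assumption, and the sample names and frequencies are invented.

```python
import numpy as np

def mean_site_distance(freq_a, freq_b):
    """Mean over sites of half the L1 distance between nucleotide frequencies.

    freq_a, freq_b: arrays of shape (n_sites, 4) with A, C, G, T frequencies.
    """
    a = np.asarray(freq_a, dtype=float)
    b = np.asarray(freq_b, dtype=float)
    return float(0.5 * np.abs(a - b).sum(axis=1).mean())

def pairwise_distances(samples):
    """Return (names, symmetric distance matrix) for a dict of samples."""
    names = list(samples)
    n = len(names)
    dist = np.zeros((n, n))
    for x in range(n):
        for y in range(x + 1, n):
            d = mean_site_distance(samples[names[x]], samples[names[y]])
            dist[x, y] = dist[y, x] = d
    return names, dist

# Invented two-site example: the stock is polymorphic at site 2, the subject is not.
samples = {
    "stock":   [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
    "subject": [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
}
names, dist = pairwise_distances(samples)
```

A matrix of this kind, computed between nasal wash samples, the viral stock, and the reference strain, is the sort of summary the abstract describes.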
Govindarajan, Subramaniam S.; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K.
2017-01-01
Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. PMID:28153888
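Summary statistics such as total assembly length and G+C content are straightforward to compute from a set of contigs. The sketch below uses two invented toy contigs, not the KS1 assembly, and the function name `assembly_stats` is hypothetical; it counts only unambiguous A, C, G, and T bases in the G+C denominator.

```python
def assembly_stats(contigs):
    """Return (number of contigs, total length in bp, G+C content in percent)."""
    total_len = sum(len(seq) for seq in contigs)
    gc = 0
    acgt = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += gc_and_at(s)
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return len(contigs), total_len, 100 * gc / acgt

def gc_and_at(seq):
    """Count unambiguous bases (A, C, G, T) in an upper-case sequence."""
    return sum(seq.count(base) for base in "ACGT")

# Toy contigs for illustration only.
n_contigs, total_bp, gc_percent = assembly_stats(["GGCC", "ATAT"])
```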
Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping
2016-06-02
Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has received wide attention because of its high recurrence rate. Traditional microscopy-based diagnostic methods provide limited information on the vaginal microbiota, which makes it difficult to trace the development of the disease when bacterial resistance arises. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment: D0 (before treatment), D7 (end of antibiotic use) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and the bacterial composition of their vaginal microbiota was compared. Interestingly, we identified 9 patients who might be at risk of recurrence. We also introduced a new measurement point at D7, although the microbiota at that point were significantly inhibited by antibiotics and hard to observe by traditional methods. The vaginal microbiota as viewed by deep sequencing showed a strong correlation with the final outcome. Thus, by coupling detailed individual bioinformatics analysis with deep-sequencing technology, we may draw a more accurate map of the vaginal microbiota of BV patients, which provides a new opportunity to reduce the rate of recurrence of BV.
Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta
2012-10-01
Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps with consequent sample loss and stochastic variation at each step significantly affecting output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which brings high quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which secures wider availability. A library was constructed from only 100 ng of cDNA, which enables the saving of precious specimens. Only a few steps of robust enzymatic reaction resulted in saved time, enabling more specimens to be prepared at once, and with a more reproducible size distribution among the different specimens. The obtained RNA-seq results were comparable to the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing for single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.
Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.
2015-01-01
Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was closest to G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234
Oasis 2: improved online analysis of small RNA-seq data.
Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan
2018-02-14
Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.
Wilson, M R; Zimmermann, L L; Crawford, E D; Sample, H A; Soni, P R; Baker, A N; Khan, L M; DeRisi, J L
2017-03-01
Solid organ transplant patients are vulnerable to suffering neurologic complications from a wide array of viral infections and can be sentinels in the population who are first to get serious complications from emerging infections like the recent waves of arboviruses, including West Nile virus, Chikungunya virus, Zika virus, and Dengue virus. The diverse and rapidly changing landscape of possible causes of viral encephalitis poses great challenges for traditional candidate-based infectious disease diagnostics that already fail to identify a causative pathogen in approximately 50% of encephalitis cases. We present the case of a 14-year-old girl on immunosuppression for a renal transplant who presented with acute meningoencephalitis. Traditional diagnostics failed to identify an etiology. RNA extracted from her cerebrospinal fluid was subjected to unbiased metagenomic deep sequencing, enhanced with the use of a Cas9-based technique for host depletion. This analysis identified West Nile virus (WNV). Convalescent serum serologies subsequently confirmed WNV seroconversion. These results support a clear clinical role for metagenomic deep sequencing in the setting of suspected viral encephalitis, especially in the context of the high-risk transplant patient population. © 2016 The Authors. American Journal of Transplantation published by Wiley Periodicals, Inc. on behalf of American Society of Transplant Surgeons.
A deep learning framework for causal shape transformation.
Lore, Kin Gwn; Stoecklein, Daniel; Davies, Michael; Ganapathysubramanian, Baskar; Sarkar, Soumik
2018-02-01
Recurrent neural network (RNN) and Long Short-term Memory (LSTM) networks are the common go-to architectures for exploiting sequential information where the output is dependent on a sequence of inputs. However, in most considered problems, the dependencies typically lie in the latent domain, which may not be suitable for applications involving the prediction of a step-wise transformation sequence that is dependent on the previous states only in the visible domain with a known terminal state. We propose a hybrid architecture of convolutional neural networks (CNN) and stacked autoencoders (SAE) to learn a sequence of causal actions that nonlinearly transform an input visual pattern or distribution into a target visual pattern or distribution with the same support, and demonstrate its practicality in a real-world engineering problem involving the physics of fluids. We solved a high-dimensional one-to-many inverse mapping problem concerning microfluidic flow sculpting, where the use of deep learning methods as an inverse map is very seldom explored. This work serves as a fruitful use-case for applied scientists and engineers, showing how deep learning can be beneficial as a solution for high-dimensional physical problems, and potentially opens doors to impactful advances in fields such as material sciences and medical biology where multistep topological transformations are a key element. Copyright © 2017 Elsevier Ltd. All rights reserved.
Evidence for a persistent microbial seed bank throughout the global ocean
Gibbons, Sean M.; Caporaso, J. Gregory; Pirrung, Meg; Field, Dawn; Knight, Rob; Gilbert, Jack A.
2013-01-01
Do bacterial taxa demonstrate clear endemism, like macroorganisms, or can one site’s bacterial community recapture the total phylogenetic diversity of the world’s oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Census of Marine Microbes (ICoMM) taken from around the globe (ranging from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.7–66.2% of operational taxonomic units identified in a given ICoMM biome. Extrapolation of this overlap suggests that 1.93 × 10^11 sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 y. These results strongly suggest the marine biosphere maintains a previously undetected, persistent microbial seed bank. PMID:23487761
Zhang, De-Chao; Liu, Yan-Xia; Li, Xin-Zheng
2015-09-01
Deep sea ferromanganese (FeMn) nodules contain metallic mineral resources and have great economic potential. In this study, a combination of culture-dependent and culture-independent (16S rRNA genes clone library and pyrosequencing) methods was used to investigate the bacterial diversity in FeMn nodules from Jiaolong Seamount, the South China Sea. Eleven bacterial strains including some moderate thermophiles were isolated. The majority of strains belonged to the phylum Proteobacteria; one isolate belonged to the phylum Firmicutes. A total of 259 near full-length bacterial 16S rRNA gene sequences in a clone library and 67,079 valid reads obtained using pyrosequencing indicated that members of the Gammaproteobacteria dominated, with the most abundant bacterial genera being Pseudomonas and Alteromonas. Sequence analysis indicated the presence of many organisms whose closest relatives are known manganese oxidizers, iron reducers, hydrogen-oxidizing bacteria and methylotrophs. This is the first reported investigation of bacterial diversity associated with deep sea FeMn nodules from the South China Sea.
Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin
2017-10-27
Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
Deep Packet/Flow Analysis using GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Qian; Wu, Wenji; DeMar, Phil
Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE) as it requires a large amount of raw computing power and high I/O throughput. Recently, researchers have tentatively used GPUs to address these issues and boost the performance of DPI. Typically, DPI applications involve highly complex operations at both the per-packet and per-flow data level, often in real time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, the data streams need to be reconstructed at the per-flow level to deliver a consistent content analysis. Since the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes a purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering and waiting for TCP packets to arrive in sequence, our framework processes the packets in batches and uses a deterministic finite automaton (DFA) with a prefix-/suffix-tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. In conclusion, evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct a stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.
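The idea of matching a signature that straddles packet boundaries can be illustrated with a much simpler, CPU-only sketch. The code below is not the GPU framework from the abstract and does not handle out-of-sequence packets or the prefix-/suffix-tree method; it only assumes in-order payload chunks and shows how an automaton state carried between chunks lets a pattern be found across a boundary. The class name `StreamMatcher` and the Knuth-Morris-Pratt style automaton are illustrative choices, not taken from the source.

```python
def build_failure(pattern: bytes) -> list:
    """Failure table of the Knuth-Morris-Pratt automaton for `pattern`."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class StreamMatcher:
    """Finds one byte pattern in a stream delivered as in-order chunks.

    The automaton state survives between calls to feed(), so a match that
    starts in one chunk and ends in the next is still reported.
    """

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = build_failure(pattern)
        self.state = 0   # number of pattern bytes currently matched
        self.offset = 0  # absolute position of the next byte in the stream

    def feed(self, chunk: bytes) -> list:
        """Consume one chunk; return absolute start offsets of new matches."""
        hits = []
        for byte in chunk:
            while self.state and byte != self.pattern[self.state]:
                self.state = self.fail[self.state - 1]
            if byte == self.pattern[self.state]:
                self.state += 1
            if self.state == len(self.pattern):
                hits.append(self.offset - len(self.pattern) + 1)
                self.state = self.fail[self.state - 1]
            self.offset += 1
        return hits
```

Feeding the chunks `b"xxatt"` and then `b"ackyy"` to `StreamMatcher(b"attack")` reports the match only on the second call, at the offset where the pattern began in the first chunk.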
Deller, Timothy W; Khalighi, Mohammad Mehdi; Jansen, Floris P; Glover, Gary H
2018-01-01
The recent introduction of simultaneous whole-body PET/MR scanners has enabled new research taking advantage of the complementary information obtainable with PET and MRI. One such application is kinetic modeling, which requires high levels of PET quantitative stability. To accomplish the required PET stability levels, the PET subsystem must be sufficiently isolated from the effects of MR activity. Performance measurements have previously been published, demonstrating sufficient PET stability in the presence of MR pulsing for typical clinical use; however, PET stability during radiofrequency (RF)-intensive and gradient-intensive sequences has not previously been evaluated for a clinical whole-body scanner. In this work, PET stability of the GE SIGNA PET/MR was examined during simultaneous scanning of aggressive MR pulse sequences. Methods: PET performance tests were acquired with MR idle and during simultaneous MR pulsing. Recent system improvements mitigating RF interference and gain variation were used. A fast recovery fast spin echo MR sequence was selected for high RF power, and an echo planar imaging sequence was selected for its high heat-inducing gradients. Measurements were performed to determine PET stability under varying MR conditions using the following metrics: sensitivity, scatter fraction, contrast recovery, uniformity, count rate performance, and image quantitation. A final PET quantitative stability assessment for simultaneous PET scanning during functional MRI studies was performed with a spiral in-and-out gradient echo sequence. Results: Quantitation stability of a 68 Ge flood phantom was demonstrated within 0.34%. Normalized sensitivity was stable during simultaneous scanning within 0.3%. Scatter fraction measured with a 68 Ge line source in the scatter phantom was stable within the range of 40.4%-40.6%. Contrast recovery and uniformity were comparable for PET images acquired simultaneously with multiple MR conditions. 
Peak noise equivalent count rate was 224 kcps at an effective activity concentration of 18.6 kBq/mL, and the count rate curves and scatter fraction curve were consistent for the alternating MR pulsing states. A final test demonstrated quantitative stability during a spiral functional MRI sequence. Conclusion: PET stability metrics demonstrated that PET quantitation was not affected during simultaneous aggressive MRI. This stability enables demanding applications such as kinetic modeling. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments
Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken
2013-01-01
In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10^6 to 10^11 viruses/cm^3 of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24–30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10^-3 in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95–99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952
Roychoudhury, Pavitra; Makhsous, Negar; Hanson, Derek; Chase, Jill; Krueger, Gerhard; Xie, Hong; Huang, Meei-Li; Saunders, Lindsay; Ablashi, Dharam; Koelle, David M.; Cook, Linda; Jerome, Keith R.
2018-01-01
ABSTRACT Quantitative PCR is a diagnostic pillar for clinical virology testing, and reference materials are necessary for accurate, comparable quantitation between clinical laboratories. Accurate quantitation of human herpesvirus 6A/B (HHV-6A/B) is important for detection of viral reactivation and inherited chromosomally integrated HHV-6A/B in immunocompromised patients. Reference materials in clinical virology commonly consist of laboratory-adapted viral strains that may be affected by the culture process. We performed next-generation sequencing to make relative copy number measurements at single nucleotide resolution of eight candidate HHV-6A and seven HHV-6B reference strains and DNA materials from the HHV-6 Foundation and Advanced Biotechnologies Inc. Eleven of 17 (65%) HHV-6A/B candidate reference materials showed multiple copies of the origin of replication upstream of the U41 gene by next-generation sequencing. These large tandem repeats arose independently in culture-adapted HHV-6A and HHV-6B strains, measuring 1,254 bp and 983 bp, respectively. The average copy number measured was between 5 and 10 times the number of copies of the rest of the genome. We also report the first interspecies recombinant HHV-6A/B strain with a HHV-6A backbone and a >5.5-kb region from HHV-6B, from U41 to U43, that covered the origin tandem repeat. Specific HHV-6A reference strains demonstrated duplication of regions at U1/U2, U87, and U89, as well as deletion in the U12-to-U24 region and the U94/U95 genes. HHV-6A/B strains derived from cord blood mononuclear cells from different laboratories on different continents with fewer passages revealed no copy number differences throughout the viral genome. These data indicate that large origin tandem duplications are an adaptation of both HHV-6A and HHV-6B in culture and show interspecies recombination is possible within the Betaherpesvirinae. 
IMPORTANCE Anything in science that needs to be quantitated requires a standard unit of measurement. This includes viruses, for which quantitation increasingly determines definitions of pathology and guidelines for treatment. However, the act of making standard or reference material in virology can alter its very accuracy through genomic duplications, insertions, and rearrangements. We used deep sequencing to examine candidate reference strains for HHV-6, a ubiquitous human virus that can reactivate in the immunocompromised population and is integrated into the human genome in every cell of the body for 1% of people worldwide. We found large tandem repeats in the origin of replication for both HHV-6A and HHV-6B that are selected for in culture. We also found the first interspecies recombinant between HHV-6A and HHV-6B, a phenomenon that is well known in alphaherpesviruses but to date has not been seen in betaherpesviruses. These data critically inform HHV-6A/B biology and the standard selection process. PMID:29491155
Oral Microbiome of Deep and Shallow Dental Pockets In Chronic Periodontitis
Ge, Xiuchun; Rodriguez, Rafael; Trinh, My; Gunsolley, John; Xu, Ping
2013-01-01
We examined the subgingival bacterial biodiversity in untreated chronic periodontitis patients by sequencing 16S rRNA genes. The primary purpose of the study was to compare the oral microbiome in deep (diseased) and shallow (healthy) sites. A secondary purpose was to evaluate the influences of smoking, race and dental caries on this relationship. A total of 88 subjects from two clinics were recruited. Paired subgingival plaque samples were taken from each subject, one from a probing site depth >5 mm (deep site) and the other from a probing site depth ≤3 mm (shallow site). A universal primer set was designed to amplify the V4–V6 region for oral microbial 16S rRNA sequences. Differences in genera and species attributable to deep and shallow sites were determined by statistical analysis using a two-part model and false discovery rate. Fifty-one of 170 genera and 200 of 746 species were found to differ significantly in abundance between shallow and deep sites. Besides previously identified periodontal disease-associated bacterial species, additional species were found markedly changed in diseased sites. Cluster analysis revealed that the microbiome difference between deep and shallow sites was influenced by patient-level effects such as clinic location, race and smoking. The differences between clinic locations may be influenced by racial distribution, in that all of the African American subjects were seen at the same clinic. Our results suggested that there were influences from the microbiome for caries and periodontal disease and that these influences are independent. PMID:23762384
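The abstract above names false discovery rate control without saying which procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something the abstract states), per-taxon p-values could be filtered as follows. The function name `benjamini_hochberg` and the default `alpha` are illustrative.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg step-up: sort p-values, find the largest rank k with
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In this setting each p-value would come from the per-genus or per-species comparison between deep and shallow sites; the two-part model that produces those p-values is not sketched here.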
Hoshino, Tatsuhiko; Inagaki, Fumio
2017-01-01
Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. For obtaining the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is only determined during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of each target phylotype's 16S rRNA gene can be estimated by Poisson statistics by counting random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. 
Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
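The abstract states that copy numbers are estimated from random-tag counts by Poisson statistics but does not give the formula. A minimal sketch, assuming the standard occupancy correction (each molecule draws one tag uniformly from a pool of 4^k possible tags, so the expected fraction of unused tags is exp(-n/pool)), is shown below. The function name and the default tag length of 8 nt are hypothetical and are not taken from the source.

```python
import math


def estimate_molecules(unique_tags: int, tag_length: int = 8) -> float:
    """Estimate the number of labeled molecules from distinct tags observed.

    Assumes tags are drawn uniformly from 4**tag_length possibilities, so
    observed/pool ~= 1 - exp(-n/pool) and n ~= -pool * ln(1 - observed/pool).
    """
    pool = 4 ** tag_length
    if not 0 <= unique_tags < pool:
        raise ValueError("unique_tags must be in [0, pool)")
    # log1p(-x) computes ln(1 - x) accurately for small x
    return -pool * math.log1p(-unique_tags / pool)
```

When the number of distinct tags is small relative to the pool, the estimate is close to the raw tag count; the correction grows as the pool saturates and two molecules become more likely to share a tag.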
Li, S; Dumdei, E J; Blunt, J W; Munro, M H; Robinson, W T; Pannell, L K
1998-06-26
The structure, stereochemistry, and conformation of theonellapeptolide IIIe (1), a new 36-membered ring cyclic peptolide from the New Zealand deep-water sponge Lamellomorpha strongylata, are described. The sequence of the cytotoxic peptolide was determined through a combination of NMR and MS-MS techniques and confirmed by X-ray crystal structure analysis, which, with chiral HPLC, established the absolute stereochemistry.
Fusarium musae as cause of superficial and deep-seated human infections.
Esposto, M C; Prigitano, A; Tortorano, A M
2016-12-01
BLAST analysis in GenBank of 60 Fusarium verticillioides clinical isolates using the sequence of translation elongation factor 1-alpha allowed the identification of four F. musae isolates, confirming that this species is not a rare etiology of superficial and deep infections and that its habitat is not restricted to banana fruits. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar
2015-08-27
The genome of strain GS3372 is the first publicly available genome of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. Copyright © 2015 Filippidou et al.
Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd
2016-03-15
The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ in the monogenic trait of interest, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS-induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
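The core computation described above, comparing allele frequencies between two phenotypic pools and locating the interval where they diverge most, can be sketched as follows. This is an illustrative simplification, not the authors' pipeline: the input format (per-marker tuples of alternate-allele reads and total depth, in genome order), the fixed-width marker window, and all function names are assumptions made for the example.

```python
def allele_freq(alt_reads, depth):
    """Alternate-allele frequency at one marker; None without coverage."""
    return alt_reads / depth if depth > 0 else None


def delta_frequencies(pool_a, pool_b):
    """Absolute allele-frequency difference per marker between two pools.

    Each pool is a list of (alt_reads, depth) tuples, one per marker,
    with both lists in the same marker order.
    """
    deltas = []
    for (alt_a, dep_a), (alt_b, dep_b) in zip(pool_a, pool_b):
        fa, fb = allele_freq(alt_a, dep_a), allele_freq(alt_b, dep_b)
        deltas.append(None if fa is None or fb is None else abs(fa - fb))
    return deltas


def best_window(deltas, width):
    """Start index and mean of the marker window with the largest mean delta."""
    best_start, best_mean = None, -1.0
    for start in range(len(deltas) - width + 1):
        window = [d for d in deltas[start:start + width] if d is not None]
        if not window:
            continue  # no covered markers in this window
        mean = sum(window) / len(window)
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean
```

A real analysis would work in genomic coordinates rather than marker indices and would filter markers by depth and quality before scanning.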
Lake Number, a quantitative indicator of mixing used to estimate changes in dissolved oxygen
Robertson, Dale M.; Imberger, Jorg
1994-01-01
Lake Number, LN, values are shown to be quantitative indicators of deep mixing in lakes and reservoirs that can be used to estimate changes in deep water dissolved oxygen (DO) concentrations. LN is a dimensionless parameter defined as the ratio of the moments about the center of volume of the water body, of the stabilizing force of gravity associated with density stratification to the destabilizing forces supplied by wind, cooling, inflow, outflow, and other artificial mixing devices. To demonstrate the universality of this parameter, LN values are used to describe the extent of deep mixing and are compared with changes in DO concentrations in three reservoirs in Australia and four lakes in the U.S.A., which vary in productivity and mixing regimes. A simple model is developed which relates changes in LN values, i.e., the extent of mixing, to changes in near bottom DO concentrations. After calibrating the model for a specific system, it is possible to use real-time LN values, calculated using water temperature profiles and surface wind velocities, to estimate changes in DO concentrations (assuming unchanged trophic conditions).
Thalassospira xiamenensis sp. nov. and Thalassospira profundimaris sp. nov.
Liu, Chenli; Wu, Yehui; Li, Li; Ma, Yingfei; Shao, Zongze
2007-02-01
Two bacterial strains, M-5T and WP0211T, were isolated from the surface water of a waste-oil pool in a coastal dock and from a deep-sea sediment sample from the West Pacific Ocean, respectively. Analysis of 16S rRNA gene sequences indicated that both strains belonged to the class Alphaproteobacteria and were closely related to Thalassospira lucentensis (96.1 and 96.2 % gene sequence similarity, respectively). Based on the results of physiological and biochemical tests, as well as DNA-DNA hybridization experiments, it is suggested that these isolates represent two novel species of the genus Thalassospira. Various traits allow both novel strains to be differentiated from Thalassospira lucentensis, including oxygen requirement, nitrate reduction and denitrification abilities and major fatty acid profiles, as well as their ability to utilize six different carbon sources. Furthermore, the novel strains may be readily distinguished from each other by differences in their motility, flagellation, growth at 4 °C and 40 °C, their ability to hydrolyse Tween 40 and Tween 80, their utilization of 19 different carbon sources and by quantitative differences in their fatty acid contents. The isolates represent two novel species, for which the names Thalassospira xiamenensis sp. nov. (type strain, M-5T=DSM 17429T=CGMCC 1.3998T) and Thalassospira profundimaris sp. nov. (type strain, WP0211T=DSM 17430T=CGMCC 1.3997T) are proposed.
Wang, Zhong-Wei; Jiang, Cong; Wen, Qiang; Wang, Na; Tao, Yuan-Yuan; Xu, Li-An
2014-03-15
Camellia chekiangoleosa is an important species of genus Camellia. It provides high-quality edible oil and has great ornamental value. The flowers are big and red which bloom between February and March. Flower pigmentation is closely related to the accumulation of anthocyanin. Although anthocyanin biosynthesis has been studied extensively in herbaceous plants, little molecular information on the anthocyanin biosynthesis pathway of C. chekiangoleosa is yet known. In the present study, a cDNA library was constructed to obtain detailed and general data from the flowers of C. chekiangoleosa. To explore the transcriptome of C. chekiangoleosa and investigate genes involved in anthocyanin biosynthesis, a 454 GS FLX Titanium platform was used to generate an EST dataset. About 46,279 sequences were obtained, and 24,593 (53.1%) were annotated. Using Blast search against the AGRIS, 1740 unigenes were found homologous to 599 Arabidopsis transcription factor genes. Based on the transcriptome dataset, nine anthocyanin biosynthesis pathway genes (PAL, CHS1, CHS2, CHS3, CHI, F3H, DFR, ANS, and UFGT) were identified and cloned. The spatio-temporal expression patterns of these genes were also analyzed using quantitative real-time polymerase chain reaction. The study results not only enrich the gene resource but also provide valuable information for further studies concerning anthocyanin biosynthesis. Copyright © 2014 Elsevier B.V. All rights reserved.
Lata, Pushpa; Govindarajan, Subramaniam S; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K
2017-02-02
Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. Copyright © 2017 Lata et al.
Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel
2015-02-01
Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).
Tarn, Jonathan; Peoples, Logan M.; Hardy, Kevin; Cameron, James; Bartlett, Douglas H.
2016-01-01
Relatively few studies have described the microbial populations present in ultra-deep hadal environments, largely as a result of difficulties associated with sampling. Here we report Illumina-tag V6 16S rRNA sequence-based analyses of the free-living and particle-associated microbial communities recovered from locations within two of the deepest hadal sites on Earth, the Challenger Deep (10,918 meters below surface-mbs) and the Sirena Deep (10,667 mbs) within the Mariana Trench, as well as one control site (Ulithi Atoll, 761 mbs). Seawater samples were collected using an autonomous lander positioned ~1 m above the seafloor. The bacterial populations within the Mariana Trench bottom water samples were dissimilar to other deep-sea microbial communities, though with overlap with those of diffuse flow hydrothermal vents and deep-subsurface locations. Distinct particle-associated and free-living bacterial communities were found to exist. The hadal bacterial populations were also markedly different from one another, indicating the likelihood of different chemical conditions at the two sites. In contrast to the bacteria, the hadal archaeal communities were more similar to other less deep datasets and to each other due to an abundance of cosmopolitan deep-sea taxa. The hadal communities were enriched in 34 bacterial and 4 archaeal operational taxonomic units (OTUs) including members of the Gammaproteobacteria, Epsilonproteobacteria, Marinimicrobia, Cyanobacteria, Deltaproteobacteria, Gemmatimonadetes, Atribacteria, Spirochaetes, and Euryarchaeota. Sequences matching cultivated piezophiles were notably enriched in the Challenger Deep, especially within the particle-associated fraction, and were found in higher abundances than in other hadal studies, where they were either far less prevalent or missing. 
Our results indicate the importance of heterotrophy, sulfur-cycling, and methane and hydrogen utilization within the bottom waters of the deeper regions of the Mariana Trench, and highlight novel community features of these extreme habitats. PMID:27242695
Evolutionary process of deep-sea Bathymodiolus mussels.
Miyazaki, Jun-Ichi; de Oliveira Martins, Leonardo; Fujita, Yuko; Matsumoto, Hiroto; Fujiwara, Yoshihiro
2010-04-27
Since the discovery of deep-sea chemosynthesis-based communities, much work has been done to clarify their organismal and environmental aspects. However, major topics remain to be resolved, including when and how organisms invade and adapt to deep-sea environments; whether strategies for invasion and adaptation are shared by different taxa or unique to each taxon; how organisms extend their distribution and diversity; and how they become isolated to speciate in continuous waters. Deep-sea mussels are one of the dominant organisms in chemosynthesis-based communities, thus investigations of their origin and evolution contribute to resolving questions about life in those communities. We investigated worldwide phylogenetic relationships of deep-sea Bathymodiolus mussels and their mytilid relatives by analyzing nucleotide sequences of the mitochondrial cytochrome c oxidase subunit I (COI) and NADH dehydrogenase subunit 4 (ND4) genes. Phylogenetic analysis of the concatenated sequence data showed that mussels of the subfamily Bathymodiolinae from vents and seeps were divided into four groups, and that mussels of the subfamily Modiolinae from sunken wood and whale carcasses assumed the outgroup position and shallow-water modioline mussels were positioned more distantly to the bathymodioline mussels. We provisionally hypothesized the evolutionary history of Bathymodiolus mussels by estimating evolutionary time under a relaxed molecular clock model. Diversification of bathymodioline mussels was initiated in the early Miocene, and subsequently diversification of the groups occurred in the early to middle Miocene. The phylogenetic relationships support the "Evolutionary stepping stone hypothesis," in which mytilid ancestors exploited sunken wood and whale carcasses in their progressive adaptation to deep-sea environments.
This hypothesis is also supported by the evolutionary transition of symbiosis in that nutritional adaptation to the deep sea proceeded from extracellular to intracellular symbiotic states in whale carcasses. The estimated evolutionary time suggests that the mytilid ancestors were able to exploit whales during adaptation to the deep sea.
The green ash transcriptome and identification of genes responding to abiotic and biotic stresses
Thomas Lane; Teodora Best; Nicole Zembower; Jack Davitt; Nathan Henry; Yi Xu; Jennifer Koch; Haiying Liang; John McGraw; Stephan Schuster; Donghwan Shim; Mark V. Coggeshall; John E. Carlson; Margaret E. Staton
2016-01-01
Background: To develop a set of transcriptome sequences to support research on environmental stress responses in green ash (Fraxinus pennsylvanica), we undertook deep RNA sequencing of green ash tissues under various stress treatments. The treatments, including emerald ash borer (EAB) feeding, heat, drought, cold and ozone, were selected to mimic...
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; McBride, Kathryn R.; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D.
2016-01-01
Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin. PMID:27881538
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; ...
2016-11-23
We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses of this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar
We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses of this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin.
Draft Genome Sequence of the Spore-Forming Probiotic Strain Bacillus coagulans Unique IS-2
Upadrasta, Aditya; Pitta, Swetha
2016-01-01
Bacillus coagulans Unique IS-2 is a potential spore-forming probiotic that is commercially available on the market. The draft genome sequence presented here provides deep insight into the beneficial features of this strain for its safe use as a probiotic for various human and animal health applications. PMID:27103709
Deep Sequencing Reveals the Complete Genome Sequence of Sweet potato virus G from East Timor
Maina, Solomon; Edwards, Owain R.; Barbetti, Martin J.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present the first complete Sweet potato virus G (SPVG) genome from sweet potato in East Timor and compare it with seven complete SPVG genomes from South Korea (three), Taiwan (two), Argentina (one), and the United States (one). It most resembles the genomes from the United States and South Korea. PMID:27609925
Deposition and weathering of Asian dust in Korea
NASA Astrophysics Data System (ADS)
Jeong, G.
2013-12-01
Paleolithic stone artifacts in Korea typically occur in brown clay-silt (BCS) sequences. The origin and depositional environment of these sequences are important for reconstructing the paleoenvironment as well as for establishing chronologies of artifact-bearing stratigraphic units. We investigated four BCS-bearing sections in foothills and river and marine terraces in Korea by applying quantitative mineralogical, geochemical, microtextural, and K-Ar isotopic methods. In all four sections, the lower units are colluvial and fluvial deposits strongly influenced by diverse local lithology, whereas the upper units are characterized by BCS units. Mineralogical/geochemical compositions, grain sizes, and colors converge into common properties in the upper BCS units in all sections. These common properties are consistent with the eastward trends of increasing weathering degree and grain size fining throughout the loess-paleosol sections of the Chinese Loess Plateau (CLP). K-Ar detrital ages of the sections also converge upward into a narrow range similar to the age ranges of the loess and paleosols in the CLP. The top BCS unit in the Jeongok section, the thickest section, is underlain by an additionally weathered BCS unit, with strong red chroma indicating a change from warm to cold climate. We did not observe any clear evidence of climatic changes in other thinner sections, which may be due to a superposition of cold-stage accumulation and warm-stage deep weathering. The common properties of the BCSs in Korean sections and their relationship to the CLP loess and paleosols indicate widespread deposition of Asian dust and subsequent weathering in the late Quaternary, forming BCS sequences. In this respect, the BCS sequences investigated here are considered to be the additionally weathered equivalents of the CLP loess-paleosol sequences, having been exposed to the high annual precipitation of the Korean Peninsula. 
Given the wide distribution of BCS sequences at Paleolithic sites throughout the Peninsula, the findings of this study are important for the ongoing debate surrounding the depositional environments of the Paleolithic deposits, and provide a foundation for the establishment of the chronological framework of the Paleolithic artifact-bearing layers and lithic assemblages.
Deposition and weathering of Asian dust in Paleolithic sites, Korea
NASA Astrophysics Data System (ADS)
Jeong, Gi Young; Choi, Jeong-Heon; Lim, Hyoun Soo; Seong, Chuntaek; Yi, Seon Bok
2013-10-01
Paleolithic stone artifacts in Korea typically occur in brown clay-silt (BCS) sequences. The origin and depositional environment of these sequences are important for reconstructing the paleoenvironment as well as for establishing chronologies of artifact-bearing stratigraphic units. We investigated four BCS-bearing sections in foothills and river and marine terraces in Korea by applying quantitative mineralogical, geochemical, microtextural, and K-Ar isotopic methods. In all four sections, the lower units are colluvial and fluvial deposits strongly influenced by diverse local lithology, whereas the upper units are characterized by BCS units. Mineralogical/geochemical compositions, grain sizes, and colors converge into common properties in the upper BCS units in all sections. These common properties are consistent with the eastward trends of increasing weathering degree and grain size fining throughout the loess-paleosol sections of the Chinese Loess Plateau (CLP). K-Ar detrital ages of the sections also converge upward into a narrow range similar to the age ranges of the loess and paleosols in the CLP. The top BCS unit in the Jeongok section, the thickest section, is underlain by an additionally weathered BCS unit, with strong red chroma indicating a change from warm to cold climate. We did not observe any clear evidence of climatic changes in other thinner sections, which may be due to a superposition of cold-stage accumulation and warm-stage deep weathering. The common properties of the BCSs in Korean sections and their relationship to the CLP loess and paleosols indicate widespread deposition of Asian dust and subsequent weathering in the late Quaternary, forming BCS sequences. In this respect, the BCS sequences investigated here are considered to be the additionally weathered equivalents of the CLP loess-paleosol sequences, having been exposed to the high annual precipitation of the Korean Peninsula. 
Given the wide distribution of BCS sequences at Paleolithic sites throughout the Peninsula, the findings of this study are important for the ongoing debate surrounding the depositional environments of the Paleolithic deposits, and provide a foundation for the establishment of the chronological framework of the Paleolithic artifact-bearing layers and lithic assemblages.
Carroll, Ian M; Ringel-Kulka, Tamar; Siddle, Jennica P; Klaenhammer, Todd R; Ringel, Yehuda
2012-01-01
The handling and treatment of biological samples is critical when characterizing the composition of the intestinal microbiota between different ecological niches or diseases. Specifically, exposure of fecal samples to room temperature or long-term storage in deep freezing conditions may alter the composition of the microbiota. Thus, we stored fecal samples at room temperature and monitored the stability of the microbiota over twenty-four hours. We also investigated the stability of the microbiota in fecal samples during a six-month storage period at -80°C. As the stability of the fecal microbiota may be affected by intestinal disease, we analyzed two healthy controls and two patients with irritable bowel syndrome (IBS). We used high-throughput pyrosequencing of the 16S rRNA gene to characterize the microbiota in fecal samples stored at room temperature or -80°C at six and seven time points, respectively. The composition of microbial communities in IBS patients and healthy controls was determined and compared using the Quantitative Insights Into Microbial Ecology (QIIME) pipeline. The composition of the microbiota in fecal samples stored for different lengths of time at room temperature or -80°C clustered strongly based on the host each sample originated from. Our data demonstrate that fecal samples exposed to room or deep freezing temperatures for up to twenty-four hours and six months, respectively, exhibit a microbial composition and diversity that shares more identity with its host of origin than any other sample.
Shahinas, Dea; Silverman, Michael; Sittler, Taylor; Chiu, Charles; Kim, Peter; Allen-Vercoe, Emma; Weese, Scott; Wong, Andrew; Low, Donald E.; Pillai, Dylan R.
2012-01-01
ABSTRACT Fecal microbiome transplantation by low-volume enema is an effective, safe, and inexpensive alternative to antibiotic therapy for patients with chronic relapsing Clostridium difficile infection (CDI). We explored the microbial diversity of pre- and posttransplant stool specimens from CDI patients (n = 6) using deep sequencing of the 16S rRNA gene. While interindividual variability in microbiota change occurs with fecal transplantation and vancomycin exposure, in this pilot study we note that clinical cure of CDI is associated with an increase in diversity and richness. Genus- and species-level analysis may reveal a cocktail of microorganisms or products thereof that will ultimately be used as a probiotic to treat CDI. PMID:23093385
Mason, Olivia U; Hazen, Terry C; Borglin, Sharon; Chain, Patrick S G; Dubinsky, Eric A; Fortney, Julian L; Han, James; Holman, Hoi-Ying N; Hultman, Jenni; Lamendella, Regina; Mackelprang, Rachel; Malfatti, Stephanie; Tom, Lauren M; Tringe, Susannah G; Woyke, Tanja; Zhou, Jizhong; Rubin, Edward M; Jansson, Janet K
2012-09-01
The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.
Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario
2016-01-01
The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, followed by deep targeted sequencing of 134 genes selected from the initial screening in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842
Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario
2016-04-19
The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, followed by deep targeted sequencing of 134 genes selected from the initial screening in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression.
Human Splice-Site Prediction with Deep Neural Networks.
Naito, Tatsuhiko
2018-04-18
Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create a system to predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence and returns an answer as to whether it is a splice site. The length of the input is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence is applied to the pretrained DNN model, which determines the probability that an input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the pattern learned by the DNNs was visualized as position frequency matrices (PFMs). Some of the PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system are uploaded. Further improvement will be achieved following the further development of DNNs.
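The input format described in this abstract can be sketched as follows. This is a minimal illustration only, assuming (the abstract does not say so) that the consensus dinucleotide occupies the two central positions of the 140-nucleotide window and that each base is one-hot encoded before reaching the network. The names `has_centered_consensus` and `one_hot` are hypothetical and are not taken from the author's released code.

```python
ALPHABET = "ACGT"
WINDOW = 140  # input length stated in the abstract


def has_centered_consensus(seq, motif):
    """Return True if `motif` (e.g. "GT" or "AG") sits at the two central
    positions of `seq`. The exact placement is an assumption of this sketch."""
    mid = len(seq) // 2
    return seq[mid - 1:mid + 1].upper() == motif


def one_hot(seq):
    """Encode a nucleotide string as a list of 4-element rows (A, C, G, T).
    Bases outside ACGT (e.g. N) are encoded as an all-zero row."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        col = index.get(base)
        if col is not None:
            row[col] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy donor-site candidate: 69 flanking bases, "GT", 69 flanking bases.
    candidate = "A" * 69 + "GT" + "C" * 69
    assert len(candidate) == WINDOW
    print(has_centered_consensus(candidate, "GT"))
    print(len(one_hot(candidate)), len(one_hot(candidate)[0]))
```

A matrix of this shape (140 rows by 4 columns) is the kind of input a convolutional layer followed by a bidirectional LSTM would typically consume; the network itself is not reproduced here.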
Mallon, Dermot H; Bradley, J Andrew; Winn, Peter J; Taylor, Craig J; Kosmoliaptsis, Vasilis
2015-02-01
We have previously shown that qualitative assessment of surface electrostatic potential of HLA class I molecules helps explain serological patterns of alloantibody binding. We have now used a novel computational approach to quantitate differences in surface electrostatic potential of HLA B-cell epitopes and applied this to explain HLA Bw4 and Bw6 antigenicity. Protein structure models of HLA class I alleles expressing either the Bw4 or Bw6 epitope (defined by sequence motifs at positions 77 to 83) were generated using comparative structure prediction. The electrostatic potential in 3-dimensional space encompassing the Bw4/Bw6 epitope was computed by solving the Poisson-Boltzmann equation and quantitatively compared in a pairwise, all-versus-all fashion to produce distance matrices that cluster epitopes with similar electrostatic properties. Quantitative comparison of surface electrostatic potential at the carboxyl terminus of the α1-helix of HLA class I alleles, corresponding to amino acid sequence motif 77 to 83, produced clustering of HLA molecules in 3 principal groups according to Bw4 or Bw6 epitope expression. Remarkably, quantitative differences in electrostatic potential reflected known patterns of serological reactivity better than Bw4/Bw6 amino acid sequence motifs. Quantitative assessment of epitope electrostatic potential allowed the impact of known amino acid substitutions (HLA-B*07:02 R79G, R82L, G83R) that are critical for antibody binding to be predicted. We describe a novel approach for quantitating differences in HLA B-cell epitope electrostatic potential. Proof of principle is provided that this approach enables better assessment of HLA epitope antigenicity than amino acid sequence data alone, and it may allow prediction of HLA immunogenicity.
Sequence and Structure Dependent DNA-DNA Interactions
NASA Astrophysics Data System (ADS)
Kopchick, Benjamin; Qiu, Xiangyun
Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention, and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causalities in terms of the differences in hydration shell and DNA surface structures.
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris
2010-07-15
The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
Deep learning on temporal-spectral data for anomaly detection
NASA Astrophysics Data System (ADS)
Ma, King; Leung, Henry; Jalilian, Ehsan; Huang, Daniel
2017-05-01
Detecting anomalies is important for continuous monitoring of sensor systems. One significant challenge is to use sensor data and autonomously detect changes that cause different conditions to occur. Using deep learning methods, we are able to monitor and detect changes as a result of some disturbance in the system. We utilize deep neural networks for sequence analysis of time series. We use a multi-step method for anomaly detection. We train the network to learn spectral and temporal features from the acoustic time series. We test our method using fiber-optic acoustic data from a pipeline.
Targeted Quantitation of Proteins by Mass Spectrometry
2013-01-01
Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement. PMID:23517332
Targeted quantitation of proteins by mass spectrometry.
Liebler, Daniel C; Zimmerman, Lisa J
2013-06-04
Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement.
Sivadas, A; Salleh, M Z; Teh, L K; Scaria, V
2017-10-01
Expanding the scope of pharmacogenomic research by including multiple global populations is integral to building robust evidence for its clinical translation. Deep whole-genome sequencing of diverse ethnic populations provides a unique opportunity to study rare and common pharmacogenomic markers that often vary in frequency across populations. In this study, we aim to build a diverse map of pharmacogenetic variants in the South East Asian (SEA) Malay population using deep whole-genome sequences of 100 healthy SEA Malay individuals. We investigated the allelic diversity of potentially deleterious pharmacogenomic variants in the SEA Malay population. Our analysis revealed 227 common and 466 rare potentially functional single nucleotide variants (SNVs) in 437 pharmacogenomic genes involved in drug metabolism, transport and target genes, including 74 novel variants. This study has created one of the most comprehensive maps of pharmacogenetic markers in any population from whole genomes and will hugely benefit pharmacogenomic investigations and drug dosage recommendations in SEA Malays.
Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing
Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.
2015-01-01
Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the method's effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer
Hong, Matthew K. H.; Macintyre, Geoff; Wedge, David C.; ...
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. As a result, analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
NASA Technical Reports Server (NTRS)
Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.
1991-01-01
A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer.
Hong, Matthew K H; Macintyre, Geoff; Wedge, David C; Van Loo, Peter; Patel, Keval; Lunke, Sebastian; Alexandrov, Ludmil B; Sloggett, Clare; Cmero, Marek; Marass, Francesco; Tsui, Dana; Mangiola, Stefano; Lonie, Andrew; Naeem, Haroon; Sapre, Nikhil; Phal, Pramit M; Kurganovs, Natalie; Chin, Xiaowen; Kerger, Michael; Warren, Anne Y; Neal, David; Gnanapragasam, Vincent; Rosenfeld, Nitzan; Pedersen, John S; Ryan, Andrew; Haviv, Izhak; Costello, Anthony J; Corcoran, Niall M; Hovens, Christopher M
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
Deep intronic GPR143 mutation in a Japanese family with ocular albinism
Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei
2015-01-01
Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease. PMID:26061757
Deep intronic GPR143 mutation in a Japanese family with ocular albinism.
Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei
2015-06-10
Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease.
Protein model discrimination using mutational sensitivity derived from deep sequencing.
Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan
2012-02-08
A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.
Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel
2013-09-01
RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Deep-Earth reactor: nuclear fission, helium, and the geomagnetic field.
Hollenbach, D F; Herndon, J M
2001-09-25
Geomagnetic field reversals and changes in intensity are understandable from an energy standpoint as natural consequences of intermittent and/or variable nuclear fission chain reactions deep within the Earth. Moreover, deep-Earth production of helium, having (3)He/(4)He ratios within the range observed from deep-mantle sources, is demonstrated to be a consequence of nuclear fission. Numerical simulations of a planetary-scale geo-reactor were made by using the SCALE sequence of codes. The results clearly demonstrate that such a geo-reactor (i) would function as a fast-neutron fuel breeder reactor; (ii) could, under appropriate conditions, operate over the entire period of geologic time; and (iii) would function in such a manner as to yield variable and/or intermittent output power.
Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard
2009-05-01
The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
An Anatomy of a Seismic Sequence in a Deep Gold Mine
NASA Astrophysics Data System (ADS)
Gibowicz, S. J.
1997-12-01
An unusual swarm-like seismic sequence occurred in April 1993 at the Western Deep Levels gold mine, South Africa. Altogether 199 events with moment magnitude from -0.5 to 3.1 were recorded and located by the mine seismic network. The sequence lasted 12 days and was composed in fact of four main shock-aftershocks sequences, closely following each other in space and time. The events were confined to a volume of rock extending to 670 m in the N-S, 630 m in the E-W, and 390 m in the vertical directions. The first sequence lasted 179 hours and the second only 13 hours, being interrupted by the third sequence which lasted 31 hours, being in turn interrupted by the fourth sequence. The parameter p, describing the rate of occurrence of aftershocks, ranged from 0.7 to 1. The first sequence is characterized by the lowest value of the fractal correlation dimension D = 1.75 and the second by the highest value of D = 2.4, whereas the third and fourth sequences are characterized by the middle value of D = 1.9. The corner frequencies of P and S waves are in close proximity and range from 14 to 220 Hz. A display of source parameters as a function of time shows that the four main shocks are most distinctly marked by their source radius. For 46 events a moment tensor inversion was performed. In most cases the double-couple component is dominant, ranging from 60 to 90 percent of the solution. The double-couple solutions correspond to the same number of normal and reverse faults and oblique-slip focal mechanisms. An analysis of space distribution of P, T and B axes reveals that the distribution of B axes is the most regular.
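As a rough illustration of what a decay parameter p of this kind controls, the sketch below evaluates the modified Omori law, n(t) = K / (t + c)^p. That the abstract's p is this Omori exponent is an assumption here, and the values of K and c are hypothetical, chosen only for illustration.

```python
def omori_rate(t, K, c, p):
    """Aftershock rate at time t after the main shock (modified Omori law).

    K (productivity) and c (time offset) are hypothetical illustration
    values; p is the decay exponent, assumed to correspond to the
    parameter p quoted in the abstract (0.7 to 1).
    """
    return K / (t + c) ** p


# Compare the two ends of the reported range of p at the same elapsed time.
slow_decay = omori_rate(t=10.0, K=10.0, c=1.0, p=0.7)
fast_decay = omori_rate(t=10.0, K=10.0, c=1.0, p=1.0)
print(slow_decay > fast_decay)  # a smaller p means a slower decay of the rate
```

Under this reading, a sequence with p near 0.7 keeps producing aftershocks for longer than one with p near 1.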
Compilation of Reprints Number 63.
1986-03-01
Michel Be6, Stephen H. Johnson, and E.F. Chiburis PRELIMINARY SEISMIC REFRACTION RESULTS USING A BOREHOLE SEISMOMETER IN DEEP SEA DRILLING PROJECT HOLE...refraction data with wells drilled on land and offshore reflection profiles permits tentative identification of geologic sequences on the basis of...PRELIMINARY SEISMIC REFRACTION RESULTS USING A BOREHOLE SEISMOMETER IN DEEP SEA DRILLING PROJECT HOLE 395A
Water mass dynamics shape Ross Sea protist communities in mesopelagic and bathypelagic layers
NASA Astrophysics Data System (ADS)
Zoccarato, Luca; Pallavicini, Alberto; Cerino, Federica; Fonda Umani, Serena; Celussi, Mauro
2016-12-01
Deep-sea environments host the largest pool of microbes and represent the last largely unexplored and poorly known ecosystems on Earth. The Ross Sea is characterized by unique oceanographic dynamics and harbors several water masses deeply involved in cooling and ventilation of deep oceans. In this study the V9 region of the 18S rDNA was targeted and sequenced with the Ion Torrent high-throughput sequencing technology to unveil differences in protist communities (>2 μm) correlated with biogeochemical properties of the water masses. The analyzed samples were significantly different in terms of environmental parameters and community composition outlining significant structuring effects of temperature and salinity. Overall, Alveolata (especially Dinophyta), Stramenopiles and Excavata groups dominated mesopelagic and bathypelagic layers, and protist communities were shaped according to the biogeochemistry of the water masses (advection effect and mixing events). Newly-formed High Salinity Shelf Water (HSSW) was characterized by high relative abundance of phototrophic organisms that bloom at the surface during the austral summer. Oxygen-depleted Circumpolar Deep Water (CDW) showed higher abundance of Excavata, common bacterivores in deep water masses. At the shelf-break, Antarctic Bottom Water (AABW), formed by the entrainment of shelf waters in CDW, maintained the eukaryotic genetic signature typical of both parental water masses.
Röthig, Till; Yum, Lauren K.; Kremb, Stephan G.; Roik, Anna; Voolstra, Christian R.
2017-01-01
Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L−1) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment. PMID:28303925
Röthig, Till; Yum, Lauren K; Kremb, Stephan G; Roik, Anna; Voolstra, Christian R
2017-03-17
Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L−1) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment.
Kumar, S; Gadagkar, S R
2000-12-01
The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.
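For readers unfamiliar with the method, the pair-selection step of standard neighbor-joining can be sketched as follows. This is a minimal illustration of the textbook Q-criterion only, not the simulation code used in the study; the function name and the small distance matrix are illustrative assumptions.

```python
def nj_pick_pair(dist):
    """Return the pair of taxa that neighbor-joining would join first.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j is chosen,
    where r_i is the sum of distances from taxon i to all other taxa.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Hypothetical five-taxon distance matrix, for illustration only.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pick_pair(example)
print(pair, q_value)
```

Note that taxa 3 and 4 have the smallest raw distance, yet the Q-criterion joins taxa 0 and 1 first, because it corrects each distance for how far both taxa are from everything else.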
NASA Astrophysics Data System (ADS)
Zhang, Likui; Kang, Manyu; Xu, Jiajun; Xu, Jian; Shuai, Yinjie; Zhou, Xiaojian; Yang, Zhihui; Ma, Kesen
2016-05-01
Active deep-sea hydrothermal vents harbor abundant thermophilic and hyperthermophilic microorganisms. However, microbial communities in inactive hydrothermal vents have not been well documented. Here, we investigated bacterial and archaeal communities in two deep-sea sediments (named TVG4 and TVG11) collected from inactive hydrothermal vents in the Southwest Indian Ridge using the high-throughput sequencing technology of Illumina MiSeq2500 platform. Based on the V4 region of the 16S rRNA gene, sequence analysis showed that bacterial communities in the two samples were dominated by Proteobacteria, followed by Bacteroidetes, Actinobacteria and Firmicutes. Furthermore, archaeal communities in the two samples were dominated by Thaumarchaeota and Euryarchaeota. Comparative analysis showed that (i) TVG4 displayed higher bacterial richness and lower archaeal richness than TVG11; (ii) the two samples diverged more in their archaeal communities than in their bacterial communities. Bacteria and archaea that are potentially associated with nitrogen, sulfur, metal and methane cycling were detected in the two samples. Overall, we provide the first comparative picture of bacterial and archaeal communities and reveal their potential ecological roles in the deep-sea environments of inactive hydrothermal vents in the Southwest Indian Ridge, augmenting knowledge of microbial communities in inactive hydrothermal vents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nio, S.D.; Yang, C.S.; Tewfik, N.
1993-09-01
A new development in the application of sequence stratigraphic concepts in marine as well as continental basins is the recognition of high-frequency cyclic patterns in rock successions in the subsurface. Studies of six wells from the northern, central, and southern parts of the Gulf of Suez show the presence of well-preserved, high-frequency cycles with periodicities similar to the orbitally forced Milankovitch parameters. Subsurface rock successions, third-order sequences, and high-frequency cycles were compared with outcrops. After establishing the biostratigraphic framework for the above-mentioned wells, a sequence analysis was performed. Sequence boundaries and maximum flooding positions in each well were calibrated with the occurrences and evaluation of the high-frequency cycles. It became obvious that there is an intimate relationship between these high-frequency Milankovitch cycles and sequence organization. In addition, a close relationship can be observed in the subsurface as well as in outcrops between high-frequency climatic changes (connected to the Milankovitch cycles) and (litho)facies variability. Quantitative evaluations of each sequence and/or systems tract can be computed with the International Geoservices' cyclicity analysis tool (MILABAR). The results are summarized in a well composite chart, rate (NAR), and ratio of preserved time. In correlations between the wells, an accuracy of 500-100 Ka can be obtained. The quantitative evaluation of the sequence and high-frequency cycle analysis gave some new aspects concerning the (litho)facies and geodynamic development during the pre- as well as the synrift stages of the Gulf of Suez Basin.
Archaeal Diversity in Waters from Deep South African Gold Mines
Takai, Ken; Moser, Duane P.; DeFlaun, Mary; Onstott, Tullis C.; Fredrickson, James K.
2001-01-01
A culture-independent molecular analysis of archaeal communities in waters collected from deep South African gold mines was carried out using PCR-mediated terminal restriction fragment length polymorphism (T-RFLP) analysis of rRNA genes (rDNA) in conjunction with a sequencing analysis of archaeal rDNA clone libraries. The water samples used represented various environments, including deep fissure water, mine service water, and water from an overlying dolomite aquifer. T-RFLP analysis revealed that the ribotype distribution of archaea varied with the source of water. The archaeal communities in the deep gold mine environments exhibited great phylogenetic diversity; the majority of the members were most closely related to uncultivated species. Some archaeal rDNA clones obtained from mine service water and dolomite aquifer water samples were most closely related to environmental rDNA clones from surface soil (soil clones) and marine environments (marine group I [MGI]). Other clones exhibited intermediate phylogenetic affiliation between soil clones and MGI in the Crenarchaeota. Fissure water samples, derived from active or dormant geothermal environments, yielded archaeal sequences that exhibited novel phylogeny, including a novel lineage of Euryarchaeota. These results suggest that deep South African gold mines harbor novel archaeal communities distinct from those observed in other environments. Based on the phylogenetic analysis of archaeal strains and rDNA clones, including the newly discovered archaeal rDNA clones, the evolutionary relationship and the phylogenetic organization of the domain Archaea are reevaluated. PMID:11722932
Abseq: Ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding.
Shahi, Payam; Kim, Samuel C; Haliburton, John R; Gartner, Zev J; Abate, Adam R
2017-03-14
Proteins are the primary effectors of cellular function, including cellular metabolism, structural dynamics, and information processing. However, quantitative characterization of proteins at the single-cell level is challenging due to the tiny amount of protein available. Here, we present Abseq, a method to detect and quantitate proteins in single cells at ultrahigh throughput. Like flow and mass cytometry, Abseq uses specific antibodies to detect epitopes of interest; however, unlike these methods, antibodies are labeled with sequence tags that can be read out with microfluidic barcoding and DNA sequencing. We demonstrate this novel approach by characterizing surface proteins of different cell types at the single-cell level and distinguishing between the cells by their protein expression profiles. DNA-tagged antibodies provide multiple advantages for profiling proteins in single cells, including the ability to amplify low-abundance tags to make them detectable with sequencing, to use molecular indices for quantitative results, and essentially limitless multiplexing.
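The role of molecular indices in quantitation can be sketched as below: reads that share a cell barcode, an antibody tag, and a molecular index are treated as amplification copies of one original tag molecule and counted once. This is a generic illustration, not the Abseq analysis pipeline; the function name, the read tuples, and the tag names are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular indices per (cell barcode, antibody tag).

    reads is an iterable of (cell_barcode, antibody_tag, molecular_index)
    tuples. Duplicate tuples are assumed to be amplification copies of the
    same original molecule, so each distinct index is counted once.
    """
    seen = defaultdict(set)
    for cell, tag, index in reads:
        seen[(cell, tag)].add(index)
    return {key: len(indices) for key, indices in seen.items()}


# Hypothetical reads, for illustration only.
reads = [
    ("cellA", "CD3", "AACG"),
    ("cellA", "CD3", "AACG"),  # amplification duplicate, counted once
    ("cellA", "CD3", "TTGC"),
    ("cellB", "CD19", "GGAT"),
]
counts = count_molecules(reads)
print(counts)
```

Collapsing on the molecular index is what lets low-abundance tags be amplified for detection without inflating the resulting counts.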