Sample records for deep sequencing approach

  1. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  2. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.

  3. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    PubMed

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. 
    The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg.

  4. Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    PubMed Central

    Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512

  5. De novo peptide sequencing by deep learning

    PubMed Central

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-01-01

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701

  6. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies of fungal diversity in deep-sea sediments based on culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities distinct from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  7. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    USDA-ARS's Scientific Manuscript database

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  8. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  10. A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

    USDA-ARS's Scientific Manuscript database

    Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...

  11. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated procedure for processing HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones using accurate sensitivity thresholds. PMID:26585833
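    The contrast between a fixed cut-off and position-specific sensitivity thresholds can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a simple binomial error model, and the function names, the per-position error rates, and the significance level `alpha` are all hypothetical choices made for illustration.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    lower = sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


def min_count_threshold(depth, error_rate, alpha=0.001):
    """Smallest read count that sequencing error alone would reach
    with probability below alpha (assumed binomial error model)."""
    for k in range(1, depth + 1):
        if binom_sf(k, depth, error_rate) < alpha:
            return k
    return depth + 1  # no count is distinguishable from error


def is_called(variant_reads, depth, error_rate, alpha=0.001):
    """Call a minor variant only if it clears the position's own threshold."""
    return variant_reads >= min_count_threshold(depth, error_rate, alpha)


if __name__ == "__main__":
    depth = 5000
    # Hypothetical error rates for a clean and an error-prone position.
    for label, rate in [("low-error position", 0.001),
                        ("error-prone position", 0.01)]:
        print(label, "threshold:", min_count_threshold(depth, rate))
```

    Under these assumptions, the same 20 variant reads at a depth of 5000 would be retained at the low-error position and discarded at the error-prone one, which a single fixed cut-off cannot express.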

  12. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated procedure for processing HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones using accurate sensitivity thresholds.

  13. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) account for 5–10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Instead, computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
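    The two input preparations described in this abstract, padding for the global CNN and overlapping fixed-length windows for the local CNN, can be sketched as follows. This is an illustrative sketch only, not code from the iDeepE repository: the function names, the padding character `N`, and the window and overlap sizes are assumptions.

```python
def pad_sequence(seq, max_len, pad_char="N"):
    """Global view: truncate or right-pad a sequence to exactly max_len."""
    return seq[:max_len] + pad_char * max(0, max_len - len(seq))


def split_overlapping(seq, window, overlap, pad_char="N"):
    """Local view: cut a sequence into fixed-length windows in which
    consecutive windows share `overlap` characters. The last window is
    right-padded so that every window has the same length."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    chunks = []
    start = 0
    while True:
        chunk = seq[start:start + window]
        chunks.append(chunk + pad_char * (window - len(chunk)))
        if start + window >= len(seq):
            break
        start += step
    return chunks


if __name__ == "__main__":
    rna = "ACGUACGUA"
    print(pad_sequence(rna, 12))          # one fixed-length global input
    print(split_overlapping(rna, 4, 2))   # several local inputs (channels)
```

    In a full model each window would then be one-hot encoded and passed to the local CNN as a separate channel, with the padded sequence going to the global CNN; that part is omitted here.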

  14. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will recover a more diverse fungal community in deep-sea environments than either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  15. Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan

    2013-02-01

    Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediment samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA- and RNA-based approaches gave a similar view of the taxonomic composition of the foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of the foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in the cDNA dataset indicates the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompts its use in future high-throughput sequencing-based environmental surveys.

  16. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  17. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.

    PubMed

    Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H

    2017-01-01

    The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNAs prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we will show that our approach has a lower training time and allows for a better graphical navigability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.

  18. Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge

    NASA Astrophysics Data System (ADS)

    Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua

    2018-01-01

    Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using Illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data on fungal diversity and biogeography, and a glimpse into the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.

  19. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.

    PubMed

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-05-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  20. DeepSig: deep learning improves signal peptide detection in proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2018-05-15

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. Contact: pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.

  1. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  2. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
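
    The saliency-map idea in the abstract above (first-order derivatives of the prediction with respect to each input nucleotide) can be illustrated with a minimal sketch. This is not the DeMo Dashboard code: the logistic `toy_score` model, its `weights`, and the finite-difference gradient are all assumptions chosen so the example runs with the Python standard library alone.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq]

def toy_score(x, weights):
    """Hypothetical stand-in for a trained network: logistic score of a one-hot input."""
    z = sum(w * v for row_w, row_x in zip(weights, x) for w, v in zip(row_w, row_x))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(seq, weights, eps=1e-5):
    """Per-nucleotide saliency: |d score / d x| at the observed base,
    estimated with a central finite difference."""
    x = one_hot(seq)
    result = []
    for i, base in enumerate(seq):
        j = BASES.index(base)
        x[i][j] += eps
        up = toy_score(x, weights)
        x[i][j] -= 2 * eps
        down = toy_score(x, weights)
        x[i][j] += eps  # restore the original input value
        result.append(abs(up - down) / (2 * eps))
    return result

# Illustrative weights (one row per position, one column per base A, C, G, T).
weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]
sal = saliency("ACG", weights)
for base, s in zip("ACG", sal):
    print(base, round(s, 4))
```

    In this toy model the middle position carries the largest weight, so it receives the largest saliency; with a real network the gradient would normally come from automatic differentiation rather than finite differences.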

  3. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  4. RNA-Seq analysis to capture the transcriptome landscape of a single cell

    PubMed Central

    Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim

    2013-01-01

    We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668

  5. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    PubMed

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    The study of the transcriptome, the entire pool of transcripts in an organism or in single cells at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, the microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties in comparing results from distinct experiments, microarrays were widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efforts to elucidate the genes responsible for producing active compounds in medicinal plants have become markedly more efficient and fruitful. The whole process involves several steps, from plant material sampling, to cDNA library preparation, to deep sequencing, after which bioinformatics takes over to assemble enormous yet fragmentary data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices that selecting a suitable methodology for a specific purpose can be confusing. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.

  6. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map, which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980

  7. Rapid gene identification in sugar beet using deep sequencing of DNA from phenotypic pools selected from breeding panels.

    PubMed

    Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd

    2016-03-15

    The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ for the monogenic trait to address, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
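
    The pooled allele-frequency comparison described in the abstract above can be sketched roughly as follows. The data layout (position mapped to alternate-read and total-read counts), the function names, and the fixed-size sliding window are hypothetical choices for illustration, not the authors' pipeline.

```python
def allele_freq(alt_reads, depth):
    """Alternate-allele frequency at one site, or None if the site is uncovered."""
    return alt_reads / depth if depth else None

def delta_af(pool_a, pool_b):
    """Allele-frequency difference per position shared by both pools.
    Each pool maps position -> (alt_reads, total_reads)."""
    out = {}
    for pos in sorted(set(pool_a) & set(pool_b)):
        fa = allele_freq(*pool_a[pos])
        fb = allele_freq(*pool_b[pos])
        if fa is not None and fb is not None:
            out[pos] = fa - fb
    return out

def best_window(deltas, size):
    """Return (mean |delta|, start, end) for the run of `size` consecutive
    markers with the largest mean absolute frequency difference."""
    positions = sorted(deltas)
    best = None
    for i in range(len(positions) - size + 1):
        window = positions[i:i + size]
        mean_abs = sum(abs(deltas[p]) for p in window) / size
        if best is None or mean_abs > best[0]:
            best = (mean_abs, window[0], window[-1])
    return best

# Made-up read counts for two phenotypic pools (e.g. red vs. green hypocotyl).
red_pool = {100: (48, 100), 200: (90, 100), 300: (98, 100), 400: (95, 100), 500: (52, 100)}
green_pool = {100: (50, 100), 200: (12, 100), 300: (3, 100), 400: (6, 100), 500: (49, 100)}

deltas = delta_af(red_pool, green_pool)
best = best_window(deltas, size=3)
print(best)
```

    Markers near a causal locus are expected to show strongly diverging frequencies between the pools, while unlinked markers stay close to zero difference; the window scan simply highlights where that divergence is concentrated.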

  8. Deep-sequencing to resolve complex diversity of apicomplexan parasites in platypuses and echidnas: Proof of principle for wildlife disease investigation.

    PubMed

    Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard

    2017-11-01

    The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. The dynamics of genome replication using deep sequencing

    PubMed Central

    Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.

    2014-01-01

    Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142

  10. Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches.

    PubMed

    Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren

    2011-05-01

    The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. a sequence-based approach and a structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression are employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, because QSSR modeling with the sequence-based approach does not require preparing minimized peptide structures before modeling, it is considerably more efficient than modeling with the structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
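
    Monte Carlo cross-validation, as mentioned in the abstract above, repeatedly draws random train/test splits and averages the test error. The sketch below illustrates only that resampling scheme; the one-variable least-squares fit is a deliberately simple stand-in for the SVM, GP, and PLS models of the study, and the synthetic data, split count, and test fraction are assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mccv(xs, ys, n_splits=100, test_fraction=0.3, seed=0):
    """Mean test RMSE over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(1, int(len(idx) * test_fraction))
    rmses = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        rmses.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(rmses) / len(rmses)

# Synthetic descriptor/drift-time pairs: a noisy straight line.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 0.5) for x in xs]
print("mean test RMSE:", mccv(xs, ys))
```

    Unlike k-fold cross-validation, the test sets here may overlap between repetitions, which trades exhaustive coverage for a freely chosen number of splits and test-set size.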

  11. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum.

    PubMed

    Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-07-01

    The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings, due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org. Contact: nshomron@post.tau.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  12. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2 synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  13. Analysis of Variability in HIV-1 Subtype A Strains in Russia Suggests a Combination of Deep Sequencing and Multitarget RNA Interference for Silencing of the Virus.

    PubMed

    Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A

    2017-02-01

    Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.

  14. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.

    PubMed

    James, Katherine; Cockell, Simon J; Zenkin, Nikolay

    2017-05-01

    The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Deep sequencing and in silico analysis of small RNA library reveals novel miRNA from leaf Persicaria minor transcriptome.

    PubMed

    Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan

    2018-03-01

    In the current era, the majority of microRNAs (miRNAs) are being discovered through computational approaches, which are largely confined to model plants. Here, for the first time, we describe the identification and characterization of novel miRNAs in a non-model plant, Persicaria minor (P. minor), using a computational approach. Unannotated sequences from deep sequencing were analyzed based on previous well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence, which represented 11 unique putative miRNA sequences. The PsRobot target prediction tool was deployed to identify the target transcripts of putative novel miRNAs. Most of the predicted target transcripts (mRNAs) were known to be involved in plant development and stress responses. Gene ontology showed that the majority of the putative novel miRNA targets were involved in cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Out of 11 unique putative miRNAs, 7 miRNAs were validated through semi-quantitative PCR. These novel miRNA discoveries in P. minor may help develop and update the current public miRNA database.

  17. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    PubMed

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
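
    Detecting variants at frequencies near 1%, as in the abstract above, requires separating true minor variants from sequencing error. One common way to reason about this is a binomial tail test against an assumed per-base error rate; the sketch below illustrates that general idea and is not the pipeline used in the study. The error rate of 0.001 and the read counts are assumptions.

```python
from math import exp, lgamma, log, log1p

def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space for large n."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))

def binom_tail(k, n, p):
    """P(X >= k): the chance of seeing at least k variant reads out of n
    if sequencing error at rate p were the only source of mismatches."""
    return sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

depth = 2000        # reads covering the site (illustrative)
error_rate = 0.001  # assumed per-base error rate

# 20 variant reads = 1% frequency: very unlikely to arise from error alone.
p_one_percent = binom_tail(20, depth, error_rate)
# 3 variant reads = 0.15% frequency: entirely compatible with error.
p_three_reads = binom_tail(3, depth, error_rate)
print(p_one_percent, p_three_reads)
```

    The same depth that makes a 1% variant clearly distinguishable leaves a 0.15% variant indistinguishable from noise, which is why the achievable detection threshold depends jointly on coverage and on the platform's error rate.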

  18. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

    PubMed Central

    Avsec, Žiga; Cheng, Jun; Gagneur, Julien

    2018-01-01

    Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29155928

  19. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  20. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because these experiments begin with such small amounts of starting material, the amount of information that is obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
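
    As a point of comparison for the abstract above, the idealized expectation under uniform Poisson sampling (the classical Lander-Waterman model) can be sketched as follows. This is explicitly not the non-parametric empirical Bayes method implemented in preseq; it is the simple baseline from which amplification-biased single-cell libraries are expected to depart. The function name and the example genome size and read length are illustrative assumptions.

```python
import math

def expected_fraction_covered(n_reads, read_len, genome_len):
    """Expected fraction of the genome covered at least once when reads
    land uniformly at random: 1 - exp(-c), with c the mean depth."""
    mean_depth = n_reads * read_len / genome_len
    return 1.0 - math.exp(-mean_depth)

genome_len = 3_000_000_000  # roughly human-sized genome (illustrative)
read_len = 100

# Compare a shallow pilot run with progressively deeper sequencing.
for n_reads in (3_000_000, 30_000_000, 300_000_000):
    frac = expected_fraction_covered(n_reads, read_len, genome_len)
    print(f"{n_reads:>11,d} reads -> {frac:.3f} of genome covered")
```

    Under this baseline, coverage saturates quickly once the mean depth exceeds a few-fold; a library that loses material or amplifies unevenly saturates at a lower fraction, which is the behaviour a data-driven model is needed to capture.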

  1. Dissecting genetic and environmental mutation signatures with model organisms.

    PubMed

    Segovia, Romulo; Tam, Annie S; Stirling, Peter C

    2015-08-01

    Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Biosynthesis and genetic encoding of phosphothreonine through parallel selection and deep sequencing

    PubMed Central

    Huguenin-Dezot, Nicolas; Liang, Alexandria D.; Schmied, Wolfgang H.; Rogerson, Daniel T.; Chin, Jason W.

    2017-01-01

    The phosphorylation of threonine residues in proteins regulates diverse processes in eukaryotic cells, and thousands of threonine phosphorylations have been identified. An understanding of how threonine phosphorylation regulates biological function will be accelerated by general methods to bio-synthesize defined phospho-proteins. Here we address limitations in current methods for discovering aminoacyl-tRNA synthetase/tRNA pairs for incorporating non-natural amino acids into proteins, by combining parallel positive selections with deep sequencing and statistical analysis, to create a rapid approach for directly discovering aminoacyl-tRNA synthetase/tRNA pairs that selectively incorporate non-natural substrates. Our approach is scalable and enables the direct discovery of aminoacyl-tRNA synthetase/tRNA pairs with mutually orthogonal substrate specificity. We biosynthesize phosphothreonine in cells, and use our new selection approach to discover a phosphothreonyl-tRNA synthetase/tRNACUA pair. By combining these advances we create an entirely biosynthetic route to incorporating phosphothreonine in proteins and biosynthesize several phosphoproteins; enabling phosphoprotein structure determination and synthetic protein kinase activation. PMID:28553966

  3. Whole-Genome Characterization of Prunus necrotic ringspot virus Infecting Sweet Cherry in China

    PubMed Central

    2018-01-01

    ABSTRACT Prunus necrotic ringspot virus (PNRSV) causes yield loss in most cultivated stone fruits, including sweet cherry. Using a small RNA deep-sequencing approach combined with end-genome sequence cloning, we identified the complete genomes of all three PNRSV strands from PNRSV-infected sweet cherry trees and compared them with those of two previously reported isolates. PMID:29496825

  4. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification.

    PubMed

    Yildirim, Özal

    2018-05-01

    Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and code are available at http://166.111.152.91/Downloads.html. Contact: hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.

  6. High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations.

    PubMed

    Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute

    2017-09-01

    Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  7. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  8. Whole-Genome Characterization of Prunus necrotic ringspot virus Infecting Sweet Cherry in China.

    PubMed

    Wang, Jiawei; Zhai, Ying; Zhu, Dongzi; Liu, Weizhen; Pappu, Hanu R; Liu, Qingzhong

    2018-03-01

    Prunus necrotic ringspot virus (PNRSV) causes yield loss in most cultivated stone fruits, including sweet cherry. Using a small RNA deep-sequencing approach combined with end-genome sequence cloning, we identified the complete genomes of all three PNRSV strands from PNRSV-infected sweet cherry trees and compared them with those of two previously reported isolates. Copyright © 2018 Wang et al.

  9. High-Throughput Identification of Loss-of-Function Mutations for Anti-Interferon Activity in the Influenza A Virus NS Segment

    PubMed Central

    Wu, Nicholas C.; Young, Arthur P.; Al-Mawsawi, Laith Q.; Olson, C. Anders; Feng, Jun; Qi, Hangfei; Luan, Harding H.; Li, Xinmin; Wu, Ting-Ting

    2014-01-01

    ABSTRACT Viral proteins often display several functions which require multiple assays to dissect their genetic basis. Here, we describe a systematic approach to screen for loss-of-function mutations that confer a fitness disadvantage under a specified growth condition. Our methodology was achieved by genetically monitoring a mutant library under two growth conditions, with and without interferon, by deep sequencing. We employed a molecular tagging technique to distinguish true mutations from sequencing error. This approach enabled us to identify mutations that were negatively selected against, in addition to those that were positively selected for. Using this technique, we identified loss-of-function mutations in the influenza A virus NS segment that were sensitive to type I interferon in a high-throughput fashion. Mechanistic characterization further showed that a single substitution, D92Y, resulted in the inability of NS to inhibit RIG-I ubiquitination. The approach described in this study can be applied under any specified condition for any virus that can be genetically manipulated. IMPORTANCE Traditional genetics focuses on a single genotype-phenotype relationship, whereas high-throughput genetics permits phenotypic characterization of numerous mutants in parallel. High-throughput genetics often involves monitoring of a mutant library with deep sequencing. However, deep sequencing suffers from a high error rate (∼0.1 to 1%), which is usually higher than the occurrence frequency for individual point mutations within a mutant library. Therefore, only mutations that confer a fitness advantage can be identified with confidence due to an enrichment in the occurrence frequency. In contrast, it is impossible to identify deleterious mutations using most next-generation sequencing techniques. In this study, we have applied a molecular tagging technique to distinguish true mutations from sequencing errors. It enabled us to identify mutations that underwent negative selection, in addition to mutations that experienced positive selection. This study provides a proof of concept by screening for loss-of-function mutations on the influenza A virus NS segment that are involved in its anti-interferon activity. PMID:24965464
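
    The abstract does not spell out how the molecular tags are used, so the following is only a minimal sketch of one common way such tags can separate true mutations from sequencing errors: reads sharing a tag are assumed to derive from the same template molecule, and a base is accepted only when most reads in the group agree. The function name, thresholds, and toy reads are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_tag(tagged_reads, min_reads=3, min_agreement=0.9):
    """Collapse reads sharing a molecular tag into one consensus sequence.

    tagged_reads: iterable of (tag, sequence) pairs with equal-length sequences.
    A tag is kept only if it has at least min_reads reads and, at every
    position, the most common base reaches min_agreement of those reads.
    """
    groups = defaultdict(list)
    for tag, seq in tagged_reads:
        groups[tag].append(seq)

    consensus = {}
    for tag, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to tell errors from true variants
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            if count / len(seqs) < min_agreement:
                break  # ambiguous position: discard the whole tag
            bases.append(base)
        else:
            consensus[tag] = "".join(bases)
    return consensus

# Toy input (assumed): TAG1 has one erroneous read, TAG2 a consistent variant.
reads = [
    ("TAG1", "ACGT"), ("TAG1", "ACGT"), ("TAG1", "ACGA"),
    ("TAG2", "ACTT"), ("TAG2", "ACTT"), ("TAG2", "ACTT"),
]
result = consensus_by_tag(reads, min_reads=3, min_agreement=0.6)
print(result)
```

    In this toy input the isolated A at the last position of one TAG1 read is outvoted, while the T at position 3 in every TAG2 read survives as a candidate true mutation.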

  10. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  11. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  12. Describing the diversity of Ag specific receptors in vertebrates: Contribution of repertoire deep sequencing.

    PubMed

    Castro, Rosario; Navelsaker, Sofie; Krasnov, Aleksei; Du Pasquier, Louis; Boudinot, Pierre

    2017-10-01

    During the last decades, gene and cDNA cloning identified TCR and Ig genes across vertebrates; genome sequencing of TCR and Ig loci in many species revealed the different organizations selected during evolution under the pressure of generating diverse repertoires of Ag receptors. By detecting clonotypes over a wide range of frequency, deep sequencing of Ig and TCR transcripts provides a new way to compare the structure of expressed repertoires in species of various sizes, at different stages of development, with different physiologies, and displaying multiple adaptations to the environment. In this review, we provide a short overview of the technologies currently used to produce global description of immune repertoires, describe how they have already been used in comparative immunology, and we discuss the future potential of such approaches. The development of these methodologies in new species holds promise for new discoveries concerning particular adaptations. As an example, understanding the development of adaptive immunity across metamorphosis in frogs has been made possible by such approaches. Repertoire sequencing is now widely used, not only in basic research but also in the context of immunotherapy and vaccination. Analysis of fish responses to pathogens and vaccines has already benefited from these methods. Finally, we also discuss potential advances based on repertoire sequencing of multigene families of immune sensors and effectors in invertebrates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  14. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  15. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  16. Environmental surveillance of viruses by tangential flow filtration and metagenomic reconstruction.

    PubMed

    Furtak, Vyacheslav; Roivainen, Merja; Mirochnichenko, Olga; Zagorodnyaya, Tatiana; Laassri, Majid; Zaidi, Sohail Z; Rehman, Lubna; Alam, Muhammad M; Chizhikov, Vladimir; Chumakov, Konstantin

    2016-04-14

    An approach is proposed for environmental surveillance of poliovirus by concentrating sewage samples with tangential flow filtration (TFF) followed by deep sequencing of viral RNA. Subsequent to testing the method with samples from Finland, samples from Pakistan, a country endemic for poliovirus, were investigated. Genomic sequencing was either performed directly, for unbiased identification of viruses regardless of their ability to grow in cell cultures, or after virus enrichment by cell culture or immunoprecipitation. Bioinformatics enabled separation and determination of individual consensus sequences. Overall, deep sequencing of the entire viral population identified polioviruses, non-polio enteroviruses, and other viruses. In Pakistani sewage samples, adeno-associated virus, unable to replicate autonomously in cell cultures, was the most abundant human virus. The presence of recombinants of wild polioviruses of serotype 1 (WPV1) was also inferred, whereby currently circulating WPV1 of south-Asian (SOAS) lineage comprised two sub-lineages depending on their non-capsid region origin. Complete genome analyses additionally identified point mutants and intertypic recombinants between attenuated Sabin strains in the Pakistani samples, and in one Finnish sample. The approach could allow rapid environmental surveillance of viruses causing human infections. It creates a permanent digital repository of the entire virome potentially useful for retrospective screening of future discovered viruses.

  17. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.

  18. Maximum entropy methods for extracting the learned features of deep neural networks.

    PubMed

    Finnegan, Alex; Song, Jun S

    2017-10-01

    New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

  19. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.

  20. The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity.

    PubMed

    Verde, Ignazio; Jenkins, Jerry; Dondini, Luca; Micali, Sabrina; Pagliarani, Giulia; Vendramin, Elisa; Paris, Roberta; Aramini, Valeria; Gazza, Laura; Rossini, Laura; Bassi, Daniele; Troggio, Michela; Shu, Shengqiang; Grimwood, Jane; Tartarini, Stefano; Dettori, Maria Teresa; Schmutz, Jeremy

    2017-03-11

    The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches. Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%. The improved high quality peach genome assembly (Peach v2.0) represents a valuable tool for the analysis of the genetic diversity, domestication, and as a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.
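
    The contiguity figure quoted in this record (contig L50, given in kb) is a length statistic. A minimal sketch of how such a value can be computed from a list of contig lengths follows. Note that many tools call this quantity N50 and reserve L50 for the number of contigs. The function name and the toy lengths are illustrative assumptions, not data from the Peach assemblies.

```python
def contig_n50(lengths):
    """Return the contig length L such that contigs of length >= L
    together contain at least half of the assembled bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

# Toy assembly (assumed values): 300 bases in six contigs.
toy_contigs = [100, 80, 50, 30, 20, 20]
print(contig_n50(toy_contigs))
```

    Closing gaps merges neighbouring contigs into longer ones, which is why gap closure raises this statistic.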

  1. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
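
    The counting step at the heart of a barcode-sequencing assay can be sketched as follows: reads are matched to known strain barcodes, and each strain's relative abundance is compared between a control and a treated pool. This is a simplified illustration, not the authors' pipeline. Exact matching, the pseudocount, and all names and sequences are assumptions.

```python
import math
from collections import Counter

def count_barcodes(reads, known_barcodes):
    """Count reads whose sequence exactly matches a known barcode."""
    known = set(known_barcodes)
    return Counter(r for r in reads if r in known)

def log2_ratios(control, treatment, pseudocount=1.0):
    """log2(treatment/control) of depth-normalized counts per barcode."""
    barcodes = set(control) | set(treatment)
    c_total = sum(control.values()) + pseudocount * len(barcodes)
    t_total = sum(treatment.values()) + pseudocount * len(barcodes)
    return {
        b: math.log2(((treatment.get(b, 0) + pseudocount) / t_total)
                     / ((control.get(b, 0) + pseudocount) / c_total))
        for b in barcodes
    }

# Toy pools (assumed): strain CCC drops out under treatment; GGG is unknown.
barcodes = ["AAA", "CCC"]
control = count_barcodes(["AAA"] * 4 + ["CCC"] * 4, barcodes)
treated = count_barcodes(["AAA"] * 7 + ["CCC"] + ["GGG"], barcodes)
ratios = log2_ratios(control, treated)
print(ratios)
```

    A strongly negative ratio flags a strain that is depleted under treatment, the kind of signal used to nominate candidate drug targets in a chemogenomic screen.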

  2. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  3. A Statistical Guide to the Design of Deep Mutational Scanning Experiments

    PubMed Central

    Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia

    2016-01-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710
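
    The abstract describes estimating fitness effects from the relative abundance of mutants across time points in a bulk competition. As a rough illustration only, the sketch below uses a generic log-linear estimator: the selection coefficient is taken as the least-squares slope of ln(mutant/reference) against time. This is an assumption for illustration, not necessarily the estimator used by the authors; the function name and the read counts are hypothetical.

```python
import math

def selection_coefficient(times, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant/reference) against time.

    Generic log-linear estimator (an assumption, not the paper's method).
    """
    ys = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    return sxy / sxx

# Hypothetical read counts sampled at generations 0, 2, 4 and 6,
# simulated for a mutant with a true selection coefficient of 0.1.
times = [0, 2, 4, 6]
reference = [1000, 1000, 1000, 1000]
mutant = [round(1000 * math.exp(0.1 * t)) for t in times]
s_hat = selection_coefficient(times, mutant, reference)
```

    With noise-free counts the slope recovers the simulated coefficient closely; in real data, sampling noise at each time point limits the precision, which is the trade-off between depth and number of time points that the abstract discusses.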

  4. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

    PubMed Central

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-01-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064

  5. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    PubMed

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  6. Exploring the Gastrointestinal "Nemabiome": Deep Amplicon Sequencing to Quantify the Species Composition of Parasitic Nematode Communities.

    PubMed

    Avramenko, Russell W; Redman, Elizabeth M; Lewis, Roy; Yazwinski, Thomas A; Wasmuth, James D; Gilleard, John S

    2015-01-01

    Parasitic helminth infections have a considerable impact on global human health as well as animal welfare and production. Although co-infection with multiple parasite species within a host is common, there is a dearth of tools with which to study the composition of these complex parasite communities. Helminth species vary in their pathogenicity, epidemiology and drug sensitivity and the interactions that occur between co-infecting species and their hosts are poorly understood. We describe the first application of deep amplicon sequencing to study parasitic nematode communities as well as introduce the concept of the gastro-intestinal "nemabiome". The approach is analogous to 16S rDNA deep sequencing used to explore microbial communities, but utilizes the nematode ITS-2 rDNA locus instead. Gastro-intestinal parasites of cattle were used to develop the concept, as this host has many well-defined gastro-intestinal nematode species that commonly occur as complex co-infections. Further, the availability of pure mono-parasite populations from experimentally infected cattle allowed us to prepare mock parasite communities to determine, and correct for, species representation biases in the sequence data. We demonstrate that, once these biases have been corrected, accurate relative quantitation of gastro-intestinal parasitic nematode communities in cattle fecal samples can be achieved. We have validated the accuracy of the method applied to field-samples by comparing the results of detailed morphological examination of L3 larvae populations with those of the sequencing assay. The results illustrate the insights that can be gained into the species composition of parasite communities, using grazing cattle in the mid-west USA as an example. However, both the technical approach and the concept of the "nemabiome" have a wide range of potential applications in human and veterinary medicine. These include investigations of host-parasite and parasite-parasite interactions during co-infection, parasite epidemiology, parasite ecology and the response of parasite populations to both drug treatments and control programs.

  7. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2017-02-28

    RNAs play key roles in cells through their interactions with RNA-binding proteins (RBPs), and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How RBPs correctly recognize their target RNAs and why they bind specific positions are still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict RNA-protein binding sites from the rapidly growing multi-resource data, e.g., sequence and structure, their domain-specific features and formats have posed significant computational challenges. One current difficulty is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into meaningful binding motifs is a topic worthy of further investigation. In view of these challenges, we propose a deep learning-based framework (iDeep) that uses a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol transforms the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture the interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach in real-world applications. The iDeep framework not only achieves better performance than the state-of-the-art predictors but also easily captures interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.

  8. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  9. ampliMethProfiler: a pipeline for the analysis of CpG methylation profiles of targeted deep bisulfite sequenced amplicons.

    PubMed

    Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio

    2016-11-25

    CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated), and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification-based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype-based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single-molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single-molecule level, as well as suitable statistical tools for their interpretation. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. ampliMethProfiler provides an easy and user-friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in Python and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
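
    To make the distinction between average methylation and epihaplotypes concrete, the following is a minimal sketch and not the ampliMethProfiler interface. It assumes each read has already been reduced to a string of '1' (methylated) and '0' (unmethylated) calls over the same CpG positions; the function name and the toy reads are hypothetical.

```python
from collections import Counter

def epihaplotype_profile(reads):
    """Return (per-CpG methylated fraction, epihaplotype counts).

    Each read is a string of '0'/'1' calls over the same CpG positions.
    """
    n_sites = len(reads[0])
    if any(len(r) != n_sites for r in reads):
        raise ValueError("all reads must cover the same CpG positions")
    per_site = [
        sum(r[i] == "1" for r in reads) / len(reads) for i in range(n_sites)
    ]
    return per_site, Counter(reads)

# Two toy populations with identical per-CpG averages
# but different single-molecule composition.
pop_a = ["1100", "1100", "0011", "0011"]
pop_b = ["1010", "0101", "1111", "0000"]
avg_a, haps_a = epihaplotype_profile(pop_a)
avg_b, haps_b = epihaplotype_profile(pop_b)
```

    Both toy populations are 50% methylated at every CpG, so a purely quantitative summary cannot tell them apart, while the epihaplotype counts differ (two epihaplotypes in the first, four in the second).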

  10. Single-Domain Parvulins Constitute a Specific Marker for Recently Proposed Deep-Branching Archaeal Subgroups

    PubMed Central

    Lederer, Christoph; Heider, Dominik; van den Boom, Johannes; Hoffmann, Daniel; Mueller, Jonathan W.; Bayer, Peter

    2011-01-01

    Peptidyl-prolyl cis/trans isomerases (PPIases) are enzymes assisting protein folding and protein quality control in organisms of all kingdoms of life. In contrast to the other sub-classes of PPIases, the cyclophilins and the FK-506 binding proteins, little was formerly known about the parvulin type of PPIase in Archaea. Recently, the first solution structure of an archaeal parvulin, the PinA protein from Cenarchaeum symbiosum, was reported. Investigation of occurrence and frequency of PPIase sequences in numerous archaeal genomes now revealed a strong tendency for thermophilic microorganisms to reduce the number of PPIases. Single-domain parvulins were mostly found in the genomes of recently proposed deep-branching archaeal subgroups, the Thaumarchaeota and the ARMANs (archaeal Richmond Mine acidophilic nanoorganisms). Hence, we used the parvulin sequence to reclassify available archaeal metagenomic contigs, thereby, adding new members to these subgroups. A combination of genomic background analysis and phylogenetic approaches of parvulin sequences suggested that the assigned sequences belong to at least two distinct groups of Thaumarchaeota. Finally, machine learning approaches were applied to identify amino acid residues that separate archaeal and bacterial parvulin proteins from each other. When mapped onto the recent PinA solution structure, most of these positions form a cluster at one site of the protein possibly indicating a different functionality of the two groups of parvulin proteins. PMID:22065628

  11. A novel approach to tracking antigen-experienced CD4 T cells into functional compartments via tandem deep and shallow TCR clonotyping.

    PubMed

    Estorninho, Megan; Gibson, Vivienne B; Kronenberg-Versteeg, Deborah; Liu, Yuk-Fun; Ni, Chester; Cerosaletti, Karen; Peakman, Mark

    2013-12-01

    Extensive diversity in the human repertoire of TCRs for Ag is both a cornerstone of effective adaptive immunity that enables host protection against a multiplicity of pathogens and a weakness that gives rise to potential pathological self-reactivity. The complexity arising from diversity makes detection and tracking of single Ag-specific CD4 T cells (ASTs) involved in these immune responses challenging. We report a tandem, multistep process to quantify rare TCRβ-chain variable sequences of ASTs in large polyclonal populations. The approach combines deep high-throughput sequencing (HTS) within functional CD4 T cell compartments, such as naive/memory cells, with shallow, multiple identifier-based HTS of ASTs identified by activation marker upregulation after short-term Ag stimulation in vitro. We find that clonotypes recognizing HLA class II-restricted epitopes of both pathogen-derived Ags and self-Ags are oligoclonal and typically private. Clonotype tracking within an individual reveals private AST clonotypes resident in the memory population, as would be expected, representing clonal expansions (identical nucleotide sequence; "ultraprivate"). Other AST clonotypes share CDR3β amino acid sequences through convergent recombination and are found in memory populations of multiple individuals. Tandem HTS-based clonotyping will facilitate studying AST dynamics, epitope spreading, and repertoire changes that arise postvaccination and following Ag-specific immunotherapies for cancer and autoimmune disease.

  12. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  13. Probing the Rare Biosphere of the North-West Mediterranean Sea: An Experiment with High Sequencing Effort.

    PubMed

    Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos

    2016-01-01

    High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.

  14. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3′ end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
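
    The color-space representation described in this abstract can be illustrated with a short sketch. It uses the standard two-base encoding in which bases are mapped to A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent bases; the function names are hypothetical, and this is not the SeqTrimMap script.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode(seq):
    """Return the first base followed by one color per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)

def decode(color_read):
    """Rebuild the bases from a leading base and the colors that follow."""
    bases = [color_read[0]]
    for c in color_read[1:]:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ int(c)])
    return "".join(bases)

reference = encode("ATGGC")    # "A3103"
snp = encode("ATCGC")          # "A3233": a real G>C change alters two adjacent colors
miscalled = decode("A3203")    # "ATCCG": one wrong color changes every later base
```

    A true single-nucleotide difference changes two consecutive colors, whereas a single miscalled color shifts all downstream bases when decoded. This is the property behind the claimed ability to separate sequencing errors from polymorphisms, and it also suggests why reads are usually mapped in color space rather than decoded first.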

  15. Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

    PubMed Central

    Sun, Jiangming; Singh, Pratibha; Bagge, Annika; Valtat, Bérengère; Vikman, Petter; Spégel, Peter; Mulder, Hindrik

    2016-01-01

    RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A great impediment, however, to a deeper understanding of this process is the paramount sequencing effort that needs to be undertaken to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that ameliorates this problem. Using 41-nucleotide-long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events intra- and interspecies. The validity of the proposed method was verified in an independent experimental dataset. Using our approach, 203,202 putative A-to-I RNA editing events were predicted in the whole human genome. Out of these, 9% were previously reported. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool to identify potential A-to-I RNA editing events without the requirement of extensive RNA sequencing. PMID:27764195

  16. Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing

    PubMed Central

    Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.

    2015-01-01

    Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the method effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891

  17. Clinical Utility of Circulating Tumor DNA for Molecular Assessment and Precision Medicine in Pancreatic Cancer.

    PubMed

    Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Kato, Mamoru; Shibata, Tatsuhiro; Yachida, Shinichi

    2016-01-01

    Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect molecular characteristics of tumors, supporting the concept of "liquid biopsy". We determined the mutational status of KRAS in plasma cfDNA using multiplex droplet digital PCR in 259 patients with PDAC, retrospectively. Furthermore, we constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Droplet digital PCR detected KRAS mutations in plasma cfDNA in 63 of 107 (58.9%) patients with inoperable tumors. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by cfDNA sequencing. Our two-step approach with plasma cfDNA, combining droplet digital PCR and targeted deep sequencing, is a feasible clinical approach. Assessment of mutations in plasma cfDNA may provide a new diagnostic tool, assisting decisions for optimal therapeutic strategies for PDAC patients.

  18. A Statistical Guide to the Design of Deep Mutational Scanning Experiments.

    PubMed

    Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia

    2016-09-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.

  19. Detailed investigation of the microbial community in foaming activated sludge reveals novel foam formers

    PubMed Central

    Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.

    2015-01-01

    Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was the closest with G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234

  20. Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

    PubMed Central

    Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

    2016-01-01

    During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic.
    IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral factors associated with intraepidemic increasing severity in a unique epidemiological setting. Here, we investigated the intrahost genetic diversity in acute human samples collected at different time points during the DENV-3 epidemic that occurred in Cuba in 2001 to 2002 using a deep-sequencing approach. We concluded that greater variability in significant minor populations occurred as the epidemic progressed, particularly in the nonstructural genes, with higher variability observed in secondary infection cases. Remarkably, for the first time significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in structural proteins. These findings indicate that high-resolution approaches are needed to unravel molecular mechanisms involved in dengue pathogenesis. PMID:26889031

  1. Adaptive metric learning with deep neural networks for video-based facial expression recognition

    NASA Astrophysics Data System (ADS)

    Liu, Xiaofeng; Ge, Yubin; Yang, Chao; Jia, Ping

    2018-01-01

    Video-based facial expression recognition has become increasingly important for many real-world applications. Although numerous efforts have been made for the single sequence, balancing the complex distribution of intra- and interclass variations between sequences remains a great difficulty in this area. We propose the adaptive (N+M)-tuplet clusters loss function and optimize it simultaneously with the softmax loss in the training phase. The variations introduced by personal attributes are alleviated using the similarity measurements of multiple samples in the feature space with far fewer comparisons than conventional deep metric learning approaches, which enables the metric calculations for large data applications (e.g., videos). Both the spatial and temporal relations are explored by a unified framework that consists of an Inception-ResNet network with long short-term memory and a structure of two fully connected layer branches. Our proposed method has been evaluated with three well-known databases, and the experimental results show that our method outperforms many state-of-the-art approaches.

  2. Joint deep shape and appearance learning: application to optic pathway glioma segmentation

    NASA Astrophysics Data System (ADS)

    Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George

    2017-03-01

    Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit useful shape information. Local and global shape knowledge such as regional boundary changes, diameter, and volumetrics can be useful in classifying tissues, especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of the AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple-sequence magnetic resonance images (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple-sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To the best of our knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for glioma detection. In our experiments, mean misclassification errors of 2.39% and 0.48% were observed for glioma and control patches extracted from the AVP, respectively. Moreover, an overall Dice similarity coefficient of 0.87 ± 0.13 (0.93 ± 0.06 for healthy tissue, 0.78 ± 0.18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.
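
    The Dice similarity coefficient reported in this abstract can be illustrated with a minimal Python sketch. The function name `dice_coefficient`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration; they are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Voxels labelled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total positive voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


# Hypothetical toy masks: 1 = voxel labelled as glioma, 0 = background.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, truth))
```

    A value of 1 means the predicted and reference masks coincide exactly; 0 means they share no positive voxel.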

  3. Analysis of deep learning methods for blind protein contact prediction in CASP12.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2018-03-01

    Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
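
    The "top L/5" accuracy quoted above can be sketched as follows. This is a minimal illustration, not the RaptorX-Contact evaluation code: the function name, the dictionary input format, and the toy data are hypothetical, and the default separation threshold of 24 residues for long-range contacts (12 to 23 for medium-range) is the commonly used CASP convention, stated here as an assumption.

```python
def top_l_over_k_accuracy(scores, native, length, k=5, min_sep=24, max_sep=None):
    """Fraction of the top L/k predicted contacts that are true contacts.

    scores: dict mapping residue pairs (i, j), i < j, to predicted probability
    native: set of residue pairs (i, j) that are in contact in the true structure
    length: protein length L
    """
    # Keep only pairs in the requested sequence-separation range.
    candidates = [
        (prob, pair)
        for pair, prob in scores.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native)
    return hits / len(top)


# Toy example with a short "protein" (L = 10) and a relaxed separation cutoff.
scores = {(0, 5): 0.9, (1, 8): 0.8, (2, 9): 0.4, (3, 4): 0.99}
native = {(0, 5), (2, 9)}
print(top_l_over_k_accuracy(scores, native, length=10, k=5, min_sep=3))
```

    Medium-range accuracy would use the same function with `min_sep=12, max_sep=23` under the stated convention.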

  4. High fungal diversity and abundance recovered in the deep-sea sediments of the Pacific Ocean.

    PubMed

    Xu, Wei; Pang, Ka-Lai; Luo, Zhu-Hua

    2014-11-01

    The presence and ecological significance of bacteria and archaea in deep-sea environments are well recognized, but eukaryotic microorganisms, such as fungi, have rarely been reported. The present study investigated the composition and abundance of the fungal community in the deep-sea sediments of the Pacific Ocean. A total of 1,947 internal transcribed spacer (ITS) regions of fungal rRNA gene clones were recovered from five sediment samples from the Pacific Ocean (water depths ranging from 5,017 to 6,986 m) using three different PCR primer sets. There were 16, 17, and 15 different operational taxonomic units (OTUs) identified from fungal-universal, Ascomycota-, and Basidiomycota-specific clone libraries, respectively. The majority of the recovered sequences belonged to diverse phylotypes of Ascomycota (25 phylotypes) and Basidiomycota (18 phylotypes). The multiple-primer approach recovered a total of 27 phylotypes that showed low similarities (≤97 %) with available fungal sequences in GenBank, suggesting possible new fungal taxa occurring in the deep-sea environments or belonging to taxa not represented in GenBank. Our results also recovered high fungal LSU rRNA gene copy numbers (3.52 × 10^6 to 5.23 × 10^7 copies/g wet sediment) from the Pacific Ocean sediment samples, suggesting that the fungi might be involved in important ecological functions in the deep-sea environments.

  5. A Phylogenomic Perspective on the Radiation of Ray-Finned Fishes Based upon Targeted Sequencing of Ultraconserved Elements (UCEs)

    PubMed Central

    Sorenson, Laurie; Santini, Francesco

    2013-01-01

    Ray-finned fishes constitute the dominant radiation of vertebrates with over 32,000 species. Although molecular phylogenetics has begun to disentangle major evolutionary relationships within this vast section of the Tree of Life, there is no widely available approach for efficiently collecting phylogenomic data within fishes, leaving much of the enormous potential of massively parallel sequencing technologies for resolving major radiations in ray-finned fishes unrealized. Here, we provide a genomic perspective on longstanding questions regarding the diversification of major groups of ray-finned fishes through targeted enrichment of ultraconserved nuclear DNA elements (UCEs) and their flanking sequence. Our workflow efficiently and economically generates data sets that are orders of magnitude larger than those produced by traditional approaches and is well-suited to working with museum specimens. Analysis of the UCE data set recovers a well-supported phylogeny at both shallow and deep time-scales that supports a monophyletic relationship between Amia and Lepisosteus (Holostei) and reveals elopomorphs and then osteoglossomorphs to be the earliest diverging teleost lineages. Our approach additionally reveals that sequence capture of UCE regions and their flanking sequence offers enormous potential for resolving phylogenetic relationships within ray-finned fishes. PMID:23824177

  6. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
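
    The idea of reporting variants above a 1% frequency threshold can be sketched as a per-position base tally. This is a simplified illustration only: it assumes reads are already aligned and padded to equal length, and it does not implement the iterative mapping or error-correction steps described in the abstract. The function name and toy reads are hypothetical.

```python
from collections import Counter


def minority_variants(aligned_reads, min_freq=0.01, min_depth=1):
    """Return {position: {base: frequency}} for non-majority bases >= min_freq.

    aligned_reads: equal-length strings, one per read, already aligned
    (an assumption of this sketch; characters other than A/C/G/T are ignored).
    """
    result = {}
    if not aligned_reads:
        return result
    for pos in range(len(aligned_reads[0])):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue  # not enough coverage to call anything at this position
        major_base = counts.most_common(1)[0][0]
        minor = {
            base: count / depth
            for base, count in counts.items()
            if base != major_base and count / depth >= min_freq
        }
        if minor:
            result[pos] = minor
    return result


# Toy data: 97 reads carry G at position 2, 3 reads carry T (3% frequency).
reads = ["ACGT"] * 97 + ["ACTT"] * 3
print(minority_variants(reads, min_freq=0.01))
```

    In practice the threshold has to sit above the residual sequencing error rate, which is why the abstract stresses reducing erroneous base prevalence below 1% first.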

  7. Insights from Human/Mouse genome comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pennacchio, Len A.

    2003-03-30

    Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.

  8. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and single-cell studies have made deep whole-genome (DNA-seq), whole-epigenome and whole-transcriptome sequencing (RNA-seq) at the single-cell level feasible. NGS at the single-cell level expands our view of the genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions, with unprecedented throughput, and with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate on the potential applications of single-cell sequencing for drug discovery and drug development. Copyright © Bentham Science Publishers.

  9. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus.

    PubMed

    Zhang, Yan; An, Lin; Xu, Jie; Zhang, Bo; Zheng, W Jim; Hu, Ming; Tang, Jijun; Yue, Feng

    2018-02-21

    Although Hi-C technology is one of the most popular tools for studying 3D genome organization, due to sequencing cost, the resolution of most Hi-C datasets is coarse and cannot be used to link distal regulatory elements to their target genes. Here we develop HiCPlus, a computational approach based on a deep convolutional neural network, to infer high-resolution Hi-C interaction matrices from low-resolution Hi-C data. We demonstrate that HiCPlus can impute interaction matrices highly similar to the original ones, while only using 1/16 of the original sequencing reads. We show that the models learned from one cell type can be applied to make predictions in other cell or tissue types. Our work not only provides a computational framework to enhance Hi-C data resolution but also reveals features underlying the formation of 3D chromatin interactions.
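
    One way to picture "using 1/16 of the original sequencing reads" is to thin a contact matrix by keeping each read with probability 1/16. The sketch below is an assumption about how such low-coverage input could be simulated; the abstract does not describe the authors' actual downsampling procedure, and the function name and toy matrix are hypothetical.

```python
import random


def downsample_contacts(matrix, fraction=1 / 16, seed=0):
    """Simulate lower sequencing depth by keeping each read with probability `fraction`.

    matrix: symmetric square list-of-lists of non-negative integer read counts.
    Only the upper triangle is sampled; the result is mirrored to stay symmetric.
    """
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each read supporting contact (i, j) survives independently.
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < fraction)
            out[i][j] = kept
            out[j][i] = kept
    return out


# Hypothetical 3-bin contact matrix of read counts.
high_depth = [
    [320, 64, 16],
    [64, 480, 80],
    [16, 80, 160],
]
low_depth = downsample_contacts(high_depth, fraction=1 / 16, seed=42)
for row in low_depth:
    print(row)
```

    A model of the kind described would then be trained to map such thinned matrices back to their high-depth counterparts.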

  11. High-Resolution Sequence-Function Mapping of Full-Length Proteins

    PubMed Central

    Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.

    2015-01-01

    Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet the short read lengths of current sequencers make it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
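
    Comparing variant abundance before and after selection is commonly summarized as a log2 enrichment ratio. The sketch below shows that general calculation under stated assumptions: the wild-type normalization, the pseudocount, the function name, and the variant labels are illustrative choices, not the analytical solutions presented in the paper.

```python
import math


def log2_enrichment(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Log2 enrichment of each variant after selection, relative to wild type.

    pre_counts / post_counts: dicts mapping variant name to read count.
    A pseudocount guards against zero counts (an assumption of this sketch).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def raw_score(variant):
        f_pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        f_post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(f_post / f_pre)

    baseline = raw_score(wild_type)
    return {variant: raw_score(variant) - baseline for variant in pre_counts}


# Hypothetical read counts for a three-variant library.
pre = {"WT": 500, "A10G": 250, "L45P": 250}
post = {"WT": 500, "A10G": 400, "L45P": 100}
for variant, score in log2_enrichment(pre, post, pseudocount=0).items():
    print(variant, round(score, 3))
```

    Positive scores indicate variants that became more abundant than wild type under selection; negative scores indicate depletion.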

  12. Testing genotyping strategies for ultra-deep sequencing of a co-amplifying gene family: MHC class I in a passerine bird.

    PubMed

    Biedrzycka, Aleksandra; Sebastian, Alvaro; Migalska, Magdalena; Westerdahl, Helena; Radwan, Jacek

    2017-07-01

    Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co-amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra-deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500-20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within-method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co-amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co-amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage. © 2016 John Wiley & Sons Ltd.

  13. Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

    PubMed Central

    2011-01-01

    Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic methods provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length, with 35% of them > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes was detected in parasitized larvae, with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization and on DsIV genes expressed in the host, and it improves our current understanding of this host-parasitoid interaction. PMID:21906285

  14. Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway

    NASA Astrophysics Data System (ADS)

    Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.

    2018-02-01

    Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.

  15. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  16. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    PubMed

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  17. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2017-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

  18. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2018-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101

  19. A deep learning method for lincRNA detection using auto-encoder algorithm.

    PubMed

    Yu, Ning; Yu, Zeng; Pan, Yi

    2017-12-06

    The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two pieces of evidence give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application to adopt deep learning techniques for identifying lincRNA transcription sequences.

  20. Prediction of Bispectral Index during Target-controlled Infusion of Propofol and Remifentanil: A Deep Learning Approach.

    PubMed

    Lee, Hyung-Chul; Ryu, Ho-Geol; Chung, Eun-Jin; Jung, Chul-Woo

    2018-03-01

    The discrepancy between predicted effect-site concentration and measured bispectral index is problematic during intravenous anesthesia with target-controlled infusion of propofol and remifentanil. We hypothesized that bispectral index during total intravenous anesthesia would be more accurately predicted by a deep learning approach. Long short-term memory and the feed-forward neural network were sequenced to simulate the pharmacokinetic and pharmacodynamic parts of an empirical model, respectively, to predict intraoperative bispectral index during combined use of propofol and remifentanil. Inputs of long short-term memory were infusion histories of propofol and remifentanil, which were retrieved from target-controlled infusion pumps for 1,800 s at 10-s intervals. Inputs of the feed-forward network were the outputs of long short-term memory and demographic data such as age, sex, weight, and height. The final output of the feed-forward network was the bispectral index. The performance of bispectral index prediction was compared between the deep learning model and the previously reported response surface model. The model hyperparameters comprised 8 memory cells in the long short-term memory layer and 16 nodes in the hidden layer of the feed-forward network. The model training and testing were performed with separate data sets of 131 and 100 cases. The concordance correlation coefficient (95% CI) was 0.561 (0.560 to 0.562) in the deep learning model, which was significantly larger than that in the response surface model (0.265 [0.263 to 0.266], P < 0.001). The deep learning model predicted bispectral index during target-controlled infusion of propofol and remifentanil more accurately than the traditional model. The deep learning approach in anesthetic pharmacology seems promising because of its excellent performance and extensibility.
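
    The comparison above rests on the concordance correlation coefficient. As a minimal sketch, assuming the authors used Lin's standard definition (the record does not say how it was computed), the coefficient between measured and predicted values can be written as follows. The function name is illustrative.

```python
from statistics import fmean


def concordance_correlation(measured, predicted):
    """Lin's concordance correlation coefficient for two equal-length series."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = fmean(measured), fmean(predicted)
    # Population (1/n) variances and covariance, as in Lin's definition.
    vx = fmean((x - mx) ** 2 for x in measured)
    vy = fmean((y - my) ** 2 for y in predicted)
    cov = fmean((x - mx) * (y - my) for x, y in zip(measured, predicted))
    # The (mx - my)^2 term penalizes systematic offset, unlike Pearson's r.
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

    Unlike Pearson correlation, this measure drops below 1 when predictions are perfectly correlated with the measurements but shifted or scaled, which is why it suits agreement between predicted and measured bispectral index. The sketch does not guard against two identical constant series, where the denominator is zero.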

  1. GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS.

    PubMed

    Ravasio, Viola; Ritelli, Marco; Legati, Andrea; Giacopuzzi, Edoardo

    2018-04-14

    The exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses serious challenges for variant interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which relies on deep learning models to dissect false and true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71 - 0.98) and outperformed established hard filters. The method is also robust at low coverage down to 30X and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing application of different thresholds based on the desired level of sensitivity and specificity. GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS. Contact: edoardo.giacopuzzi@unibs.it. Supplementary data are available at Bioinformatics online.

  2. Sequencing of a Patient with Balanced Chromosome Abnormalities and Neurodevelopmental Disease Identifies Disruption of Multiple High Risk Loci by Structural Variation

    PubMed Central

    Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko

    2014-01-01

    Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750

  3. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

    PubMed

    Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

    2011-03-04

    Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  4. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks.

    PubMed

    Umarov, Ramzan Kh; Solovyev, Victor V

    2017-01-01

    Accurate computational identification of promoters remains a challenge as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The CNN model for identifying Bacillus subtilis promoters achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). CNN models nicely recognize these complex functional regions. For human promoters the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 for TATA and 0.90/0.98/0.89 for non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA promoters). Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy compared to the previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other and especially newly sequenced genomes. The CNNProm program is available to run at web server http://www.softberry.com.
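
    The Sn/Sp/CC triples quoted above summarize a binary confusion matrix. The following sketch shows how such values are typically derived, assuming that CC denotes the Matthews correlation coefficient (the record does not define it). The function name and argument order are illustrative.

```python
from math import sqrt


def promoter_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, correlation coefficient)."""
    sn = tp / (tp + fn)  # fraction of true promoters that were recovered
    sp = tn / (tn + fp)  # fraction of non-promoters correctly rejected
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed definition: Matthews correlation; 0.0 if any margin is empty.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc
```

    Note that for fixed Sn and Sp the correlation coefficient still depends on the ratio of promoter to non-promoter sequences in the test set, so the three numbers are not redundant.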

  5. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS's Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  6. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
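
    The tiling idea described above can be illustrated with a small sketch: cut the query sequence into fixed-length tiles and count how many reads in a library contain each tile. This is not Geoseq's implementation. A plain substring search stands in for the suffix arrays mentioned in the abstract, and the function name, tile length, and step are assumptions.

```python
def tile_coverage(reference, reads, tile_len=4, step=1):
    """For each tile of the reference, count the reads that contain it."""
    coverage = []
    for start in range(0, len(reference) - tile_len + 1, step):
        tile = reference[start:start + tile_len]
        # Naive scan over all reads; a suffix array over the library would
        # answer the same question without rescanning every read.
        hits = sum(1 for read in reads if tile in read)
        coverage.append((start, tile, hits))
    return coverage
```

    Searching for the query in the reads, rather than aligning every read to a genome, is what lets the library index be computed once in advance and reused for any query.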

  7. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  8. Sequence-of-events-driven automation of the deep space network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  9. Sequence-of-Events-Driven Automation of the Deep Space Network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  10. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  11. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes.

    PubMed

    Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.

  12. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes

    PubMed Central

    Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372

  13. Current and future molecular diagnostics for ocular infectious diseases.

    PubMed

    Doan, Thuy; Pinsky, Benjamin A

    2016-11-01

    Confirmation of ocular infections can pose great challenges to the clinician. A fundamental limitation is the small amounts of specimen that can be obtained from the eye. Molecular diagnostics can circumvent this limitation and have been shown to be more sensitive than conventional culture. The purpose of this review is to describe new molecular methods and to discuss the applications of next-generation sequencing-based approaches in the diagnosis of ocular infections. Efforts have focused on improving the sensitivity of pathogen detection using molecular methods. This review describes a new molecular target for Toxoplasma gondii-directed polymerase chain reaction assays. Molecular diagnostics for Chlamydia trachomatis and Acanthamoeba species are also discussed. Finally, we describe a hypothesis-free approach, metagenomic deep sequencing, which can detect DNA and RNA pathogens from a single specimen in one test. In some cases, this method can provide the geographic location and timing of the infection. Pathogen-directed PCRs have been powerful tools in the diagnosis of ocular infections for over 20 years. The use of next-generation sequencing-based approaches, when available, will further improve sensitivity of detection with the potential to improve patient care.

  14. DNA Replication Profiling Using Deep Sequencing.

    PubMed

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  15. Functional reasoning in diagnostic problem solving

    NASA Technical Reports Server (NTRS)

    Sticklen, Jon; Bond, W. E.; Stclair, D. C.

    1988-01-01

    This work is one facet of an integrated approach to diagnostic problem solving for aircraft and space systems currently under development. The authors are applying a method of modeling and reasoning about deep knowledge based on a functional viewpoint. The approach recognizes a level of device understanding which is intermediate between a compiled level of typical Expert Systems, and a deep level at which large-scale device behavior is derived from known properties of device structure and component behavior. At this intermediate functional level, a device is modeled in three steps. First, a component decomposition of the device is defined. Second, the functionality of each device/subdevice is abstractly identified. Third, the state sequences which implement each function are specified. Given a functional representation and a set of initial conditions, the functional reasoner acts as a consequence finder. The output of the consequence finder can be utilized in diagnostic problem solving. The paper also discusses ways in which this functional approach may find application in the aerospace field.

  16. The seismic stratigraphy of Okanagan Lake, British Columbia; a record of rapid deglaciation in a deep 'fiord-lake' basin

    NASA Astrophysics Data System (ADS)

    Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.

    1991-09-01

    This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ~3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lenses and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where icebergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize, in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.

  17. Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identifies RNA processing phenotypes.

    PubMed

    Sittka, Alexandra; Sharma, Cynthia M; Rolle, Katarzyna; Vogel, Jörg

    2009-01-01

    The bacterial Sm-like protein, Hfq, is a key factor for the stability and function of small non-coding RNAs (sRNAs) in Escherichia coli. Homologues of this protein have been predicted in many distantly related organisms yet their functional conservation as sRNA-binding proteins has not entirely been clear. To address this, we expressed in Salmonella the Hfq proteins of two eubacteria (Neisseria meningitidis, Aquifex aeolicus) and an archaeon (Methanocaldococcus jannaschii), and analyzed the associated RNA by deep sequencing. This in vivo approach identified endogenous Salmonella sRNAs as a major target of the foreign Hfq proteins. New Salmonella sRNA species were also identified, and some of these accumulated specifically in the presence of a foreign Hfq protein. In addition, we observed specific RNA processing defects, e.g., suppression of precursor processing of SraH sRNA by Methanocaldococcus Hfq, or aberrant accumulation of extracytoplasmic target mRNAs of the Salmonella GcvB, MicA or RybB sRNAs. Taken together, our study provides evidence of a conserved inherent sRNA-binding property of Hfq, which may facilitate the lateral transmission of regulatory sRNAs among distantly related species. It also suggests that the expression of heterologous RNA-binding proteins combined with deep sequencing analysis of RNA ligands can be used as a molecular tool to dissect individual steps of RNA metabolism in vivo.

  18. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity.

    PubMed

    Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio

    2017-07-15

    With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Low-abundant bacteria drive compositional changes in the gut microbiota after dietary alteration.

    PubMed

    Benjamino, Jacquelynn; Lincoln, Stephen; Srivastava, Ranjan; Graf, Joerg

    2018-05-10

    As the importance of beneficial bacteria is better recognized, understanding the dynamics of symbioses becomes increasingly crucial. In many gut symbioses, it is essential to understand whether changes in host diet play a role in the persistence of the bacterial gut community. In this study, termites were fed six dietary sources and the microbial community was monitored over a 49-day period using 16S rRNA gene sequencing. A deep backpropagation artificial neural network (ANN) was used to learn how the six different lignocellulose food sources affected the temporal composition of the hindgut microbiota of the termite as well as taxon-taxon and taxon-substrate interactions. Shifts in the termite gut microbiota after diet change in each colony were observed using 16S rRNA gene sequencing and beta diversity analyses. The artificial neural network accurately predicted the relative abundances of taxa at random points in the temporal study and showed that low-abundant taxa maintain community driving correlations in the hindgut. This combinatorial approach utilizing 16S rRNA gene sequencing and deep learning revealed that low-abundant bacteria that often do not belong to the core community are drivers of the termite hindgut bacterial community composition.

  20. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    PubMed

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.

  1. A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences.

    PubMed

    Xue, Yun; Liao, Zhengling; Li, Meihang; Luo, Jie; Kuang, Qiuhua; Hu, Xiaohui; Li, Tiechen

    2015-01-01

    Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms that cannot reveal all OPSMs in this NP-complete problem. In particular, deep OPSMs, corresponding to long patterns with few supporting sequences, incur explosive computational costs and are completely pruned by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adjusted to disclose all common subsequences (ACS) between every two row sequences, so that no deep OPSM is missed. Then, an improved data structure for the prefix tree was used to store and traverse ACS, and the Apriori principle was employed to efficiently mine the frequent sequential patterns. Finally, experiments were implemented on gene and synthetic datasets. Results demonstrated the effectiveness and efficiency of this method.
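
    To make the idea of all common subsequences (ACS) between two row sequences concrete, here is a minimal brute-force sketch. It is not the adjusted algorithm or the prefix-tree structure from the paper: it assumes each matrix row is turned into a sequence of column indices sorted by increasing value, and it enumerates subsequences exhaustively, which is exponential in the number of columns and only suitable for toy inputs. All function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple


def column_order(row: Sequence[float]) -> Tuple[int, ...]:
    """Column indices of a row, sorted by increasing value."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def is_subsequence(pattern: Sequence[int], sequence: Sequence[int]) -> bool:
    """True if pattern appears in sequence in the same relative order."""
    remaining = iter(sequence)
    return all(col in remaining for col in pattern)


def all_common_subsequences(seq_a: Sequence[int], seq_b: Sequence[int],
                            min_len: int = 2) -> Set[Tuple[int, ...]]:
    """Every subsequence of seq_a (length >= min_len) that is also one of seq_b."""
    common = set()
    for length in range(min_len, len(seq_a) + 1):
        # combinations() keeps the input order, so each tuple is a subsequence of seq_a.
        for pattern in combinations(seq_a, length):
            if is_subsequence(pattern, seq_b):
                common.add(pattern)
    return common


if __name__ == "__main__":
    rows = [[1, 5, 3, 9], [2, 8, 4, 6]]
    order_a, order_b = (column_order(r) for r in rows)
    for pattern in sorted(all_common_subsequences(order_a, order_b), key=len):
        print(pattern)
```

    Each returned tuple is a column ordering that both rows follow, that is, a candidate OPSM pattern supported by those two rows.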

  2. Genomic region operation kit for flexible processing of deep sequencing data.

    PubMed

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
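
    The set-algebra view of genomic regions can be illustrated without any particular toolkit. The sketch below does not use GROK's API (which is not shown in the abstract); it is plain Python that assumes regions are half-open `(chromosome, start, end)` tuples and implements union and intersection over them. All names are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def normalize(regions: Iterable[Region]) -> List[Region]:
    """Sort regions and merge any that overlap or touch on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def union(a: Iterable[Region], b: Iterable[Region]) -> List[Region]:
    """Positions covered by either region set."""
    return normalize(list(a) + list(b))


def intersection(a: Iterable[Region], b: Iterable[Region]) -> List[Region]:
    """Positions covered by both region sets (two-pointer sweep over sorted input)."""
    a, b = normalize(a), normalize(b)
    out: List[Region] = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a == chrom_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                out.append((chrom_a, start, end))
        # Advance the region that finishes first in (chromosome, end) order.
        if (chrom_a, end_a) <= (chrom_b, end_b):
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    # Made-up peak sets for two transcription factors.
    tf1 = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
    tf2 = [("chr1", 250, 400), ("chr2", 10, 60)]
    print(intersection(tf1, tf2))
    print(union(tf1, tf2))
```

    A sample comparison such as "sites bound by both factors" then reduces to a single `intersection` call on two peak sets.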

  3. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding.

    PubMed

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-07-15

    Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm.
Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn. Supplementary materials are available at Bioinformatics online.
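
    The first preprocessing step described above, splitting sequences into k-mers and tallying their co-occurrences, might look roughly like the following. This is a sketch under stated assumptions, not the code from the linked repository: it assumes overlapping k-mers with a stride of 1 and counts two k-mers as co-occurring when they lie within a fixed window of each other. The names `split_kmers` and `cooccurrence_counts` are hypothetical, and the embedding training itself is not shown.

```python
from collections import Counter
from typing import Iterable, List


def split_kmers(seq: str, k: int, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers taken every `stride` bases."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def cooccurrence_counts(sequences: Iterable[str], k: int, window: int = 2) -> Counter:
    """Count ordered (k-mer, neighbor) pairs that lie within `window` positions."""
    counts: Counter = Counter()
    for seq in sequences:
        kmers = split_kmers(seq.upper(), k)
        for i, center in enumerate(kmers):
            lo, hi = max(0, i - window), min(len(kmers), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(center, kmers[j])] += 1
    return counts


if __name__ == "__main__":
    print(split_kmers("ACGTA", 3))
    for pair, n in sorted(cooccurrence_counts(["ACGTA"], k=3, window=1).items()):
        print(pair, n)
```

    The resulting counts form a sparse co-occurrence matrix that an unsupervised embedding method could then factorize or fit.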

  4. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding

    PubMed Central

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-01-01

    Abstract Motivation: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. Results: We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. Availability and implementation: The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm. 
Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:28881969

  5. Assessing copy number from exome sequencing and exome array CGH based on CNV spectrum in a large clinical cohort.

    PubMed

    Retterer, Kyle; Scuffins, Julie; Schmidt, Daniel; Lewis, Rachel; Pineda-Alvarez, Daniel; Stafford, Amanda; Schmidt, Lindsay; Warren, Stephanie; Gibellini, Federica; Kondakova, Anastasia; Blair, Amanda; Bale, Sherri; Matyakhina, Ludmila; Meck, Jeanne; Aradhya, Swaroop; Haverfield, Eden

    2015-08-01

    Detection of copy-number variation (CNV) is important for investigating many genetic disorders. Testing a large clinical cohort by array comparative genomic hybridization provides a deep perspective on the spectrum of pathogenic CNV. In this context, we describe a bioinformatics approach to extract CNV information from whole-exome sequencing and demonstrate its utility in clinical testing. Exon-focused arrays and whole-genome chromosomal microarray analysis were used to test 14,228 and 14,000 individuals, respectively. Based on these results, we developed an algorithm to detect deletions/duplications in whole-exome sequencing data and a novel whole-exome array. In the exon array cohort, we observed a positive detection rate of 2.4% (25 duplications, 318 deletions), of which 39% involved one or two exons. Chromosomal microarray analysis identified 3,345 CNVs affecting single genes (18%). We demonstrate that our whole-exome sequencing algorithm resolves CNVs of three or more exons. These results demonstrate the clinical utility of single-exon resolution in CNV assays. Our whole-exome sequencing algorithm approaches this resolution but is complemented by a whole-exome array to unambiguously identify intragenic CNVs and single-exon changes. These data illustrate the next advancements in CNV analysis through whole-exome sequencing and whole-exome array. Genet Med 17(8), 623-629.

  6. Enhanced sensitivity for detection of low-level germline mosaic RB1 mutations in sporadic retinoblastoma cases using deep semiconductor sequencing.

    PubMed

    Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa

    2014-03-01

    Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB.

  7. Contamination Tracer Testing With Seabed Rock Drills: IODP Expedition 357

    NASA Astrophysics Data System (ADS)

    Orcutt, B.; Bergenthal, M.; Freudenthal, T.; Smith, D. J.; Lilley, M. D.; Schneiders, L.; Fruh-Green, G. L.

    2016-12-01

    IODP Expedition 357 utilized seabed rock drills for the first time in the history of the ocean drilling program, with the aim of collecting intact core of shallow mantle sequences from the Atlantis Massif to examine serpentinization processes and the deep biosphere. This new drilling approach required the development of a new system for delivering synthetic tracers during drilling to assess for possible sample contamination. Here, we describe this new tracer delivery system, assess the performance of the system during the expedition, provide an overview of the quality of the core samples collected for deep biosphere investigations based on tracer concentrations, and make recommendations for future applications of the system.

  8. Contamination tracer testing with seabed drills: IODP Expedition 357

    NASA Astrophysics Data System (ADS)

    Orcutt, Beth N.; Bergenthal, Markus; Freudenthal, Tim; Smith, David; Lilley, Marvin D.; Schnieders, Luzie; Green, Sophie; Früh-Green, Gretchen L.

    2017-11-01

    IODP Expedition 357 utilized seabed drills for the first time in the history of the ocean drilling program, with the aim of collecting intact sequences of shallow mantle core from the Atlantis Massif to examine serpentinization processes and the deep biosphere. This novel drilling approach required the development of a new remote seafloor system for delivering synthetic tracers during drilling to assess for possible sample contamination. Here, we describe this new tracer delivery system, assess the performance of the system during the expedition, provide an overview of the quality of the core samples collected for deep biosphere investigations based on tracer concentrations, and make recommendations for future applications of the system.

  9. Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs

    PubMed Central

    Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan

    2017-01-01

    Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401

  10. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed Central

    Barrett, Nolan H.; McCarthy, Peter J.

    2017-01-01

    The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886

  11. Identification of Major Rhizobacterial Taxa Affected by a Glyphosate-Tolerant Soybean Line via Shotgun Metagenomic Approach

    PubMed Central

    Hua, Xiao-Mei; Liang, Li; Wen, Zhong-Ling; Du, Mei-Hang; Meng, Fan-Fan; Pang, Yan-Jun; Tang, Cheng-Yi

    2018-01-01

    The worldwide commercial cultivation of transgenic crops, including glyphosate-tolerant (GT) soybeans, has increased widely during the past 20 years. However, it is accompanied by a growing concern about potential effects of transgenic crops on the soil microbial communities, especially on rhizosphere bacterial communities. Our previous study found that the GT soybean line NZL06-698 (N698) significantly affected rhizosphere bacteria, including some unidentified taxa, through 16S rRNA gene (16S rDNA) V4 region amplicon deep sequencing via Illumina MiSeq. In this study, we performed 16S rDNA V5–V7 region amplicon deep sequencing via Illumina MiSeq and shotgun metagenomic approaches to identify those major taxa. Results of these processes revealed that the species richness and evenness increased in the rhizosphere bacterial communities of N698, the beta diversity of the rhizosphere bacterial communities of N698 was affected, and that certain dominant bacterial phyla and genera were related to N698 compared with its control cultivar Mengdou12. Consistent with our previous findings, this study showed that N698 affects the rhizosphere bacterial communities. Specifically, N698 negatively affects Rahnella, Janthinobacterium, Stenotrophomonas, Sphingomonas and Luteibacter while positively affecting Arthrobacter, Bradyrhizobium, Ramlibacter and Nitrospira. PMID:29659545

  12. Acute multi-sgRNA knockdown of KEOPS complex genes reproduces the microcephaly phenotype of the stable knockout zebrafish model.

    PubMed

    Jobst-Schwan, Tilman; Schmidt, Johanna Magdalena; Schneider, Ronen; Hoogstraten, Charlotte A; Ullmann, Jeremy F P; Schapiro, David; Majmundar, Amar J; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A; Poduri, Annapurna; Hildebrandt, Friedhelm

    2018-01-01

    Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest.
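
    The abstract does not spell out the mathematical model, so the following is only a generic illustration of how such a probability could be computed, not the authors' formula. It assumes each sgRNA mutates a given allele independently with its own probability, and that the two alleles are hit independently. It also treats any mutation as a hit, whereas a true loss-of-function estimate would need an extra factor for, say, frameshifting indels. The function names are hypothetical.

```python
from math import prod
from typing import Sequence


def p_allele_mutated(guide_efficiencies: Sequence[float]) -> float:
    """P(at least one guide mutates a given allele), assuming independent guides."""
    return 1.0 - prod(1.0 - p for p in guide_efficiencies)


def p_biallelic(guide_efficiencies: Sequence[float]) -> float:
    """P(both alleles carry at least one mutation), assuming independent alleles."""
    return p_allele_mutated(guide_efficiencies) ** 2


if __name__ == "__main__":
    # Made-up per-guide efficiencies for four sgRNAs targeting one gene.
    guides = [0.7, 0.7, 0.7, 0.7]
    print(f"per allele: {p_allele_mutated(guides):.4f}")
    print(f"biallelic:  {p_biallelic(guides):.4f}")
```

    Under these assumptions, adding guides raises the per-allele probability quickly, which is the intuition behind targeting several positions at once.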

  13. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2016-02-01

    circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. [Figure residue: y-axis "Relative cell growth (%)"]

  14. PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

    PubMed

    Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C

    2016-01-01

    Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows Gb-sized metagenomes to be analyzed with inexpensive hardware and species or genera-level bins to be recovered with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
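
    As a rough picture of what "simultaneous counting of 4-6-mers" means, the sketch below counts all 4-, 5- and 6-mers in a single pass over a sequence. It is a naive illustration only and not the accelerated algorithm in PhyloPythiaS+, whose details are not given in the abstract. It assumes k-mers containing characters other than A, C, G and T are skipped; the function name `count_kmers` is hypothetical.

```python
from collections import Counter
from typing import Dict

DNA_ALPHABET = frozenset("ACGT")


def count_kmers(seq: str, k_min: int = 4, k_max: int = 6) -> Dict[int, Counter]:
    """Count every k-mer for k_min <= k <= k_max in one pass over the sequence."""
    seq = seq.upper()
    counts: Dict[int, Counter] = {k: Counter() for k in range(k_min, k_max + 1)}
    for start in range(len(seq) - k_min + 1):
        # One window of up to k_max bases yields the k-mers of every length at this start.
        window = seq[start:start + k_max]
        for k in range(k_min, min(k_max, len(window)) + 1):
            kmer = window[:k]
            if DNA_ALPHABET.issuperset(kmer):  # skip k-mers with ambiguous bases
                counts[k][kmer] += 1
    return counts


if __name__ == "__main__":
    for k, table in count_kmers("ACGTACGT").items():
        print(k, dict(table))
```

    The per-length count tables can then be normalized into composition vectors, the kind of feature a composition-based binning method works from.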

  15. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells.

    PubMed

    Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N

    2016-04-02

    Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.

  16. Comparison of Deep-Water Viromes from the Atlantic Ocean and the Mediterranean Sea

    PubMed Central

    Winter, Christian; Garcia, Juan A. L.; Weinbauer, Markus G.; DuBow, Michael S.; Herndl, Gerhard J.

    2014-01-01

    The aim of this study was to compare the composition of two deep-sea viral communities obtained from the Romanche Fracture Zone in the Atlantic Ocean (collected at 5200 m depth) and the southwest Mediterranean Sea (from 2400 m depth) using a pyro-sequencing approach. The results are based on 18.7% and 6.9% of the sequences obtained from the Atlantic Ocean and the Mediterranean Sea, respectively, with hits to genomes in the non-redundant viral RefSeq database. The identifiable richness and relative abundance in both viromes were dominated by archaeal and bacterial viruses accounting for 92.3% of the relative abundance in the Atlantic Ocean and for 83.6% in the Mediterranean Sea. Despite characteristic differences in hydrographic features between the sampling sites in the Atlantic Ocean and the Mediterranean Sea, 440 virus genomes were found in both viromes. An additional 431 virus genomes were identified in the Atlantic Ocean and 75 virus genomes were only found in the Mediterranean Sea. The results indicate that the rather contrasting deep-sea environments of the Atlantic Ocean and the Mediterranean Sea share a common core set of virus types constituting the majority of both virus communities in terms of relative abundance (Atlantic Ocean: 81.4%; Mediterranean Sea: 88.7%). PMID:24959907

  17. Monitoring therapy responses at the leukemic subclone level by ultra-deep amplicon resequencing in acute myeloid leukemia.

    PubMed

    Ojamies, P N; Kontro, M; Edgren, H; Ellonen, P; Lagström, S; Almusa, H; Miettinen, T; Eldfors, S; Tamborero, D; Wennerberg, K; Heckman, C; Porkka, K; Wolf, M; Kallioniemi, O

    2017-05-01

    In our individualized systems medicine program, personalized treatment options are identified and administered to chemorefractory acute myeloid leukemia (AML) patients based on exome sequencing and ex vivo drug sensitivity and resistance testing data. Here, we analyzed how clonal heterogeneity affects the responses of 13 AML patients to chemotherapy or targeted treatments using ultra-deep (average 68 000 × coverage) amplicon resequencing. Using amplicon resequencing, we identified 16 variants from 4 patients (frequency 0.54-2%) that were not detected previously by exome sequencing. A correlation-based method was developed to detect mutation-specific responses in serial samples across multiple time points. Significant subclone-specific responses were observed for both chemotherapy and targeted therapy. We detected subclonal responses in patients where clinical European LeukemiaNet (ELN) criteria showed no response. Subclonal responses also helped to identify putative mechanisms underlying drug sensitivities, such as sensitivity to azacitidine in DNMT3A mutated cell clones and resistance to cytarabine in a subclone with loss of the NF1 gene. In summary, the ultra-deep amplicon resequencing method enables sensitive quantification of subclonal variants and their responses to therapies. This approach provides new opportunities for designing combinatorial therapies blocking multiple subclones as well as for real-time assessment of such treatments.

  18. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

    PubMed

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-11-15

    An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online.

  19. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.

    PubMed

    Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo

    2009-07-06

    In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. 
A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.

  20. An Efficient Strategy of Screening for Pathogens in Wild-Caught Ticks and Mosquitoes by Reusing Small RNA Deep Sequencing Data

    PubMed Central

    An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that the sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. Specifically, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated that the reuse of sRNA deep sequencing data has the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575

  1. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed

    Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J

    2017-02-02

    The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.

  2. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  3. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  4. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium

    PubMed Central

    2018-01-01

    ABSTRACT Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573

  5. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  6. Aftershock occurrence rate decay for individual sequences and catalogs

    NASA Astrophysics Data System (ADS)

    Nyffenegger, Paul A.

    One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes.
Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
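
The ML estimation of the Omori-law p value mentioned in this abstract can be illustrated with a short sketch. It assumes the usual modified Omori form n(t) = K / (t + c)^p on an observation window [S, T], holds c fixed, profiles out K analytically, and scans p over a grid. The function names, the fixed c, the grid, and the synthetic event times are all assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
import math

def integrated_rate(p, c, S, T):
    """Integral of (t + c)^(-p) over [S, T]."""
    if abs(p - 1.0) < 1e-9:
        return math.log((T + c) / (S + c))
    return ((T + c) ** (1 - p) - (S + c) ** (1 - p)) / (1 - p)

def profile_loglik(p, times, c, S, T):
    """Log-likelihood with K replaced by its maximizer K = N / integral."""
    n = len(times)
    area = integrated_rate(p, c, S, T)
    return n * math.log(n / area) - p * sum(math.log(t + c) for t in times) - n

def estimate_p(times, c, S, T, grid):
    """Grid-search ML estimate of p for fixed c (a simplifying assumption)."""
    return max(grid, key=lambda p: profile_loglik(p, times, c, S, T))

def synthetic_times(n, p, c, S, T):
    """Deterministic event times at evenly spaced quantiles of the MOL density (p != 1)."""
    a = (S + c) ** (1 - p)
    b = (T + c) ** (1 - p)
    return [(a - (i + 0.5) / n * (a - b)) ** (1 / (1 - p)) - c for i in range(n)]

# Hypothetical sequence: 500 events, true p = 1.2, c = 0.1 days, 100-day window.
times = synthetic_times(500, 1.2, 0.1, 0.0, 100.0)
grid = [0.5 + 0.01 * i for i in range(151)]  # p from 0.5 to 2.0
p_hat = estimate_p(times, 0.1, 0.0, 100.0, grid)
```

A fuller treatment would also estimate c and K jointly and, as the abstract notes, check goodness of fit separately, since the ML uncertainty alone does not indicate whether the sequence follows the MOL model.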

  7. Chemolithotrophy in the continental deep subsurface: Sanford Underground Research Facility (SURF), USA

    PubMed Central

    Osburn, Magdalena R.; LaRowe, Douglas E.; Momper, Lily M.; Amend, Jan P.

    2014-01-01

    The deep subsurface is an enormous repository of microbial life. However, the metabolic capabilities of these microorganisms and the degree to which they are dependent on surface processes are largely unknown. Due to the logistical difficulty of sampling and inherent heterogeneity, the microbial populations of the terrestrial subsurface are poorly characterized. In an effort to better understand the biogeochemistry of deep terrestrial habitats, we evaluate the energetic yield of chemolithotrophic metabolisms and microbial diversity in the Sanford Underground Research Facility (SURF) in the former Homestake Gold Mine, SD, USA. Geochemical data, energetic modeling, and DNA sequencing were combined with principal component analysis to describe this deep (down to 8100 ft below surface), terrestrial environment. SURF provides access into an iron-rich Paleoproterozoic metasedimentary deposit that contains deeply circulating groundwater. Geochemical analyses of subsurface fluids reveal enormous geochemical diversity ranging widely in salinity, oxidation state (ORP 330 to −328 mV), and concentrations of redox sensitive species (e.g., Fe2+ from near 0 to 6.2 mg/L and Σ S2- from 7 to 2778 μg/L). As a direct result of this compositional buffet, Gibbs energy calculations reveal an abundance of energy for microorganisms from the oxidation of sulfur, iron, nitrogen, methane, and manganese. Pyrotag DNA sequencing reveals diverse communities of chemolithoautotrophs, thermophiles, aerobic and anaerobic heterotrophs, and numerous uncultivated clades. Extrapolated across the mine footprint, these data suggest a complex spatial mosaic of subsurface primary productivity that is in good agreement with predicted energy yields. Notably, we report Gibbs energy normalized both per mole of reaction and per kg fluid (energy density) and find the latter to be more consistent with observed physiologies and environmental conditions.
Further application of this approach will significantly expand our understanding of the deep terrestrial biosphere. PMID:25429287
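
The two normalizations of Gibbs energy mentioned above can be sketched numerically. The first function uses the standard relation ΔG = ΔG° + RT ln Q. The second scales the per-mole value by the concentration of the limiting reactant divided by its stoichiometric coefficient, which is one common way to express energy per kg of fluid; whether the authors used exactly this form is an assumption. All numerical inputs below are hypothetical and are not taken from the study.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def gibbs_energy(dg0, temperature_k, reaction_quotient):
    """Gibbs energy of reaction (J per mole of reaction): dG0 + R*T*ln(Q)."""
    return dg0 + R * temperature_k * math.log(reaction_quotient)

def energy_density(dg, limiting_molality, stoich_coeff):
    """Energy per kg of fluid (J/kg), assuming the limiting reactant is fully consumed.

    limiting_molality is in mol/kg; stoich_coeff is that reactant's
    coefficient in the reaction as written.
    """
    return abs(dg) * limiting_molality / stoich_coeff

# Hypothetical reaction: dG0 = -100 kJ/mol, 25 degrees C, Q = 1e-3,
# limiting reactant at 1 micromolal with coefficient 1.
dg = gibbs_energy(-100000.0, 298.15, 1e-3)
density = energy_density(dg, 1e-6, 1)
```

The contrast between the two numbers is the point of the abstract's remark: a reaction can be strongly exergonic per mole yet supply little energy per kg of fluid if its limiting reactant is scarce.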

  8. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
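
For readers unfamiliar with the Q3 metric quoted above, it is the fraction of residues whose predicted three-state label (helix H, strand E, coil C) matches the reference label. The sketch below shows only that per-residue accuracy; the sequences are made up and the function name is hypothetical. It is not part of DeepCNF.

```python
def q3_accuracy(predicted, reference):
    """Fraction of positions where the predicted 3-state label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label strings must have equal length")
    if not reference:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Hypothetical 10-residue example: two of ten labels disagree.
reference = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
score = q3_accuracy(predicted, reference)  # 0.8
```

Q8 accuracy is computed the same way over the eight-state alphabet.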

  9. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  10. Deep learning methods for protein torsion angle prediction.

    PubMed

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), a deep recurrent neural network (DRNN), and a deep recurrent restricted Boltzmann machine (DReRBM), since protein torsion angle prediction is a sequence-related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
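
The MAE figures quoted above are errors on angles, which wrap around at ±180°. A minimal sketch of a mean absolute error that respects this periodicity is given below. The abstract does not say how wrap-around was handled, so treating the error as the smaller arc between two angles is an assumption, and the angle values are invented.

```python
def angular_mae(predicted_deg, true_deg):
    """Mean absolute angular error in degrees, taking the shorter way around the circle."""
    if len(predicted_deg) != len(true_deg) or not true_deg:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(predicted_deg, true_deg):
        diff = abs(p - t) % 360.0
        total += min(diff, 360.0 - diff)  # e.g. 170 vs -175 differ by 15, not 345
    return total / len(true_deg)

# Hypothetical phi angles: errors of 15, 5 and 10 degrees.
mae = angular_mae([170.0, -60.0, 10.0], [-175.0, -65.0, 20.0])  # 10.0
```

Without the wrap-around step, the first pair alone would contribute 345° and dominate the average.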

  11. Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

    PubMed

    Adriaens, M E; Bezzina, C R

    2018-06-22

    Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing technology nowadays offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
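
The idea of allelic imbalance described above can be made concrete with a simple test of allele-specific read counts against the expected 1:1 ratio. The sketch below uses an exact two-sided binomial test; this is one basic choice rather than a method named in the review, and it ignores overdispersion and reference-mapping bias, which practical analyses usually model. The read counts are hypothetical.

```python
from math import comb

def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for H0: both alleles expressed 1:1.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one. With p = 0.5 every outcome has weight comb(n, i) / 2**n,
    so the comparison can be done on exact integers.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads covering the site")
    observed = comb(n, ref_reads)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n

# Hypothetical heterozygous site: 15 reference reads, 5 alternate reads.
p_value = allelic_imbalance_pvalue(15, 5)   # about 0.041
balanced = allelic_imbalance_pvalue(10, 10)  # 1.0
```

In a transcriptome-wide screen, p-values like these would still need multiple-testing correction across sites.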

  12. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.

  13. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background: "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results: In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion: We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  14. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on a stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, where DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with a stacked sparse autoencoder and a dropout approach.

  15. Sedimentology, tephrostratigraphy, and chronology of the DEEP site sediment record, Lake Ohrid (Albania, FYROM)

    NASA Astrophysics Data System (ADS)

    Leicher, Niklas; Wagner, Bernd; Francke, Alexander; Just, Janna; Zanchetta, Giovanni; Sulpizio, Roberto; Giaccio, Biagio; Nomade, Sebastien

    2017-04-01

    Lake Ohrid, located on the Balkan Peninsula, is one of the very few lakes in the world that provides a continuous and high-resolution record of environmental change of >1.3 Ma. The sedimentary archive was drilled in spring 2013 within the scope of the International Continental Scientific Drilling Program (ICDP) and the Scientific Collaboration on Past Speciation Conditions in Lake Ohrid (SCOPSCO) project in order to investigate local and regional geological and paleoclimatic processes, as well as triggers of evolutionary patterns and endemic biodiversity. The continuous composite profile (584 m) of the main drill site DEEP was logged (XRF, MSCL) and subsampled for biogeochemical (TIC, TOC, TN, TS) and sedimentological (grain size) analyses. The lithology of the DEEP site indicates that the history of Lake Ohrid can roughly be separated into two parts, with the older section between 584 and 450 m depth being characterised by a sedimentary facies indicating shallow water conditions, which is likely younger than ca. 1.9 Ma. In the lowermost few meters of the succession gravels and pebbles hampered a deeper drilling penetration and indicate that fluvial conditions existed during the onset of lake formation. Together with geotectonic, seismic, and biological information, the data imply that the Ohrid basin formed by transtension during the Miocene, opened during the Pliocene and Pleistocene, and that the lake established between 1.9 and 1.3 Ma ago. The sediments of the younger part (< 450 m sediment depth) indicate that deeper water conditions established in Lake Ohrid after 1.3 Ma ago. Since then, biogeochemical proxy data respond to global glacial/interglacial variability, with warm periods being characterized by high TIC and TOC concentrations and cold periods by negligible TIC and low TOC contents, respectively. 
To date, 56 tephra horizons have been identified in the upper 450 m of the DEEP site sequence and are the subject of ongoing investigations aimed at identifying their specific volcanic sources and equivalent known tephra by using geochemical fingerprinting of glass fragments. This approach has already been applied successfully to tephra horizons in the upper 247.8 m of the sequence, obtaining important chronological information from 11 well dated tephra layers. These tephrochronological constraints were complemented by ages obtained from tuning the consistent pattern of the biogeochemical proxy data to orbital parameters in order to develop an age depth model for the last 637 kyr. This dating approach for the upper part will be further extended for the lower sequence below 247.8 m and combined with paleomagnetic information. The Brunhes/Matuyama boundary and the Jaramillo subchron are evident in the DEEP site sequence and will be further confined by higher resolution paleomagnetic measurements. The high-resolution data will also enable the reconstruction of the dynamics of the Earth's magnetic field during polarity transitions. This multi-method dating approach will provide a robust chronology of the core, which is the backbone to fulfil the major aims of the SCOPSCO project.

  16. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  17. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  18. A comparative molecular analysis of water-filled limestone sinkholes in north-eastern Mexico.

    PubMed

    Sahl, Jason W; Gary, Marcus O; Harris, J Kirk; Spear, John R

    2011-01-01

    Sistema Zacatón in north-eastern Mexico is host to several deep, water-filled, anoxic, karstic sinkholes (cenotes). These cenotes were explored, mapped, and geochemically and microbiologically sampled by the autonomous underwater vehicle deep phreatic thermal explorer (DEPTHX). The community structure of the filterable fraction of the water column and extensive microbial mats that coat the cenote walls was investigated by comparative analysis of small-subunit (SSU) 16S rRNA gene sequences. Full-length Sanger gene sequence analysis revealed novel microbial diversity that included three putative bacterial candidate phyla and three additional groups that showed high intra-clade distance with poorly characterized bacterial candidate phyla. Limited functional gene sequence analysis in these anoxic environments identified genes associated with methanogenesis, sulfate reduction and anaerobic ammonium oxidation. A directed, barcoded amplicon, multiplex pyrosequencing approach was employed to compare ∼100,000 bacterial SSU gene sequences from water column and wall microbial mat samples from five cenotes in Sistema Zacatón. A new, high-resolution sequence distribution profile (SDP) method identified changes in specific phylogenetic types (phylotypes) in microbial mats at varied depths; Mantel tests showed a correlation of the genetic distances between mat communities in two cenotes and the geographic location of each cenote. Community structure profiles from the water column of three neighbouring cenotes showed distinct variation; statistically significant differences in the concentration of geochemical constituents suggest that the variation observed in microbial communities between neighbouring cenotes is due to geochemical variation. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.

  19. ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing.

    PubMed

    Chen, Xingqi; Shen, Ying; Draper, Will; Buenrostro, Jason D; Litzenburger, Ulrike; Cho, Seung Woo; Satpathy, Ansuman T; Carter, Ava C; Ghosh, Rajarshi P; East-Seletsky, Alexandra; Doudna, Jennifer A; Greenleaf, William J; Liphardt, Jan T; Chang, Howard Y

    2016-12-01

    Spatial organization of the genome plays a central role in gene expression, DNA replication, and repair. But current epigenomic approaches largely map DNA regulatory elements outside of the native context of the nucleus. Here we report assay of transposase-accessible chromatin with visualization (ATAC-see), a transposase-mediated imaging technology that employs direct imaging of the accessible genome in situ, cell sorting, and deep sequencing to reveal the identity of the imaged elements. ATAC-see revealed the cell-type-specific spatial organization of the accessible genome and the coordinated process of neutrophil chromatin extrusion, termed NETosis. Integration of ATAC-see with flow cytometry enables automated quantitation and prospective cell isolation as a function of chromatin accessibility, and it reveals a cell-cycle dependence of chromatin accessibility that is especially dynamic in G1 phase. The integration of imaging and epigenomics provides a general and scalable approach for deciphering the spatiotemporal architecture of gene control.

  20. Deep Learning and Its Applications in Biomedicine.

    PubMed

    Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi

    2018-02-01

    Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.

  1. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study

    PubMed Central

    Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.

    2015-01-01

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199
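
The low-frequency range quoted above (≥2% to 20%) can be expressed as a small classification rule on per-site read counts. This sketch only encodes those two percentages from the abstract. The function name, the category labels, and the treatment of exactly 20% as still low-frequency are assumptions, and the read counts are invented.

```python
def classify_variant(alt_reads, depth, low=0.02, high=0.20):
    """Return (frequency, label) for a variant given supporting reads and total depth.

    low and high mirror the 2% and 20% bounds quoted in the abstract;
    where exactly the upper boundary falls is an assumption here.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    freq = alt_reads / depth
    if freq < low:
        return freq, "below_reporting_threshold"
    if freq <= high:
        return freq, "low_frequency"
    return freq, "above_low_frequency_range"

# Hypothetical sites, each covered by 1000 reads.
site_a = classify_variant(50, 1000)   # 5%  -> low_frequency
site_b = classify_variant(10, 1000)   # 1%  -> below_reporting_threshold
site_c = classify_variant(400, 1000)  # 40% -> above_low_frequency_range
```

Variants in the upper category are the ones population sequencing would typically also detect, which is why the low-frequency band is where deep sequencing adds information.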

  2. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study.

    PubMed

    Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L

    2015-12-07

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.

  3. Natural Variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy and Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  4. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646

  5. Emerging patterns of somatic mutations in cancer

    PubMed Central

    Watson, Ian R.; Takahashi, Koichi; Futreal, P. Andrew; Chin, Lynda

    2014-01-01

    The advance in technological tools for massively parallel, high-throughput sequencing of DNA has enabled the comprehensive characterization of somatic mutations in large number of tumor samples. Here, we review recent cancer genomic studies that have assembled emerging views of the landscapes of somatic mutations through deep sequencing analyses of the coding exomes and whole genomes in various cancer types. We discuss the comparative genomics of different cancers, including mutation rates, spectrums, and roles of environmental insults that influence these processes. We highlight the developing statistical approaches used to identify significantly mutated genes, and discuss the emerging biological and clinical insights from such analyses as well as the challenges ahead translating these genomic data into clinical impacts. PMID:24022702

  6. Deep Recurrent Neural Networks for Human Activity Recognition

    PubMed Central

    Murad, Abdulmajid; Pyun, Jae-Young

    2017-01-01

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs. PMID:29113103

  7. Deep Recurrent Neural Networks for Human Activity Recognition.

    PubMed

    Murad, Abdulmajid; Pyun, Jae-Young

    2017-11-06

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep belief networks (DBNs) and CNNs.

  8. Novel microbial diversity retrieved by autonomous robotic exploration of the world's deepest vertical phreatic sinkhole.

    PubMed

    Sahl, Jason W; Fairfield, Nathaniel; Harris, J Kirk; Wettergreen, David; Stone, William C; Spear, John R

    2010-03-01

    The deep phreatic thermal explorer (DEPTHX) is an autonomous underwater vehicle designed to navigate an unexplored environment, generate high-resolution three-dimensional (3-D) maps, collect biological samples based on an autonomous sampling decision, and return to its origin. In the spring of 2007, DEPTHX was deployed in Zacatón, a deep (approximately 318 m), limestone, phreatic sinkhole (cenote) in northeastern Mexico. As DEPTHX descended, it generated a 3-D map based on the processing of range data from 54 onboard sonars. The vehicle collected water column samples and wall biomat samples throughout the depth profile of the cenote. Post-expedition sample analysis via comparative analysis of 16S rRNA gene sequences revealed a wealth of microbial diversity. Traditional Sanger gene sequencing combined with a barcoded-amplicon pyrosequencing approach revealed novel, phylum-level lineages from the domains Bacteria and Archaea; in addition, several novel subphylum lineages were also identified. Overall, DEPTHX successfully navigated and mapped Zacatón, and collected biological samples based on an autonomous decision, which revealed novel microbial diversity in a previously unexplored environment.

  9. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.

    PubMed

    García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge

    2018-04-12

    Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.

  10. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads and ground vegetation and to facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000+ reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep, cost-effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  11. Digital Marine Bioprospecting: Mining New Neurotoxin Drug Candidates from the Transcriptomes of Cold-Water Sea Anemones

    PubMed Central

    Urbarova, Ilona; Karlsen, Bård Ove; Okkenhaug, Siri; Seternes, Ole Morten; Johansen, Steinar D.; Emblem, Åse

    2012-01-01

    Marine bioprospecting is the search for new marine bioactive compounds and large-scale screening in extracts represents the traditional approach. Here, we report an alternative complementary protocol, called digital marine bioprospecting, based on deep sequencing of transcriptomes. We sequenced the transcriptomes from the adult polyp stage of two cold-water sea anemones, Bolocera tuediae and Hormathia digitata. We generated approximately 1.1 million quality-filtered sequencing reads by 454 pyrosequencing, which were assembled into approximately 120,000 contigs and 220,000 single reads. Based on annotation and gene ontology analysis we profiled the expressed mRNA transcripts according to known biological processes. As a proof-of-concept we identified polypeptide toxins with a potential blocking activity on sodium and potassium voltage-gated channels from digital transcriptome libraries. PMID:23170083

  12. Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.

    PubMed

    Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario

    2011-04-01

    Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.

  13. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    PubMed Central

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
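
    The non-redundancy figure quoted in this abstract follows from the two counts it reports. A minimal sketch of the arithmetic, assuming non-redundancy is defined as unigenes divided by sequenced clones (the abstract does not state the formula, and the function name is hypothetical):

```python
def non_redundancy_percent(unigenes: int, clones: int) -> float:
    """Percentage of sequenced clones that represent distinct unigenes.

    Assumed definition: 100 * unigenes / clones.
    """
    if clones <= 0:
        raise ValueError("clones must be positive")
    return 100.0 * unigenes / clones


# Counts taken from the abstract: 3320 unigenes from 3875 cDNA clones.
print(round(non_redundancy_percent(3320, 3875), 1))  # 85.7
```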

  14. Unbiased whole-genome deep sequencing of human and porcine stool samples reveals circulation of multiple groups of rotaviruses and a putative zoonotic infection

    PubMed Central

    Phan, My V. T.; Anh, Pham Hong; Cuong, Nguyen Van; Munnink, Bas B. Oude; van der Hoek, Lia; My, Phuc Tran; Tri, Tue Ngo; Bryant, Juliet E.; Baker, Stephen; Thwaites, Guy; Woolhouse, Mark; Kellam, Paul; Rabaa, Maia A.

    2016-01-01

    Coordinated and synchronous surveillance for zoonotic viruses in both human clinical cases and animal reservoirs provides an opportunity to identify interspecies virus movement. Rotavirus (RV) is an important cause of viral gastroenteritis in humans and animals. In this study, we document the RV diversity within co-located humans and animals sampled from the Mekong delta region of Vietnam using a primer-independent, agnostic, deep sequencing approach. A total of 296 stool samples (146 from diarrhoeal human patients and 150 from pigs living in the same geographical region) were directly sequenced, generating the genomic sequences of sixty human rotaviruses (all group A) and thirty-one porcine rotaviruses (thirteen group A, seven group B, six group C, and five group H). Phylogenetic analyses showed the co-circulation of multiple distinct RV group A (RVA) genotypes/strains, many of which were divergent from the strain components of licensed RVA vaccines, as well as considerable virus diversity in pigs including full genomes of rotaviruses in groups B, C, and H, none of which have been previously reported in Vietnam. Furthermore, the detection of an atypical RVA genotype constellation (G4-P[6]-I1-R1-C1-M1-A8-N1-T7-E1-H1) in a human patient and a pig from the same region provides some evidence for a zoonotic event. PMID:28748110

  15. Acute multi-sgRNA knockdown of KEOPS complex genes reproduces the microcephaly phenotype of the stable knockout zebrafish model

    PubMed Central

    Schneider, Ronen; Hoogstraten, Charlotte A.; Schapiro, David; Majmundar, Amar J.; Kolb, Amy; Eddy, Kaitlyn; Shril, Shirlee; Braun, Daniela A.; Poduri, Annapurna

    2018-01-01

    Until recently, morpholino oligonucleotides have been widely employed in zebrafish as an acute and efficient loss-of-function assay. However, off-target effects and reproducibility issues when compared to stable knockout lines have compromised their further use. Here we employed an acute CRISPR/Cas approach using multiple single guide RNAs targeting simultaneously different positions in two exemplar genes (osgep or tprkb) to increase the likelihood of generating mutations on both alleles in the injected F0 generation and to achieve a similar effect as morpholinos but with the reproducibility of stable lines. This multi single guide RNA approach resulted in median likelihoods for at least one mutation on each allele of >99% and sgRNA specific insertion/deletion profiles as revealed by deep-sequencing. Immunoblot showed a significant reduction for Osgep and Tprkb proteins. For both genes, the acute multi-sgRNA knockout recapitulated the microcephaly phenotype and reduction in survival that we observed previously in stable knockout lines, though milder in the acute multi-sgRNA knockout. Finally, we quantify the degree of mutagenesis by deep sequencing, and provide a mathematical model to quantitate the chance for a biallelic loss-of-function mutation. Our findings can be generalized to acute and stable CRISPR/Cas targeting for any zebrafish gene of interest. PMID:29346415
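
    The reasoning behind using several guides at once can be sketched numerically. The code below is not the authors' mathematical model, which the abstract does not reproduce; it is a simplified illustration that assumes each sgRNA mutates a given allele independently with a known per-allele efficiency and that the two alleles behave independently. The function names and the efficiency values are hypothetical.

```python
from math import prod
from typing import Sequence


def p_allele_mutated(efficiencies: Sequence[float]) -> float:
    """Probability that one allele carries at least one mutation.

    Assumes each sgRNA acts independently (illustrative assumption).
    """
    for p in efficiencies:
        if not 0.0 <= p <= 1.0:
            raise ValueError("efficiencies must lie in [0, 1]")
    # Complement of "no guide produced a mutation on this allele".
    return 1.0 - prod(1.0 - p for p in efficiencies)


def p_biallelic(efficiencies: Sequence[float]) -> float:
    """Probability that both alleles carry at least one mutation,
    assuming the two alleles are targeted independently."""
    return p_allele_mutated(efficiencies) ** 2


# Hypothetical efficiencies: one guide versus four guides at 80% each.
print(p_biallelic([0.8]))
print(p_biallelic([0.8, 0.8, 0.8, 0.8]))
```

    Under these assumptions, adding guides raises the chance of hitting both alleles quickly, which is in line with the >99% median likelihoods reported above; note that a mutation on each allele is not necessarily a loss-of-function mutation.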

  16. DNA barcoding reveals seasonal shifts in diet and consumption of deep-sea fishes in wedge-tailed shearwaters

    PubMed Central

    Ando, Haruko; Horikoshi, Kazuo; Suzuki, Hajime; Isagi, Yuji

    2018-01-01

    The foraging ecology of pelagic seabirds is difficult to characterize because of their large foraging areas. In the face of this difficulty, DNA metabarcoding may be a useful approach to analyze diet compositions and foraging behaviors. Using this approach, we investigated the diet composition and its seasonal variation of a common seabird species on the Ogasawara Islands, Japan: the wedge-tailed shearwater Ardenna pacifica. We collected fecal samples during the prebreeding (N = 73) and rearing (N = 96) periods. The diet composition of wedge-tailed shearwater was analyzed by Ion Torrent sequencing using two universal polymerase chain reaction primers for the 12S and 16S mitochondrial DNA regions that targeted vertebrates and mollusks, respectively. The results of a BLAST search of obtained sequences detected 31 and 1 vertebrate and mollusk taxa, respectively. The results of the diet composition analysis showed that wedge-tailed shearwaters frequently consumed deep-sea fishes throughout the sampling season, indicating the importance of these fishes as a stable food resource. However, there was a marked seasonal shift in diet, which may reflect seasonal changes in food resource availability and wedge-tailed shearwater foraging behavior. The collected data regarding the shearwater diet may be useful for in situ conservation efforts. Future research that combines DNA metabarcoding with other tools, such as data logging, may provide further insight into the foraging ecology of pelagic seabirds. PMID:29630670

  17. Novel Lipolytic Enzymes Identified from Metagenomic Library of Deep-Sea Sediment

    PubMed Central

    Jeon, Jeong Ho; Kim, Jun Tae; Lee, Hyun Sook; Kim, Sang-Jin; Kang, Sung Gyun; Choi, Sang Ho; Lee, Jung-Hyun

    2011-01-01

    A metagenomic library was constructed from a deep-sea sediment sample and screened for lipolytic activity. Open-reading frames of six positive clones showed only 33–58% amino acid identities to the known proteins. One of them was assigned to a new group while others were grouped into Families I and V or EstD Family. By employing a combination of approaches such as removing the signal sequence, coexpression of chaperone genes, and low temperature induction, we obtained five soluble recombinant proteins in Escherichia coli. The purified enzymes had optimum temperatures of 30–35°C and the cold-activity property. Among them, one enzyme showed lipase activity by preferentially hydrolyzing p-nitrophenyl palmitate and p-nitrophenyl stearate and high salt resistance with up to 4 M NaCl. Our research demonstrates the feasibility of developing novel lipolytic enzymes from marine environments by the combination of functional metagenomic approach and protein expression technology. PMID:21845199

  18. RaptorX-Property: a web server for protein structure property prediction.

    PubMed

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
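
    The Q3 and Q8 figures quoted in this abstract are per-residue accuracies. A minimal sketch of how such a score is computed, assuming predicted and observed labels are given as equal-length strings with one character per residue (the function name and the example strings are hypothetical, and this is not the server's own evaluation code):

```python
def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted label matches the observed one.

    With a 3-letter alphabet (e.g. H, E, C) this is Q3; with 8 labels, Q8.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 3-state labels for a six-residue fragment.
print(per_residue_accuracy("HHHEEC", "HHHECC"))
```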

  19. Aquifer Vulnerability Assessment Based on Sequence Stratigraphic and ³⁹Ar Transport Modeling.

    PubMed

    Sonnenborg, Torben O; Scharling, Peter B; Hinsby, Klaus; Rasmussen, Erik S; Engesgaard, Peter

    2016-03-01

    A large-scale groundwater flow and transport model is developed for a deep-seated (100 to 300 m below ground surface) sedimentary aquifer system. The model is based on a three-dimensional (3D) hydrostratigraphic model, building on a sequence stratigraphic approach. The flow model is calibrated against observations of hydraulic head and stream discharge while the credibility of the transport model is evaluated against measurements of ³⁹Ar from deep wells using alternative parameterizations of dispersivity and effective porosity. The directly simulated 3D mean age distributions and vertical fluxes are used to visualize the two-dimensional (2D)/3D age and flux distribution along transects and at the top plane of individual aquifers. The simulation results are used to assess the vulnerability of the aquifer system that generally has been assumed to be protected by thick overlying clayey units and therefore proposed as future reservoirs for drinking water supply. The results indicate that on a regional scale these deep-seated aquifers are not as protected from modern surface water contamination as expected because significant leakage to the deeper aquifers occurs. The complex distribution of local and intermediate groundwater flow systems controlled by the distribution of the river network as well as the topographical variation (Tóth 1963) provides the possibility for modern water to be found in even the deepest aquifers. © 2015, National Ground Water Association.

  20. Identification of ribonucleotide reductase mutation causing temperature-sensitivity of herpes simplex virus isolates from whitlow by deep sequencing.

    PubMed

    Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu

    2015-06-01

    Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.

  1. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case

    PubMed Central

    Setoh, Yin Xiang; Prow, Natalie A.; Peng, Nias; Hugo, Leon E.; Devine, Gregor; Hazlewood, Jessamine E.; Suhrbier, Andreas; Khromykh, Alexander A.

    2017-01-01

    Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus’s ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses. PMID:28529976

  2. De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case.

    PubMed

    Setoh, Yin Xiang; Prow, Natalie A; Peng, Nias; Hugo, Leon E; Devine, Gregor; Hazlewood, Jessamine E; Suhrbier, Andreas; Khromykh, Alexander A

    2017-01-01

    Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus's ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses.

  3. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  4. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
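
    The abstract reports highly variable agreement between callers but does not say how agreement was scored. As a rough sketch, one common way to compare call sets is pairwise Jaccard overlap together with a simple vote count. The variant keys, toy call sets, and function names below are made up for illustration and are not data or methods from the study.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Shared calls divided by all calls made by either caller."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(calls):
    """Jaccard overlap for every pair of callers, keyed by (caller, caller)."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def supported_by(calls, min_callers):
    """Variants reported by at least `min_callers` callers."""
    votes = Counter(v for call_set in calls.values() for v in call_set)
    return {v for v, n in votes.items() if n >= min_callers}


# Hypothetical toy call sets; each variant is (chromosome, position, ref, alt).
toy_calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 250, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr3", 75, "C", "G")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 250, "G", "C")},
}

if __name__ == "__main__":
    for pair, score in pairwise_agreement(toy_calls).items():
        print(pair, round(score, 3))
    print(sorted(supported_by(toy_calls, min_callers=2)))
```

    A vote threshold trades sensitivity for precision: requiring more callers to agree keeps fewer, better-supported calls.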

  5. Ribosome profiling reveals the what, when, where and how of protein synthesis.

    PubMed

    Brar, Gloria A; Weissman, Jonathan S

    2015-11-01

    Ribosome profiling, which involves the deep sequencing of ribosome-protected mRNA fragments, is a powerful tool for globally monitoring translation in vivo. The method has facilitated discovery of the regulation of gene expression underlying diverse and complex biological processes, of important aspects of the mechanism of protein synthesis, and even of new proteins, by providing a systematic approach for experimental annotation of coding regions. Here, we introduce the methodology of ribosome profiling and discuss examples in which this approach has been a key factor in guiding biological discovery, including its prominent role in identifying thousands of novel translated short open reading frames and alternative translation products.

  6. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acids, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate predictions. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFOLD-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  7. Large-Scale Interaction Profiling of Protein Domains Through Proteomic Peptide-Phage Display Using Custom Peptidomes.

    PubMed

    Seo, Moon-Hyeong; Nim, Satra; Jeon, Jouhyun; Kim, Philip M

    2017-01-01

    Protein-protein interactions are essential to cellular functions and signaling pathways. We recently combined bioinformatics and custom oligonucleotide arrays to construct custom-made peptide-phage libraries for screening peptide-protein interactions, an approach we call proteomic peptide-phage display (ProP-PD). In this chapter, we describe protocols for phage display for the identification of natural peptide binders for a given protein. We finally describe deep sequencing for the analysis of the proteomic peptide-phage display.

  8. Rapid Creation and Quantitative Monitoring of High Coverage shRNA Libraries

    PubMed Central

    Bassik, Michael C.; Lebbink, Robert Jan; Churchman, L. Stirling; Ingolia, Nicholas T.; Patena, Weronika; LeProust, Emily M.; Schuldiner, Maya; Weissman, Jonathan S.; McManus, Michael T.

    2009-01-01

    Short hairpin RNA (shRNA) libraries are limited by the low efficacy of many shRNAs, giving false negatives, and off-target effects, giving false positives. Here we present a strategy for rapidly creating expanded shRNA pools (∼30 shRNAs/gene) that are analyzed by deep-sequencing (EXPAND). This approach enables identification of multiple effective target-specific shRNAs from a complex pool, allowing a rigorous statistical evaluation of whether a gene is a true hit. PMID:19448642

  9. Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach.

    PubMed

    Liang, Muxuan; Li, Zhizhong; Chen, Ting; Zeng, Jianyang

    2015-01-01

    Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both deep intrinsic statistical properties of each input modality and complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. 
These results indicate that our multimodal DBN based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for personalized cancer therapy.

  10. Methods, Tools and Current Perspectives in Proteogenomics *

    PubMed Central

    Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.

    2017-01-01

    With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies have identified novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches and in this article, we systematically classify published methods and tools into four major categories, (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751

  11. Identification of MicroRNAs in Helicoverpa armigera and Spodoptera litura Based on Deep Sequencing and Homology Analysis

    PubMed Central

    Ge, Xie; Zhang, Yong; Jiang, Jianhao; Zhong, Yi; Yang, Xiaonan; Li, Zhiqian; Huang, Yongping; Tan, Anjiang

    2013-01-01

    The current identification of microRNAs (miRNAs) in insects is largely dependent on genome sequences. However, the lack of available genome sequences inhibits the identification of miRNAs in various insect species. In this study, we used a miRNA database of the silkworm Bombyx mori as a reference to identify miRNAs in Helicoverpa armigera and Spodoptera litura using deep sequencing and homology analysis. Because all three species belong to the Lepidoptera, the experiment produced reliable results. Our study identified 97 and 91 conserved miRNAs in H. armigera and S. litura, respectively. Using the genome of B. mori and BAC sequences of H. armigera as references, 1 novel miRNA and 8 novel miRNA candidates were identified in H. armigera, and 4 novel miRNA candidates were identified in S. litura. An evolutionary analysis revealed that most of the identified miRNAs were insect-specific, and more than 20 miRNAs were Lepidoptera-specific. The investigation of the expression patterns of miR-2a, miR-34, miR-2796-3p and miR-11 revealed their potential roles in insect development. miRNA target prediction revealed that conserved miRNA target sites exist in various genes in the 3 species. Conserved miRNA target sites for the Hsp90 gene among the 3 species were validated in the mammalian 293T cell line using a dual-luciferase reporter assay. Our study provides a new approach with which to identify miRNAs in insects lacking genome information and contributes to the functional analysis of insect miRNAs. PMID:23289012

  12. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures (ARMS)) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm - 500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales.

  13. De Novo Peptide Sequencing: Deep Mining of High-Resolution Mass Spectrometry Data.

    PubMed

    Islam, Mohammad Tawhidul; Mohamedali, Abidali; Fernandes, Criselda Santan; Baker, Mark S; Ranganathan, Shoba

    2017-01-01

    High resolution mass spectrometry has revolutionized proteomics over the past decade, resulting in tremendous amounts of data in the form of mass spectra, being generated in a relatively short span of time. The mining of this spectral data for analysis and interpretation though has lagged behind such that potentially valuable data is being overlooked because it does not fit into the mold of traditional database searching methodologies. Although the analysis of spectra by de novo sequences removes such biases and has been available for a long period of time, its uptake has been slow or almost nonexistent within the scientific community. In this chapter, we propose a methodology to integrate de novo peptide sequencing using three commonly available software solutions in tandem, complemented by homology searching, and manual validation of spectra. This simplified method would allow greater use of de novo sequencing approaches and potentially greatly increase proteome coverage leading to the unearthing of valuable insights into protein biology, especially of organisms whose genomes have been recently sequenced or are poorly annotated.

  14. The Role Of Rejuvenation In Shaping The High-Mass End Of The Main Sequence

    NASA Astrophysics Data System (ADS)

    Mancini, Chiara

    2017-06-01

    We investigate the nature of star-forming galaxies with reduced specific SFRs and high stellar masses, those that seemingly cause the so-called bending of the main sequence. The fact that such objects host large bulges recently led some to suggest that the internal formation of the bulges, via compaction or disk instabilities, was the late event that induced sSFRs of massive galaxies to drop in a slow downfall and thus the main sequence to bend. We have studied in detail a sample of 16 galaxies at 0.5

  15. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    PubMed Central

    2010-01-01

    Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of the freshly collected B. azoricus transcriptome, using gill tissue as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne, potentially infectious microorganisms. Additionally, a substantial EST data set was produced, from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gill tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino acid sequences, of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments.
Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131

  16. A Follow-Up of the Multicenter Collaborative Study on HIV-1 Drug Resistance and Tropism Testing Using 454 Ultra Deep Pyrosequencing

    PubMed Central

    St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin

    2016-01-01

    Background Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). 10 preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ±0.5, 4.8 ±0.4, 4.9 ±0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilutions series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. 
Conclusions This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901
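
    The link between sequencing depth and the lowest detectable variant frequency can be illustrated with a simple binomial model. This is a minimal sketch under stated assumptions, not the study's analysis: reads are treated as independent draws, sequencing error is ignored, and `min_reads` (the number of supporting reads required to call a variant) is a hypothetical threshold. The depths in the usage lines are the median and lower-quartile RTP depths quoted in the abstract.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for large n."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def prob_at_least(depth, freq, min_reads):
    """Probability that at least `min_reads` of `depth` reads carry the variant."""
    below = sum(binom_pmf(k, depth, freq) for k in range(min_reads))
    return 1.0 - below


def expected_variant_reads(depth, freq):
    """Mean number of variant-carrying reads at a given depth and frequency."""
    return depth * freq


if __name__ == "__main__":
    # Depths taken from the abstract; min_reads=5 is an assumed threshold.
    for depth in (1829, 592):
        for freq in (0.01, 0.02, 0.10):
            print(depth, freq,
                  round(expected_variant_reads(depth, freq), 1),
                  round(prob_at_least(depth, freq, min_reads=5), 4))
```

    Real variant callers also have to separate true variants from sequencing error, so detection probabilities from this model are optimistic.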

  17. Reactive Sequencing for Autonomous Navigation Evolving from Phoenix Entry, Descent, and Landing

    NASA Technical Reports Server (NTRS)

    Grasso, Christopher A.; Riedel, Joseph E.; Vaughan, Andrew T.

    2010-01-01

    Virtual Machine Language (VML) is an award-winning advanced procedural sequencing language in use on NASA deep-space missions since 1997, and was used for the successful entry, descent, and landing (EDL) of the Phoenix spacecraft onto the surface of Mars. Phoenix EDL utilized a state-oriented operations architecture which executed within the constraints of the existing VML 2.0 flight capability, compatible with the linear "land or die" nature of the mission. The intricacies of Phoenix EDL included the planned discarding of portions of the vehicle, the complex communications management for relay through on-orbit assets, the presence of temporally indeterminate physical events, and the need to rapidly catch up four days of sequencing should a reboot of the spacecraft flight computer occur shortly before atmospheric entry. These formidable operational challenges led to new techniques for packaging and coordinating reusable sequences called blocks using one-way synchronization via VML sequencing global variable events. The coordinated blocks acted as an ensemble to land the spacecraft, while individually managing various elements in as simple a fashion as possible. This paper outlines prototype VML 2.1 flight capabilities that have evolved from the one-way synchronization techniques in order to implement even more ambitious autonomous mission capabilities. Target missions for these new capabilities include autonomous touch-and-go sampling of cometary and asteroidal bodies, lunar landing of robotic missions, and ultimately landing of crewed lunar vehicles. Close proximity guidance, navigation, and control operations, on-orbit rendezvous, and descent and landing events featured in these missions require elaborate abort capability, manifesting highly non-linear scenarios that are so complex as to overtax traditional sequencing, or even the sort of one-way coordinated sequencing used during EDL. 
Foreseeing advanced command and control needs for small body and lunar landing guidance, navigation and control scenarios, work began three years ago on substantial upgrades to VML that are now being exercised in scenarios for lunar landing and comet/asteroid rendezvous. The advanced state-based approach includes coordinated state transition machines with distributed decision-making logic. These state machines are not merely sequences - they are reactive logic constructs capable of autonomous decision making within a well-defined domain. Combined with the JPL's AutoNav software used on Deep Space 1 and Deep Impact, the system allows spacecraft to autonomously navigate to an unmapped surface, soft-contact, and either land or ascend. The state machine architecture enabled by VML 2.1 has successfully performed sampling missions and lunar descent missions in a simulated environment, and is progressing toward flight capability. The authors are also investigating using the VML 2.1 flight director architecture to perform autonomous activities like rendezvous with a passive hypothetical Mars sample return capsule. The approach being pursued is similar to the touch-and-go sampling state machines, with the added complications associated with the search for, physical capture of, and securing of a separate spacecraft. Complications include optically finding and tracking the Orbiting Sample Capsule (OSC), keeping the OSC illuminated, making orbital adjustments, and physically capturing the OSC. Other applications could include autonomous science collection and fault compensation.
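
    The difference between a linear sequence and a reactive state machine can be sketched with a small transition table. The example below is written in Python, not VML, and does not reflect VML 2.1 syntax or the authors' design; the state names, event names, and transitions are hypothetical and only loosely follow the touch-and-go scenario described in the abstract.

```python
from enum import Enum, auto


class State(Enum):
    APPROACH = auto()
    DESCEND = auto()
    CONTACT = auto()
    ASCEND = auto()
    ABORT = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.APPROACH, "target_acquired"): State.DESCEND,
    (State.DESCEND, "touchdown"): State.CONTACT,
    (State.CONTACT, "sample_collected"): State.ASCEND,
    # A fault in any active phase diverts to the abort state.
    (State.APPROACH, "fault"): State.ABORT,
    (State.DESCEND, "fault"): State.ABORT,
    (State.CONTACT, "fault"): State.ABORT,
}


def step(state, event):
    """React to one event; events with no matching transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.APPROACH):
    """Feed a stream of events through the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(run(["target_acquired", "touchdown", "sample_collected"]))
    print(run(["target_acquired", "fault", "touchdown"]))
```

    Unlike a fixed command list, the next action here depends on which event actually arrives, which is what lets an abort interrupt the nominal path at any phase.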

  18. Sensitive Deep-Sequencing-Based HIV-1 Genotyping Assay To Simultaneously Determine Susceptibility to Protease, Reverse Transcriptase, Integrase, and Maturation Inhibitors, as Well as HIV-1 Coreceptor Tropism

    PubMed Central

    Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.

    2014-01-01

    With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. 
This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782

  19. Using High-Throughput Sequencing to Leverage Surveillance of Genetic Diversity and Oseltamivir Resistance: A Pilot Study during the 2009 Influenza A(H1N1) Pandemic

    PubMed Central

    Téllez-Sosa, Juan; Rodríguez, Mario Henry; Gómez-Barreto, Rosa E.; Valdovinos-Torres, Humberto; Hidalgo, Ana Cecilia; Cruz-Hervert, Pablo; Luna, René Santos; Carrillo-Valenzo, Erik; Ramos, Celso; García-García, Lourdes; Martínez-Barnetche, Jesús

    2013-01-01

    Background Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The “deep sequencing” approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. Methodology and Principal Findings We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced, at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November during the 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to the second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. Conclusions NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights into viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave.
As sequencing throughput rises and library multiplexing and automation improves, we foresee that the approach presented here can be scaled up for global genetic surveillance of influenza and other infectious diseases. PMID:23843978

  20. Dendrites, deep learning, and sequences in the hippocampus.

    PubMed

    Bhalla, Upinder S

    2017-10-12

    The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.

  1. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.

    PubMed

    Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan

    2018-05-24

    High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.

  2. Revealing the unexplored fungal communities in deep groundwater of crystalline bedrock fracture zones in Olkiluoto, Finland.

    PubMed

    Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja

    2015-01-01

    The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.

  3. Assessment of Groundwater Susceptibility to Non-Point Source Contaminants Using Three-Dimensional Transient Indexes.

    PubMed

    Zhang, Yong; Weissmann, Gary S; Fogg, Graham E; Lu, Bingqing; Sun, HongGuang; Zheng, Chunmiao

    2018-06-05

    Groundwater susceptibility to non-point source contamination is typically quantified by stable indexes, while groundwater quality evolution (or deterioration globally) can be a long-term process that may last for decades and exhibit strong temporal variations. This study proposes a three-dimensional (3-D), transient index map built upon physical models to characterize the complete temporal evolution of deep aquifer susceptibility. For illustration purposes, the backward travel time probability density (BTTPD) approach is extended to assess the 3-D deep groundwater susceptibility to non-point source contamination within a sequence stratigraphic framework observed in the Kings River fluvial fan (KRFF) aquifer. The BTTPD, which represents complete age distributions underlying a single groundwater sample in a regional-scale aquifer, is used as a quantitative, transient measure of aquifer susceptibility. The resultant 3-D imaging of susceptibility using the simulated BTTPDs in KRFF reveals the strong influence of regional-scale heterogeneity on susceptibility. The regional-scale incised-valley fill deposits increase the susceptibility of aquifers by enhancing rapid downward solute movement and displaying relatively narrow and young age distributions. In contrast, the regional-scale sequence-boundary paleosols within the open-fan deposits "protect" deep aquifers by slowing downward solute movement and displaying a relatively broad and old age distribution. Further comparison of the simulated susceptibility index maps to known contaminant distributions shows that these maps are generally consistent with the high concentration and quick evolution of 1,2-dibromo-3-chloropropane (DBCP) in groundwater around the incised-valley fill since the 1970s. This application demonstrates that the BTTPDs can be used as quantitative and transient measures of deep aquifer susceptibility to non-point source contamination.

  4. Comparative genomic analysis of oil spill impacts on deep water shipwreck microbiomes in the northern Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Hamdan, L. J.; Damour, M.; McGown, C.; Figan, C.; Kassahun, Z.; Blackwell, K.; Horrell, C.; Gillevet, P.

    2014-12-01

    Shipwrecks serve as artificial reefs in the deep ocean. Because of their inherent diversity compared to their surrounding environment and their random distribution, shipwrecks are ideal ecosystems to study pollution impacts and microbial distribution patterns in the deep biosphere. This study provides a comparative assessment of Deepwater Horizon spill impacts on shipwreck and local sedimentary microbiomes and the synergistic effects of contaminants on these communities and the physical structures that support them. For this study, microbiomes associated with wooden 19th century shipwrecks and World War II era steel shipwrecks in the northern Gulf of Mexico were investigated using next generation sequencing. Samples derived from in situ biofilm monitoring platforms deployed adjacent to 5 shipwrecks for 4 months, and sediment collected from distances ranging from 2-200m from each shipwreck were evaluated for shifts in microbiome structure and gene function relative to proximity to the spill, and oil spill related contaminants in the local environment. The goals of the investigation are to determine impacts to recruitment and community structure at sites located within and outside of areas impacted by the spill. Taxonomic classification of dominant and rare members of shipwreck microbiomes and metabolic information extracted from sequence data yield new understanding of microbial processes associated with site formation. The study provides information on the identity of microbial inhabitants of shipwrecks, their role in site preservation, and impacts of the Deepwater Horizon spill on the primary colonizers of artificial reefs in the deep ocean. This approach could inform about the role of microorganisms in establishment and maintenance of the artificial reef environment, while providing information about ecosystem feedbacks resulting from spills.

  5. Recalcitrant deep and shallow nodes in Aristolochia (Aristolochiaceae) illuminated using anchored hybrid enrichment.

    PubMed

    Wanke, Stefan; Granados Mendoza, Carolina; Müller, Sebastian; Paizanni Guillén, Anna; Neinhuis, Christoph; Lemmon, Alan R; Lemmon, Emily Moriarty; Samain, Marie-Stéphanie

    2017-12-01

    Recalcitrant relationships are characterized by very short internodes that can be found among shallow and deep phylogenetic scales all over the tree of life. Adding large amounts of presumably informative sequences, while decreasing systematic error, has been suggested as a possible approach to increase phylogenetic resolution. The development of enrichment strategies, coupled with next generation sequencing, resulted in a cost-effective way to facilitate the reconstruction of recalcitrant relationships. By applying the anchored hybrid enrichment (AHE) genome partitioning strategy to Aristolochia using a universal angiosperm probe set, we obtained 231-233 out of 517 single or low copy nuclear loci originally contained in the enrichment kit, resulting in a total alignment length of 154,756 bp to 160,150 bp. Since Aristolochia (Piperales; magnoliids) is distantly related to any angiosperm species whose genome has been used for the plant AHE probe design (Amborella trichopoda being the closest), it serves as a proof of universality for this probe set. Aristolochia comprises approximately 500 species grouped in several clades (OTUs), whose relationships to each other are partially unknown. Previous phylogenetic studies have shown that these lineages branched deep in time and in quick succession, seen as short-deep internodes. Short-shallow internodes are also characteristic of some Aristolochia lineages such as Aristolochia subsection Pentandrae, a clade of presumably recent diversification. This subsection is here included to test the performance of AHE at species level. Filtering and subsampling loci using the phylogenetic informativeness method resolves several recalcitrant phylogenetic relationships within Aristolochia. By assuming different ploidy levels during bioinformatics processing of raw data, first hints are obtained that polyploidization contributed to the evolution of Aristolochia. Phylogenetic results are discussed in the light of current systematics and morphology. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  7. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
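    The overlapping-read-pair idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' protocol or error model: it assumes the simplest case, in which the fragment is no longer than the read length so both mates cover every base, and it ignores quality scores. The function names are hypothetical. A base is kept only where the two mates agree; any disagreement is masked as N.

```python
# Minimal sketch of overlapping-read-pair (ORP) consensus calling.
# Assumption: both mates span the whole fragment, so they overlap fully.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orp_consensus(read1: str, read2: str) -> str:
    """Combine a forward read and its reverse-strand mate.

    read2 is given as sequenced (reverse strand), so it is first
    reverse-complemented into the orientation of read1. Positions where
    the two mates disagree are masked with 'N'.
    """
    mate = reverse_complement(read2)
    if len(mate) != len(read1):
        raise ValueError("this sketch only handles fully overlapping mates")
    return "".join(a if a == b else "N" for a, b in zip(read1, mate))


if __name__ == "__main__":
    print(orp_consensus("ACGTAC", "GTACGT"))  # mates agree: ACGTAC
    print(orp_consensus("ACGTAC", "GTACGA"))  # one mismatch: NCGTAC
```

    Under the simplifying assumption that the two mates make independent errors at a per-base rate e, spread evenly over the three wrong bases, both mates report the same wrong base with probability of roughly e²/3, which is the intuition behind the reduced error rate.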

  8. Deep RNNs for video denoising

    NASA Astrophysics Data System (ADS)

    Chen, Xinyuan; Song, Li; Yang, Xiaokang

    2016-09-01

    Video denoising can be described as the problem of mapping a sequence of noisy frames of a specific length to a clean one. We propose a deep architecture based on Recurrent Neural Networks (RNNs) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as input and outputs the clean ones. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers so that each layer receives the hidden state of the previous layer as input. Experiments show that (i) the recurrent architecture extracts motion information through the temporal domain and benefits video denoising; (ii) the deep architecture has enough capacity to express the mapping between corrupted videos as input and clean videos as output; and (iii) the model is general enough to learn different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
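    The stacking described above, where each layer receives the hidden states of the layer below as its input, can be sketched in plain Python. This is only an illustration of the general stacked-RNN forward pass, not the authors' network: the layer sizes, the tanh activation, the random initialisation, and all function names are assumptions, and no training is shown.

```python
import math
import random


def init_layer(in_size, hidden_size, rng):
    """Create (input weights, recurrent weights, bias) for one RNN layer."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(in_size)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    bias = [0.0] * hidden_size
    return w_in, w_rec, bias


def rnn_layer(inputs, w_in, w_rec, bias):
    """Run one simple (Elman-style) RNN layer over a sequence.

    Returns the hidden state at every time step, so the next layer
    can consume the whole sequence of states as its input.
    """
    hidden_size = len(bias)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * h[k] for k in range(hidden_size))
                + bias[j]
            )
            for j in range(hidden_size)
        ]
        states.append(h)
    return states


def deep_rnn(inputs, layers):
    """Stack layers: the hidden states of layer n are the inputs of layer n+1."""
    sequence = inputs
    for w_in, w_rec, bias in layers:
        sequence = rnn_layer(sequence, w_in, w_rec, bias)
    return sequence


if __name__ == "__main__":
    rng = random.Random(0)
    # Five time steps of a flattened 4-value "patch" (toy stand-in for video patches).
    frames = [[0.1, 0.2, 0.3, 0.4] for _ in range(5)]
    layers = [init_layer(4, 3, rng), init_layer(3, 2, rng)]
    top_states = deep_rnn(frames, layers)
    print(len(top_states), len(top_states[0]))
```

    In a denoising setting the top-layer states would still need an output layer mapping them back to pixel values; that step is omitted here.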

  9. Deep ART Neural Model for Biologically Inspired Episodic Memory and Its Application to Task Performance of Robots.

    PubMed

    Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan

    2018-06-01

    Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.

  10. Optimization of whole-transcriptome amplification from low cell density deep-sea microbial samples for metatranscriptomic analysis.

    PubMed

    Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R

    2011-01-01

    Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.

  11. Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter

    PubMed Central

    Maumus, Florian; Quesneville, Hadi

    2014-01-01

    Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome, and the products of their decay are thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important for understanding genome evolution as well as host biology. In this study, we have used a combination of tools to show that the repeatome from the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help to further address genome evolution as well as transcriptional and epigenetic regulation in this model plant. PMID:24709859

  12. Discovery of a Novel Periodontal Disease-Associated Bacterium.

    PubMed

    Torres, Pedro J; Thompson, John; McLean, Jeffrey S; Kelley, Scott T; Edlund, Anna

    2018-06-02

    One of the world's most common infectious diseases, periodontitis (PD), derives from largely uncharacterized communities of oral bacteria growing as biofilms (a.k.a. plaque) on teeth and gum surfaces in periodontal pockets. Bacteria associated with periodontal disease trigger inflammatory responses in immune cells, which in later stages of the disease cause loss of both soft and hard tissue structures supporting teeth. Thus far, only a handful of bacteria have been characterized as infectious agents of PD. Although deep sequencing technologies, such as whole community shotgun sequencing, have the potential to capture a detailed picture of highly complex bacterial communities in any given environment, we still lack major reference genomes for the oral microbiome associated with PD and other diseases. In recent work, by using a combination of supervised machine learning and genome assembly, we identified a genome from a novel member of the Bacteroidetes phylum in periodontal samples. Here, by applying a comparative metagenomics read-classification approach, including 272 metagenomes from various human body sites, and our previously assembled draft genome of the uncultivated Candidatus Bacteroides periocalifornicus (CBP) bacterium, we show CBP's ubiquitous distribution in dental plaque, as well as its strong association with the well-known pathogenic "red complex" that resides in deep periodontal pockets.

  13. The complete mitochondrial genome of the deep-sea sponge Poecillastra laminaris (Astrophorida, Vulcanellidae).

    PubMed

    Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A

    2016-05-01

    The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GenBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
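    A base composition like the one reported above is a simple per-base percentage. The sketch below shows the arithmetic on a toy sequence; it does not use the actual P. laminaris genome, and the function name is hypothetical. Ambiguous bases such as N are assumed to be excluded from the total.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, T, C and G in a DNA sequence.

    Characters other than A, C, G and T (for example N) are ignored,
    which is an assumption of this sketch.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: round(100 * counts[base] / total, 1) for base in "ATCG"}


if __name__ == "__main__":
    # Toy sequence: 2 A, 3 T, 1 C, 1 G out of 7 bases.
    print(base_composition("AATTTCG"))
    # Sanity check on the percentages reported in the abstract.
    reported = {"A": 29.1, "T": 35.2, "C": 14.0, "G": 21.7}
    print(round(sum(reported.values()), 1))  # 100.0
```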

  14. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction.

    PubMed

    Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong

    2017-12-15

    Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
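    To make "raw sequence data as input" concrete, the following sketch shows one common way to turn a protein sequence into network input: cut a fixed-size window around each candidate S, T or Y residue and one-hot encode it. This is an assumption for illustration only, not MusiteDeep's actual preprocessing; the window size, the padding symbol, and the function names are hypothetical, and the real tool's code is in the repository cited above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def candidate_windows(protein: str, flank: int = 3, targets: str = "STY"):
    """Return (position, window) pairs centred on each S, T or Y residue.

    Each window has length 2 * flank + 1; positions that fall outside
    the protein are filled with the padding symbol.
    """
    padded = PAD * flank + protein + PAD * flank
    return [
        (i, padded[i:i + 2 * flank + 1])
        for i, residue in enumerate(protein)
        if residue in targets
    ]


def one_hot(window: str):
    """Encode a window as a list of 20-element 0/1 rows (padding is all zeros)."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS] for residue in window]


if __name__ == "__main__":
    for position, window in candidate_windows("MKSPT", flank=2):
        matrix = one_hot(window)
        print(position, window, len(matrix), len(matrix[0]))
```

    Each encoded window is a small two-dimensional array (window length by 20), which is the kind of input a convolutional network can scan along the sequence axis.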

  15. Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau

    NASA Astrophysics Data System (ADS)

    Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori

    2017-12-01

    We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not associated) collected from two different sites at the Sao Paulo Plateau off Brazil, using Ion Torrent PGM sequencing targeting the ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity, assigned to 9 classes and 31 genera) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, it was revealed that only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high amount of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity, in comparison with pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge of fungi and their role in deep-sea environments is still limited, this study increases our understanding of fungal diversity and community structure in deep-sea environments.

  16. Identification of Small RNAs in Desulfovibrio vulgaris Hildenborough

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burns, Andrew; Joachimiak, Marcin; Deutschbauer, Adam

    2010-05-17

    Desulfovibrio vulgaris is an anaerobic sulfate-reducing bacterium capable of facilitating the removal of toxic metals such as uranium from contaminated sites via reduction. As such, it is essential to understand the intricate regulatory cascades involved in how D. vulgaris and its relatives respond to stressors in such sites. One approach is the identification and analysis of small non-coding RNAs (sRNAs): molecules ranging in size from 20-200 nucleotides that predominantly affect gene regulation by binding to complementary mRNA in an anti-sense fashion and therefore provide an immediate regulatory response. To identify sRNAs in D. vulgaris, a bacterium that does not possess an annotated hfq gene, RNA was pooled from stationary and exponential phases, nitrate exposure, and biofilm conditions. The subsequent RNA was size fractionated, modified, and converted to cDNA for high throughput transcriptomic deep sequencing. A computational approach to identify sRNAs via the alignment of seven separate Desulfovibrio genomes was also performed. From the deep sequencing analysis, 2,296 reads between 20 and 250 nt were identified with expression above genome background. Analysis of those reads limited the number of candidates to ~87 intergenic, while ~140 appeared to be antisense to annotated open reading frames (ORFs). Further BLAST analysis of the intergenic candidates and other Desulfovibrio genomes indicated that eight candidates were likely portions of ORFs not previously annotated in the D. vulgaris genome. Comparison of the intergenic and antisense data sets to the bioinformatically predicted candidates resulted in ~54 common candidates. Current approaches using Northern analysis and qRT-PCR are being used to verify expression of the candidates and to further develop the role these sRNAs play in D. vulgaris regulation.

  17. LookSeq: a browser-based viewer for deep sequencing data.

    PubMed

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  18. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed Central

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. 
    These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112

  19. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed

    Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. 
    These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity.

  20. Temporal lobe neoplasm and seizures: how deep does the story go?

    PubMed

    Jehi, Lara E; Lüders, Hans O; Naugle, Richard; Ruggieri, Paul; Morris, Harold; Foldvary, Nancy; Wyllie, Elaine; Kotagal, Prakash; Bingaman, Bill; Dinner, Dudley; Prayson, Richard; Diehl, Beate; Alexopoulos, Andreas; Bautista, Jocelyn; Busch, Robyn

    2008-03-01

    [March 2008-Cleveland Case Report]. There is a well-described association between the occurrence of developmental tumors and the presence of cortical dysplasia in the neighboring brain tissue. The main surgical approaches in the treatment of medically refractory epilepsy related to such developmental tumors include a lesionectomy versus a tailored cortical resection, often guided by an invasive evaluation. This case report describes the surgical management of a 26-year-old female with olfactory auras evolving into automotor seizures and convulsions, occurring in the context of a right temporo-parietal developmental lesion. It illustrates the pros and cons of various surgical approaches, and discusses some pathophysiological aspects of developmental tumors, dysplasia and epilepsy. [Published with video sequences].

  1. Rapidly evolving homing CRISPR barcodes

    PubMed Central

    Kalhor, Reza; Mali, Prashant; Church, George M.

    2017-01-01

    We present here an approach for engineering evolving DNA barcodes in living cells. The methodology entails using a homing guide RNA (hgRNA) scaffold that directs the Cas9-hgRNA complex to target the DNA locus of the hgRNA itself. We show that this homing CRISPR-Cas9 system acts as an expressed genetic barcode that diversifies its sequence and that the rate of diversification can be controlled in cultured cells. We further evaluate these barcodes in cell populations and show the barcode RNAs can be assayed as single molecules in situ. This integrated approach will have wide ranging applications, such as in deep lineage tracing, cellular barcoding, molecular recording, dissecting cancer biology, and connectome mapping. PMID:27918539

  2. Optimizing Multi-Station Template Matching to Identify and Characterize Induced Seismicity in Ohio

    NASA Astrophysics Data System (ADS)

    Brudzinski, M. R.; Skoumal, R.; Currie, B. S.

    2014-12-01

    As oil and gas well completions utilizing multi-stage hydraulic fracturing have become more commonplace, the potential for seismicity induced by the deep disposal of frac-related flowback waters and the hydraulic fracturing process itself has become increasingly important. While it is rare for these processes to induce felt seismicity, the recent increase in the number of deep injection wells and volumes injected is suspected to have contributed to a substantial increase of events ≥ M 3 in the continental U.S. over the past decade. Earthquake template matching using multi-station waveform cross-correlation is an adept tool for investigating potentially induced sequences due to its proficiency at identifying similar/repeating seismic events. We have sought to refine this approach by investigating a variety of seismic sequences and determining the optimal parameters (station combinations, template lengths and offsets, filter frequencies, data access method, etc.) for identifying induced seismicity. When applied to a sequence near a wastewater injection well in Youngstown, Ohio, our optimized template matching routine yielded 566 events while other template matching studies found ~100-200 events. We also identified 77 events on 4-12 March 2014 that are temporally and spatially correlated with active hydraulic fracturing in Poland Township, Ohio. We find similar improvement in characterizing sequences in Washington and Harrison Counties, which appear to be related to wastewater injection and hydraulic fracturing, respectively. In the Youngstown and Poland Township cases, focal mechanisms and double difference relocation using the cross-correlation matrix find left-lateral faults striking roughly east-west near the top of the basement. We have also used template matching to determine that isolated earthquakes near several other wastewater injection wells are unlikely to be induced based on a lack of similar/repeating sequences.
Optimized template matching utilizes high-quality reliable stations within pre-existing seismic networks and is therefore a cost-efficient monitoring strategy for identifying and characterizing potentially induced seismic sequences.
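
    The multi-station template matching described in this record can be illustrated with a minimal sketch. It is not the authors' routine: the function names, the averaging of per-station correlations, and the 0.8 threshold are all assumptions made for illustration, and real workflows would add filtering, template offsets, and station selection as the abstract notes.

```python
import math

def normalized_cross_correlation(trace, template):
    """Slide the template along the trace; return the correlation coefficient at each lag."""
    n = len(template)
    t_mean = sum(template) / n
    t = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t))
    cc = []
    for lag in range(len(trace) - n + 1):
        window = trace[lag:lag + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        denom = t_norm * math.sqrt(sum(x * x for x in w))
        # A flat window has zero variance; report zero correlation for it.
        cc.append(sum(a * b for a, b in zip(w, t)) / denom if denom > 0 else 0.0)
    return cc

def detect_events(station_traces, station_templates, threshold=0.8):
    """Average the per-station correlations (hypothetical stacking rule) and
    return the lags whose stacked value reaches the threshold.
    Assumes all traces share one length and all templates share one length."""
    per_station = [
        normalized_cross_correlation(trace, template)
        for trace, template in zip(station_traces, station_templates)
    ]
    stacked = [sum(values) / len(values) for values in zip(*per_station)]
    detections = [lag for lag, value in enumerate(stacked) if value >= threshold]
    return detections, stacked

# Synthetic check: bury one copy of the template in an otherwise flat trace.
template = [math.sin(2 * math.pi * i / 10) for i in range(30)]
trace = [0.0] * 200
trace[120:150] = template
detections, stacked = detect_events([trace, trace], [template, template])
```

    Stacking across stations is what lets a weak repeating event stand out: noise that correlates by chance at one station rarely does so at the same lag on the others.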

  3. Chromatin accessibility prediction via a hybrid deep convolutional neural network.

    PubMed

    Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

    2018-03-01

    A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
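
    To make concrete how a convolutional kernel can act as a motif detector on DNA, the following is a minimal sketch. It is not Deopen's architecture or code: the one-hot layout, the hand-set "TATA" kernel, and the function names are assumptions chosen for illustration, whereas a trained network would learn its kernel weights from data.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of four values (A, C, G, T).
    Bases outside ACGT (for example N) become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_scan(encoded, kernel):
    """Valid-mode 1D convolution (cross-correlation, as in most deep learning
    libraries) of a one-hot sequence with a kernel of shape (width, 4)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores

# Hypothetical kernel: the one-hot pattern of "TATA", so the score at each
# position counts how many bases of the window match the motif.
kernel = one_hot("TATA")
scores = conv1d_scan(one_hot("GGCTATAAGG"), kernel)
best_position = scores.index(max(scores))
```

    Reading a learned kernel back as a position-specific preference for each base is what allows kernels to be compared against known motifs, as the abstract describes.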

  4. Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

    PubMed Central

    Takai, Ken; Horikoshi, Koki

    1999-01-01

    Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021

  5. A pooling-based approach to mapping genetic variants associated with DNA methylation

    PubMed Central

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; McEwen, Lisa M.; Kobor, Michael S.; Fraser, Hunter B.

    2015-01-01

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. We found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data. PMID:25910490

  6. A pooling-based approach to mapping genetic variants associated with DNA methylation

    DOE PAGES

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; ...

    2015-04-24

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. Here we found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

  7. A first insight into the occurrence and expression of functional amoA and accA genes of autotrophic and ammonia-oxidizing bathypelagic Crenarchaeota of Tyrrhenian Sea

    NASA Astrophysics Data System (ADS)

    Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata

    2009-05-01

    The autotrophic and ammonia-oxidizing crenarchaeal assemblage at an offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan® methodology. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for the residual 36% of clones, except for those recovered from Eastern Mediterranean, were found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into Deep Marine Group I cluster that contain the vast majority of known sequences recovered from global deep-sea environment.
Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of crenarchaeal amoA gene was monitored by direct RNA retrieval and following analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in DNA-based clone library, obviously, due to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts, related to Deep Marine Group I of Crenarchaeota, was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. As far as all seawater samples were treated on-board under atmospheric pressure conditions and sunlight, the decompression and/or photoinhibition likely affected their metabolic activity, followed by the strong decay of gene expression.

  8. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

    PubMed Central

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

    2018-01-01

    PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441

  9. The deep, hot biosphere: Twenty-five years of retrospection.

    PubMed

    Colman, Daniel R; Poudel, Saroj; Stamps, Blake W; Boyd, Eric S; Spear, John R

    2017-07-03

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a "deep, hot biosphere" in the Earth's crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth's subsurface in the form of a deep subsurface microbiome initiative.

  10. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. 
Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
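
    The low-frequency cut-off used in this record (minor allele frequency < 5%) can be illustrated with a minimal sketch. The abstract does not say how allele frequencies were estimated, so the function names and the example counts below are hypothetical; the figure of 180 alleles simply assumes an autosomal site genotyped in all 90 diploid individuals.

```python
def minor_allele_frequency(alt_allele_count, total_alleles):
    """Return the frequency of the rarer allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele counts are inconsistent")
    alt_frequency = alt_allele_count / total_alleles
    return min(alt_frequency, 1.0 - alt_frequency)

def is_low_frequency(alt_allele_count, total_alleles, cutoff=0.05):
    """True when the site is polymorphic and its MAF is below the cut-off."""
    maf = minor_allele_frequency(alt_allele_count, total_alleles)
    return 0.0 < maf < cutoff

# Hypothetical counts for 90 diploid samples (180 alleles per autosomal site).
rare_site = is_low_frequency(8, 180)     # 8 alternate alleles, about 4.4%
common_site = is_low_frequency(18, 180)  # 18 alternate alleles, 10%
```

    With only 180 alleles, a variant must appear in at most 8 copies to fall under the 5% cut-off, which is why sequencing depth matters for calling such variants accurately in the first place.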

  11. Deep reefs are not universal refuges: Reseeding potential varies among coral species

    PubMed Central

    Bongaerts, Pim; Riginos, Cynthia; Brunner, Ramona; Englebert, Norbert; Smith, Struan R.; Hoegh-Guldberg, Ove

    2017-01-01

    Deep coral reefs (that is, mesophotic coral ecosystems) can act as refuges against major disturbances affecting shallow reefs. It has been proposed that, through the provision of coral propagules, such deep refuges may aid in shallow reef recovery; however, this “reseeding” hypothesis remains largely untested. We conducted a genome-wide assessment of two scleractinian coral species with contrasting reproductive modes, to assess the potential for connectivity between mesophotic (40 m) and shallow (12 m) depths on an isolated reef system in the Western Atlantic (Bermuda). To overcome the pervasive issue of endosymbiont contamination associated with de novo sequencing of corals, we used a novel subtraction reference approach. We have demonstrated that strong depth-associated selection has led to genome-wide divergence in the brooding species Agaricia fragilis (with divergence by depth exceeding divergence by location). Despite introgression from shallow into deep populations, a lack of first-generation migrants indicates that effective connectivity over ecological time scales is extremely limited for this species and thus precludes reseeding of shallow reefs from deep refuges. In contrast, no genetic structuring between depths (or locations) was observed for the broadcasting species Stephanocoenia intersepta, indicating substantial potential for vertical connectivity. Our findings demonstrate that vertical connectivity within the same reef system can differ greatly between species and that the reseeding potential of deep reefs in Bermuda may apply to only a small number of scleractinian species. Overall, we argue that the “deep reef refuge hypothesis” holds for individual coral species during episodic disturbances but should not be assumed as a broader ecosystem-wide phenomenon. PMID:28246645

  12. Highly Sensitive Detection of Isoniazid Heteroresistance in Mycobacterium tuberculosis by DeepMelt Assay.

    PubMed

    Liang, Bin; Tan, Yaoju; Li, Zi; Tian, Xueshan; Du, Chen; Li, Hui; Li, Guoli; Yao, Xiangyang; Wang, Zhongan; Xu, Ye; Li, Qingge

    2018-02-01

    Detection of heteroresistance of Mycobacterium tuberculosis remains challenging using current genotypic drug susceptibility testing methods. Here, we described a melting curve analysis-based approach, termed DeepMelt, that can detect less-abundant mutants through selective clamping of the wild type in mixed populations. The singleplex DeepMelt assay detected 0.01% katG S315T in 10^5 M. tuberculosis genomes/μl. The multiplex DeepMelt TB/INH detected 1% of mutant species in the four loci associated with isoniazid resistance in 10^4 M. tuberculosis genomes/μl. The DeepMelt TB/INH assay was tested on a panel of DNA extracted from 602 precharacterized clinical isolates. Using the 1% proportion method as the gold standard, the sensitivity was found to be increased from 93.6% (176/188, 95% confidence interval [CI] = 89.2 to 96.3%) to 95.7% (180/188, 95% CI = 91.8 to 97.8%) compared to the MeltPro TB/INH assay. Further evaluation of 109 smear-positive sputum specimens increased the sensitivity from 83.3% (20/24, 95% CI = 64.2 to 93.3%) to 91.7% (22/24, 95% CI = 74.2 to 97.7%). In both cases, the specificity remained nearly unchanged. All heteroresistant samples newly identified by the DeepMelt TB/INH assay were confirmed by DNA sequencing and even partially by digital PCR. The DeepMelt assay may fill the gap between current genotypic and phenotypic drug susceptibility testing for detecting drug-resistant tuberculosis patients. Copyright © 2018 American Society for Microbiology.
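
    The sensitivities and confidence intervals quoted in this record can be checked with a short calculation. The abstract does not name the interval method, so the Wilson score interval used here is an assumption; it gives approximately 89.2 to 96.3% for 176/188 and 91.8 to 97.8% for 180/188, in line with the reported values for the clinical isolates.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Sensitivity on the 188 resistant isolates, from the counts in the abstract.
meltpro_sensitivity = 176 / 188
deepmelt_sensitivity = 180 / 188
meltpro_ci = wilson_interval(176, 188)
deepmelt_ci = wilson_interval(180, 188)
```

    The two intervals overlap substantially, so the counts alone do not establish how large the gain in sensitivity is; the smaller sputum subset (24 resistant specimens) gives wider intervals still.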

  13. Optimization of conditions to sequence long cDNAs from viruses

    USDA-ARS's Scientific Manuscript database

    Fourth generation sequencing with the MinION nanopore sequencer provides an opportunity to obtain deep coverage and long reads for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...

  14. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS's Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  15. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.

    PubMed

    Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M

    2010-09-01

    MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm³ versus 19.9 cm³, P=0.74) and prevalence of ≥1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.

  16. Spitzer Space Telescope Sequencing Operations Software, Strategies, and Lessons Learned

    NASA Technical Reports Server (NTRS)

    Bliss, David A.

    2006-01-01

    The Space Infrared Telescope Facility (SIRTF) was launched in August, 2003, and renamed to the Spitzer Space Telescope in 2004. Two years of observing the universe in the wavelength range from 3 to 180 microns has yielded enormous scientific discoveries. Since this magnificent observatory has a limited lifetime, maximizing science viewing efficiency (i.e., maximizing time spent executing activities directly related to science observations) was the key operational objective. The strategy employed for maximizing science viewing efficiency was to optimize spacecraft flexibility, adaptability, and use of observation time. The selected approach involved implementation of a multi-engine sequencing architecture coupled with nondeterministic spacecraft and science execution times. This approach, though effective, added much complexity to uplink operations and sequence development. The Jet Propulsion Laboratory (JPL) manages Spitzer's operations. As part of the uplink process, Spitzer's Mission Sequence Team (MST) was tasked with processing observatory inputs from the Spitzer Science Center (SSC) into efficiently integrated, constraint-checked, and modeled review and command products which accommodated the complexity of non-deterministic spacecraft and science event executions without increasing operations costs. The MST developed processes, scripts, and participated in the adaptation of multi-mission core software to enable rapid processing of complex sequences. The MST was also tasked with developing a Downlink Keyword File (DKF) which could instruct Deep Space Network (DSN) stations on how and when to configure themselves to receive Spitzer science data. As MST and uplink operations developed, important lessons were learned that should be applied to future missions, especially those missions which employ command-intensive operations via a multi-engine sequence architecture.

  17. Next-generation sequencing reveals low-dose effects of cationic dendrimers in primary human bronchial epithelial cells.

    PubMed

    Feliu, Neus; Kohonen, Pekka; Ji, Jie; Zhang, Yuning; Karlsson, Hanna L; Palmberg, Lena; Nyström, Andreas; Fadeel, Bengt

    2015-01-27

    Gene expression profiling has developed rapidly in recent years with the advent of deep sequencing technologies such as RNA sequencing (RNA Seq) and could be harnessed to predict and define mechanisms of toxicity of chemicals and nanomaterials. However, the full potential of these technologies in (nano)toxicology is yet to be realized. Here, we show that systems biology approaches can uncover mechanisms underlying cellular responses to nanomaterials. Using RNA Seq and computational approaches, we found that cationic poly(amidoamine) dendrimers (PAMAM-NH2) are capable of triggering down-regulation of cell-cycle-related genes in primary human bronchial epithelial cells at doses that do not elicit acute cytotoxicity, as demonstrated using conventional cell viability assays, while gene transcription was not affected by neutral PAMAM-OH dendrimers. The PAMAMs were internalized in an active manner by lung cells and localized mainly in lysosomes; amine-terminated dendrimers were internalized more efficiently when compared to the hydroxyl-terminated dendrimers. Upstream regulator analysis implicated NF-κB as a putative transcriptional regulator, and subsequent cell-based assays confirmed that PAMAM-NH2 caused NF-κB-dependent cell cycle arrest. However, PAMAM-NH2 did not affect cell cycle progression in the human A549 adenocarcinoma cell line. These results demonstrate the feasibility of applying systems biology approaches to predict cellular responses to nanomaterials and highlight the importance of using relevant (primary) cell models.

  18. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  19. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  20. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  1. Insertion sequences enrichment in extreme Red sea brine pool vent.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2017-03-01

    Mobile genetic elements are major agents of genome diversification and evolution. Limited studies addressed their characteristics, including abundance, and role in extreme habitats. One of the rare natural habitats exposed to multiple-extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments, suggest that insertion sequences predominantly contribute to polyextremophiles genome plasticity.

  2. Multimission Software Reuse in an Environment of Large Paradigm Shifts

    NASA Technical Reports Server (NTRS)

    Wilson, Robert K.

    1996-01-01

    The ground data systems provided for NASA space mission support are discussed. As space missions expand, the ground systems requirements become more complex. Current ground data systems provide for telemetry, command, and uplink and downlink processing capabilities. The new millennium project (NMP) technology testbed for 21st century NASA missions is discussed. The program demonstrates spacecraft and ground system technologies. The paradigm shift from detailed ground sequencing to a goal oriented planning approach is considered. The work carried out to meet this paradigm for the Deep Space-1 (DS-1) mission is outlined.

  3. Bacterial Diversity in Bentonites, Engineered Barrier for Deep Geological Disposal of Radioactive Wastes.

    PubMed

    Lopez-Fernandez, Margarita; Cherkouk, Andrea; Vilchez-Vargas, Ramiro; Jauregui, Ruy; Pieper, Dietmar; Boon, Nico; Sanchez-Castro, Ivan; Merroun, Mohamed L

    2015-11-01

    The long-term disposal of radioactive wastes in a deep geological repository is the accepted international solution for the treatment and management of these special residues. The microbial community of the selected host rocks and engineered barriers for the deep geological repository may affect the performance and the safety of the radioactive waste disposal. In this work, the bacterial population of bentonite formations of Almeria (Spain), selected as a reference material for bentonite-engineered barriers in the disposal of radioactive wastes, was studied. 16S ribosomal RNA (rRNA) gene-based approaches were used to study the bacterial community of the bentonite samples by traditional clone libraries and Illumina sequencing. Using both techniques, the bacterial diversity analysis revealed similar results, with phylotypes belonging to 14 different bacterial phyla: Acidobacteria, Actinobacteria, Armatimonadetes, Bacteroidetes, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Gemmatimonadetes, Planctomycetes, Proteobacteria, Nitrospirae, Verrucomicrobia and an unknown phylum. The dominant groups of the community were represented by Proteobacteria and Bacteroidetes. A high diversity was found in three of the studied samples. However, two samples were less diverse and dominated by Betaproteobacteria.

  4. Affinity Maturation of a Cyclic Peptide Handle for Therapeutic Antibodies Using Deep Mutational Scanning

    PubMed Central

    van Rosmalen, Martijn; Janssen, Brian M. G.; Hendrikse, Natalie M.; van der Linden, Ardjan J.; Pieters, Pascal A.; Wanders, Dave; de Greef, Tom F. A.; Merkx, Maarten

    2017-01-01

    Meditopes are cyclic peptides that bind in a specific pocket in the antigen-binding fragment of a therapeutic antibody such as cetuximab. Provided their moderate affinity can be enhanced, meditope peptides could be used as specific non-covalent and paratope-independent handles in targeted drug delivery, molecular imaging, and therapeutic drug monitoring. Here we show that the affinity of a recently reported meditope for cetuximab can be substantially enhanced using a combination of yeast display and deep mutational scanning. Deep sequencing was used to construct a fitness landscape of this protein-peptide interaction, and four mutations were identified that together improved the affinity for cetuximab 10-fold to 15 nM. Importantly, the increased affinity translated into enhanced cetuximab-mediated recruitment to EGF receptor-overexpressing cancer cells. Although in silico Rosetta simulations correctly identified positions that were tolerant to mutation, modeling did not accurately predict the affinity-enhancing mutations. The experimental approach reported here should be generally applicable and could be used to develop meditope peptides with low nanomolar affinity for other therapeutic antibodies. PMID:27974464

  5. Combining deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance.

    PubMed

    Ngo, Tuan Anh; Lu, Zhi; Carneiro, Gustavo

    2017-01-01

    We introduce a new methodology that combines deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance (MR) data. This combination is relevant for segmentation problems, where the visual object of interest presents large shape and appearance variations, but the annotated training set is small, which is the case for various medical image analysis applications, including the one considered in this paper. In particular, level set methods are based on shape and appearance terms that use small training sets, but present limitations for modelling the visual object variations. Deep learning methods can model such variations using relatively small amounts of annotated training, but they often need to be regularised to produce good generalisation. Therefore, the combination of these methods brings together the advantages of both approaches, producing a methodology that needs small training sets and produces accurate segmentation results. We test our methodology on the MICCAI 2009 left ventricle segmentation challenge database (containing 15 sequences for training, 15 for validation and 15 for testing), where our approach achieves the most accurate results in the semi-automated problem and state-of-the-art results for the fully automated challenge.

  6. Hiding deep in the trees: discovery of divergent mitochondrial lineages in Malagasy chameleons of the Calumma nasutum group

    PubMed Central

    Gehring, Philip-Sebastian; Tolley, Krystal A; Eckhardt, Falk Sebastian; Townsend, Ted M; Ziegler, Thomas; Ratsoavina, Fanomezana; Glaw, Frank; Vences, Miguel

    2012-01-01

    We conducted a comprehensive molecular phylogenetic study for a group of chameleons from Madagascar (Chamaeleonidae: Calumma nasutum group, comprising seven nominal species) to examine the genetic and species diversity in this widespread genus. Based on DNA sequences of the mitochondrial gene (ND2) from 215 specimens, we reconstructed the phylogeny using a Bayesian approach. Our results show deep divergences among several unnamed mitochondrial lineages that are difficult to identify morphologically. We evaluated lineage diversification using a number of statistical phylogenetic methods (general mixed Yule-coalescent model; SpeciesIdentifier; net p-distances) to objectively delimit lineages that we here consider as operational taxonomic units (OTUs), and for which the taxonomic status remains largely unknown. In addition, we compared molecular and morphological differentiation in detail for one particularly diverse clade (the C. boettgeri complex) from northern Madagascar. To assess the species boundaries within this group we used an integrative taxonomic approach, combining evidence from two independent molecular markers (ND2 and CMOS), together with genital and other external morphological characters, and conclude that some of the newly discovered OTUs are separate species (confirmed candidate species, CCS), while others should best be considered as deep conspecific lineages (DCLs). Our analysis supports a total of 33 OTUs, of which seven correspond to described species, suggesting that the taxonomy of the C. nasutum group is in need of revision. PMID:22957155

  7. DeepLoc: prediction of protein subcellular localization using deep learning.

    PubMed

    Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole

    2017-11-01

    The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php.

  8. Musculoskeletal MRI findings of juvenile localized scleroderma.

    PubMed

    Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W

    2017-04-01

    Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.

  9. Dissecting enzyme function with microfluidic-based deep mutational scanning.

    PubMed

    Romero, Philip A; Tran, Tuan M; Abate, Adam R

    2015-06-09

    Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
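
    Deep mutational scanning experiments of this kind are commonly summarised by comparing each variant's read frequency before and after screening. The sketch below shows one widely used score, a log2 enrichment ratio with a pseudocount. The abstract does not describe how the authors scored variants, so the function name `log2_enrichment`, the pseudocount, and the example counts are hypothetical and serve only to illustrate the idea.

```python
import math


def log2_enrichment(pre_count: int, post_count: int,
                    pre_total: int, post_total: int,
                    pseudocount: float = 0.5) -> float:
    """log2 of (post-screen frequency / pre-screen frequency) for one variant.

    Assumption: counts are sequencing reads assigned to the variant, and the
    totals are all reads in the unsorted and sorted libraries respectively.
    The pseudocount avoids division by zero for variants with no reads.
    """
    if pre_total <= 0 or post_total <= 0:
        raise ValueError("library totals must be positive")
    pre_freq = (pre_count + pseudocount) / pre_total
    post_freq = (post_count + pseudocount) / post_total
    return math.log2(post_freq / pre_freq)


if __name__ == "__main__":
    # Hypothetical counts: positive scores mean the variant was enriched by the screen.
    print(log2_enrichment(pre_count=100, post_count=400,
                          pre_total=1000, post_total=1000, pseudocount=0.0))
```

    Scores are often further normalised to the wild-type sequence so that zero corresponds to wild-type-like activity; that step is omitted here.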

  10. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...

  11. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  12. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.
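
    The "30× coverage" figure refers to mean sequencing depth, which is usually estimated as total sequenced bases divided by genome size. The short sketch below illustrates that arithmetic. The read length (100 bp) and genome size (3.1 Gbp) are illustrative assumptions and are not taken from the abstract; the function names are likewise hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth = (number of reads * read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_for_target(target_coverage: float, read_length: int,
                     genome_size: int) -> int:
    """Smallest number of reads whose total bases reach the target mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Assumed values for illustration: 100 bp reads, 3.1 Gbp genome.
    genome = 3_100_000_000
    print(reads_for_target(30, 100, genome))
    print(mean_coverage(930_000_000, 100, genome))
```

    Mean depth says nothing about how evenly reads are distributed, so a stated minimum coverage, as in this record, is a stricter condition than the same mean coverage.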

  13. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  14. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast

    PubMed Central

    Liachko, Ivan; Youngblood, Rachel A.; Keich, Uri; Dunham, Maitreya J.

    2013-01-01

    DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mutational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in diverse systems. PMID:23241746

  15. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.

  16. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2016-09-01

    Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
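
    The pairwise ranking view of AUC mentioned in this record can be illustrated in a few lines: the empirical AUC is the fraction of (positive, negative) example pairs in which the positive example receives the higher score. The sketch below computes that exact, non-differentiable quantity. It is not the DeepCNF training code; according to the abstract, the authors replace the step comparison with a polynomial approximation so that gradients can be taken, and that step is not reproduced here. The function name `pairwise_auc` is an assumption.

```python
from typing import Sequence


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs.

    Ties between a positive and a negative score count as half a correct pair.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: three of the four positive/negative pairs are ordered correctly.
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

    Because the score depends only on the ordering of positives against negatives, it is unaffected by how rare the positive class is, which is why it suits imbalanced labels.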

  17. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling

    PubMed Central

    Wang, Sheng; Sun, Siqi

    2017-01-01

    Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168

  18. A phylogenetic approach to octocoral community structure in the deep Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Quattrini, Andrea M.; Etnoyer, Peter J.; Doughty, Cheryl; English, Lisa; Falco, Rosalia; Remon, Natasha; Rittinghouse, Matthew; Cordes, Erik E.

    2014-01-01

    Deep-sea communities are becoming increasingly vulnerable to anthropogenic disturbances, as fishing, hydrocarbon exploration and extraction, and mining activities extend into deeper water. Negative impacts from such activities were recently documented in the Gulf of Mexico (GoM), where the Deepwater Horizon oil spill caused substantial damage to a deep-water octocoral community. Although a faunal checklist and numerous museum records are currently available for the entire GoM, local-scale diversity and assemblage structure of octocoral communities remains unknown, particularly in deep water. On a series of recent cruises (2008-2011) using remotely operated vehicles, 435 octocorals were collected from 33 deep-water sites (250-2500 m) in the northern GoM. To elucidate species boundaries, the extended mitochondrial barcode (COI+igr1+msh) was successfully amplified and sequenced for 422 of these specimens, yielding a total of 64 haplotypes representing at least 52 species. Further, at least 29% of the species collected were either previously not known to occur in the GoM (12 species) or represent new species (at least three species). Overall, species richness at each site was fairly low (1-12 spp.). The greatest species richness occurred at the shallowest (<325 m: GC140, n=8 spp.) and the deepest (2100-2500 m: DC673, n=12 spp., DC583, n=10 spp.) sites, and minimum taxonomic and phylogenetic (Faith's Index) diversity was evident at 600-950 m. This pattern is the opposite of the typical pattern of deep-sea diversity in the GoM, which normally peaks at mid-slope depths. Sorensen's Index of taxonomic β-diversity indicated that six distinct (65-95% dissimilarity) species assemblages corresponded with five depth breaks at ~325, 425, 600, 1100, and 2100 m. Further assemblage structure was observed within certain depth zones. 
Of note, within the 425-600 m depth range, species assemblages at the West Florida Slope differed from the other sites, corresponding to an established biogeographic barrier. The phylogenetic approach used in this study provided important insights into the species boundaries of many taxa while demonstrating that evolutionary history plays a critical role in community structure of deep-sea octocorals.
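    The Sørensen β-diversity comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes the presence/absence form of the index, 1 - 2|A∩B| / (|A| + |B|); the abstract does not state which variant was used, and the site species lists below are hypothetical placeholders rather than data from the study.

```python
def sorensen_dissimilarity(site_a, site_b):
    """Presence/absence Sørensen dissimilarity between two species lists (0 = identical, 1 = no shared species)."""
    a, b = set(site_a), set(site_b)
    if not a and not b:
        return 0.0  # two empty sites are treated as identical
    shared = len(a & b)
    return 1.0 - 2.0 * shared / (len(a) + len(b))


# Hypothetical species lists for a shallow and a deep site (not from the study).
shallow_site = {"sp1", "sp2", "sp3", "sp4"}
deep_site = {"sp4", "sp5", "sp6", "sp7", "sp8", "sp9"}

dissimilarity = sorensen_dissimilarity(shallow_site, deep_site)
print(f"{dissimilarity:.0%} dissimilar")
```

    With one shared species out of four and six, the two hypothetical sites are 80% dissimilar, which would fall inside the 65-95% range the abstract reports for distinct assemblages.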

  19. AGN contribution to the total IR luminosity in Herschel selected galaxies out to z~1.5

    NASA Astrophysics Data System (ADS)

    Baronchelli, Ivano; Scarlata, Claudia; Rodighiero, Giulia; Berta, Stefano; Sedgwick, Christopher; Vaccari, Mattia; Franceschini, Alberto; Urrutia, Tanya; Malkan, Matthew Arnold; Salvato, Mara; Bonato, Matteo; Serjeant, Stephen; Pearson, Chris; Marchetti, Lucia

    2016-01-01

    In the past decade, a growing amount of evidence suggests a tight link between the growth of Active Galactic Nuclei (AGN) and that of their host galaxies. X-ray studies on the Super Massive Black Holes (SMBHs) activity indicate the existence of a Black Hole Accretion Rate (BHAR) "main sequence", similar to the "main sequence" observed in star-forming galaxies, between the star-formation rate (SFR) and stellar mass (M*). We use the multi-wavelength data from the SIMES survey to study the optical to sub-mm spectral energy distribution (SED) of galaxies identified at 250 μm by the Herschel Space Observatory. In particular, for galaxies in the 0.2-1.5 redshift range, we explore the relations among galaxy's stellar mass, SFR, and SMBH accretion rate. The deep Spitzer-IRAC/MIPS (3.6, 4.5 and 24 μm) together with the deep AKARI-IRC observations (7, 11 and 15 μm) allow us to constrain the critical spectral region where the dusty torus emission of AGNs is more prominent. Thanks to the Herschel-SPIRE observations, we can also precisely measure the SFR from the bolometric (i.e. 8-1000 μm) far-IR emission. Using this multi-wavelength approach we confirm the existence, at z<0.5, of the M*-BHAR "main sequence". The measured average ratio between BHAR and SFR is close to the value required to maintain the SMBH-to-M* ratio of ~10^3 and decreases at higher specific SFRs (SSFR=SFR/M*). Finally, combining our observations with literature results, we show that the slope of the BHAR main sequence is evolving with redshift between z~0 and z~2.

  20. Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using Illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

    USDA-ARS's Scientific Manuscript database

    Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost necessary for large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...

  1. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    PubMed

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and humans. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance: In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world.
The experimental methods developed to carry out this work are novel, simple and transferable to the study of packaging signals in other RNA viruses. Improved understanding of RNA packaging may lead to novel vaccine approaches or targets for antiviral drugs with broad spectrum activity. Copyright © 2017 Logan et al.

  2. Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.

    PubMed

    Zhang, Buzhong; Li, Linqing; Lü, Qiang

    2018-05-25

    Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
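    The evaluation steps named in this abstract (mean absolute error, Pearson's correlation coefficient, and thresholding continuous values into binary states) can be sketched as follows. This is only an illustration: the 0.25 cut-off, the "buried"/"exposed" labels, and the sample values are assumptions, since the abstract does not give the predefined thresholds or any per-residue data.

```python
from math import sqrt


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def to_binary_states(relative_accessibility, threshold=0.25):
    """Map continuous relative accessibility to two states; the 0.25 threshold is an assumed example."""
    return ["exposed" if v >= threshold else "buried" for v in relative_accessibility]


# Hypothetical per-residue relative solvent accessibility (fractions, not study data).
observed = [0.05, 0.40, 0.80, 0.20]
predicted = [0.10, 0.35, 0.70, 0.30]

print(mean_absolute_error(predicted, observed))
print(pearson_r(predicted, observed))
print(to_binary_states(predicted))
```

    Lower mean absolute error and higher correlation both indicate better agreement, which is how the figures reported for the CB502 and Manesh215 datasets should be read.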

  3. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10^-3, 27 months; MRD 10^-3 to 10^-5, 48 months; and MRD <10^-5, 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
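    The three-level MRD stratification reported in this abstract can be written out as a small lookup, shown here as a hedged sketch. The function name, the stratum labels, and the treatment of a value exactly at 10^-5 as "intermediate" are assumptions for illustration; only the cut-offs (10^-3 and 10^-5) and the median TTP values (27, 48 and 80 months) come from the abstract.

```python
# Median time to tumor progression (months) per MRD stratum, as reported in the abstract.
MEDIAN_TTP_MONTHS = {"high": 27, "intermediate": 48, "low": 80}


def mrd_stratum(mrd_level):
    """Assign an MRD level (a fraction, e.g. 1e-4) to one of the three reported strata."""
    if mrd_level >= 1e-3:
        return "high"
    if mrd_level >= 1e-5:  # assumption: a value exactly at 1e-5 counts as intermediate
        return "intermediate"
    return "low"


# Hypothetical MRD levels, one per stratum.
for level in (5e-3, 1e-4, 1e-6):
    stratum = mrd_stratum(level)
    print(level, stratum, MEDIAN_TTP_MONTHS[stratum])
```

    The lookup only restates the reported group medians; it is not a prognostic model for individual patients.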

  4. Deep sequencing and flow cytometric characterization of expanded effector memory CD8+CD57+ T cells frequently reveals T-cell receptor Vβ oligoclonality and CDR3 homology in acquired aplastic anemia.

    PubMed

    Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S

    2018-05-01

    Oligoclonal expansion of CD8+CD28- lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8+CD28- cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8+CD28-CD57+ cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8+CD57+ cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8+ cells from aplastic anemia patients with CD8+CD57+ cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8+CD57+ cells, which also showed decreased diversity compared to total CD4+ and CD8+ cell pools. From analysis of complementarity-determining region 3 sequences in the CD8+ cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8+ T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064). Copyright © 2018 Ferrata Storti Foundation.

  5. Genome-Wide Identification of miRNAs Responsive to Drought in Peach (Prunus persica) by High-Throughput Deep Sequencing

    PubMed Central

    Eldem, Vahap; Çelikkol Akçay, Ufuk; Ozhuner, Esma; Bakır, Yakup; Uranbey, Serkan; Unver, Turgay

    2012-01-01

    Peach (Prunus persica L.) is one of the most important fresh fruits worldwide. Since fruit growth largely depends on adequate water supply, drought stress is considered the most important abiotic stress limiting fleshy fruit production and quality in peach. Plant responses to drought stress are regulated at both the transcriptional and post-transcriptional levels. As post-transcriptional gene regulators, microRNAs (miRNAs) are small (19–25 nucleotides in length), endogenous, non-coding RNAs. Recent studies indicate that miRNAs are involved in plant responses to drought. Therefore, Illumina deep sequencing technology was used for genome-wide identification of miRNAs and their expression profile in response to drought in peach. In this study, four sRNA libraries were constructed from leaf control (LC), leaf stress (LS), root control (RC) and root stress (RS) samples. We identified a total of 531, 471, 535 and 487 known mature miRNAs in LC, LS, RC and RS libraries, respectively. The expression level of 262 (104 up-regulated, 158 down-regulated) of the 453 miRNAs changed significantly in leaf tissue, whereas 368 (221 up-regulated, 147 down-regulated) of the 465 miRNAs had expression levels that changed significantly in root tissue upon drought stress. Additionally, a total of 197, 221, 238 and 265 novel miRNA precursor candidates were identified from LC, LS, RC and RS libraries, respectively. Target transcripts (137 for LC, 133 for LS, 148 for RC and 153 for RS) generated significant Gene Ontology (GO) terms related to DNA binding and catalytic activities. Genome-wide miRNA expression analysis of peach by a deep sequencing approach helped to expand our understanding of miRNA function in response to drought stress in peach and Rosaceae. A set of differentially expressed miRNAs could pave the way for developing new strategies to alleviate the adverse effects of drought stress on plant growth and development. PMID:23227166

  6. On-Line Detection and Segmentation of Sports Motions Using a Wearable Sensor.

    PubMed

    Kim, Woosuk; Kim, Myunggyu

    2018-03-19

    In sports motion analysis, observation is a prerequisite for understanding the quality of motions. This paper introduces a novel approach to detect and segment sports motions using a wearable sensor for supporting systematic observation. The main goal is, for convenient analysis, to automatically provide motion data, which are temporally classified according to the phase definition. For explicit segmentation, a motion model is defined as a sequence of sub-motions with boundary states. A sequence classifier based on deep neural networks is designed to detect sports motions from continuous sensor inputs. The evaluation on two types of motions (soccer kicking and two-handed ball throwing) verifies that the proposed method is successful for the accurate detection and segmentation of sports motions. By developing a sports motion analysis system using the motion model and the sequence classifier, we show that the proposed method is useful for observation of sports motions by automatically providing relevant motion data for analysis.

  7. Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...

  8. High-Resolution Whole-Genome Sequencing Reveals That Specific Chromatin Domains from Most Human Chromosomes Associate with Nucleoli

    PubMed Central

    van Koningsbruggen, Silvana; Gierliński, Marek; Schofield, Pietá; Martin, David; Barton, Geoffey J.; Ariyurek, Yavuz; den Dunnen, Johan T.

    2010-01-01

    The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus, or the nuclear envelope. PMID:20826608

  9. High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli.

    PubMed

    van Koningsbruggen, Silvana; Gierlinski, Marek; Schofield, Pietá; Martin, David; Barton, Geoffey J; Ariyurek, Yavuz; den Dunnen, Johan T; Lamond, Angus I

    2010-11-01

    The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus, or the nuclear envelope.

  10. Prediction of enhancer-promoter interactions via natural language processing.

    PubMed

    Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

    2018-05-09

    Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even superior performance with F1 scores of 0.889 to 0.940 can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.

  11. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  12. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  13. The Bouma Sequence and the turbidite mind set

    NASA Astrophysics Data System (ADS)

    Shanmugam, G.

    1997-11-01

    Conventionally, the Bouma Sequence [Bouma, A.H., 1962. Sedimentology of some Flysch Deposits: A Graphic Approach to Facies Interpretation. Elsevier, Amsterdam, 168 pp.], composed of Ta, Tb, Tc, Td, and Te divisions, is interpreted to be the product of a turbidity current. However, recent core and outcrop studies show that the complete and partial Bouma sequences can also be interpreted to be deposits formed by processes other than turbidity currents, such as sandy debris flows and bottom-current reworking. Many published examples of turbidites, most of them hydrocarbon-bearing sands, in the North Sea, the Norwegian Sea, offshore Nigeria, offshore Gabon, Gulf of Mexico, and the Ouachita Mountains, are being reinterpreted by the present author as dominantly deposits of sandy debris flows and bottom-current reworking with only a minor percentage of true turbidites (i.e., deposits of turbidity currents with fluidal or Newtonian rheology in which sediment is suspended by fluid turbulence). This reinterpretation is based on detailed description of 21,000 ft (6402 m) of conventional cores and 1200 ft (365 m) of outcrop sections.
The predominance of interpreted turbidites in these areas by other workers can be attributed to the following: (1) loose applications of turbidity-current concepts without regard for fluid rheology, flow state, and sediment-support mechanism that result in a category of 'turbidity currents' that includes debris flows and bottom currents; (2) field description of deep-water sands using the Bouma Sequence (an interpretive model) that invariably leads to a model-driven turbidite interpretation; (3) the prevailing turbidite mind set that subconsciously forces one to routinely interpret most deep-water sands as some kind of turbidites; (4) the use of our inability to interpret transport mechanism from the depositional record as an excuse for assuming deep-water sands as deposits of turbidity currents; (5) the flawed concept of high-density turbidity currents that allows room for interpreting debris-flow deposits as turbidites; (6) the flawed comparison of subaerial river currents (fluid-gravity flows dominated by bed-load transport) with subaqueous turbidity currents (sediment-gravity flows dominated by suspended load transport) that results in misinterpreting ungraded or parallel-stratified deep-sea deposits as turbidites; and (7) the attraction to use obsolete submarine-fan models with channels and lobes that require a turbidite interpretation. Although the turbidite paradigm is alive and well for now, the turbidites themselves are becoming an endangered facies!

  14. Mark report satellite tags (mrPATs) to detail large-scale horizontal movements of deep water species: First results for the Greenland shark (Somniosus microcephalus)

    NASA Astrophysics Data System (ADS)

    Hussey, Nigel E.; Orr, Jack; Fisk, Aaron T.; Hedges, Kevin J.; Ferguson, Steven H.; Barkley, Amanda N.

    2018-04-01

    The deep-sea is increasingly viewed as a lucrative environment for the growth of resource extraction industries. To date, our ability to study deep-sea species lags behind that of those inhabiting the photic zone limiting scientific data available for management. In particular, knowledge of horizontal movements is restricted to two locations; capture and recapture, with no temporal information on absolute animal locations between endpoints. To elucidate the horizontal movements of a large deep-sea fish, a novel tagging approach was adopted using the smallest available prototype satellite tag - the mark-report pop-up archival tag (mrPAT). Five Greenland sharks (Somniosus microcephalus) were equipped with multiple mrPATs as well as a standard archival satellite tag (miniPAT) that were programmed to release in sequence at 8-10 day intervals. The performance of the mrPATs was quantified. The tagging approach provided multiple locations per individual and revealed a previously unknown directed migration of Greenland sharks from the Canadian high Arctic to Northwest Greenland. All tags reported locations, however, the accuracy and time from expected release were variable among tags (average time to an accurate location from expected release = 30.8 h, range: 4.9-227.6 h). Average mrPAT drift rate estimated from best quality messages (LQ1,2,3) was 0.37 ± 0.09 m/s indicating tags were on average 41.1 ± 63.4 km (range: 6.5-303.1 km) from the location of the animal when they transmitted. mrPATs provided daily temperature values that were highly correlated among tags and with the miniPAT (70.8% of tag pairs were significant). In contrast, daily tilt sensor data were variable among tags on the same animal (12.5% of tag pairs were significant). Tracking large-scale movements of deep-sea fish has historically been limited by the remote environment they inhabit. 
The current study provides a new approach to document reliable coarse scale horizontal movements to understand migrations, stock structure and habitat use of large species. Opportunities to apply mrPATs to understand the movements of medium size fish, marine mammals and to validate retrospective movement modeling approaches based on archival data are presented.
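    The link between tag drift rate, reporting delay and location error described in this abstract is a simple rate-times-time calculation, sketched below. It assumes that the reported distances were obtained by multiplying the average drift rate by each tag's delay; the abstract does not state this, and the function name is hypothetical. The inputs (0.37 m/s; 4.9, 30.8 and 227.6 h) are the values quoted in the abstract.

```python
def drift_distance_km(drift_rate_m_per_s, delay_hours):
    """Distance a floating tag drifts before its first accurate location, in kilometres."""
    seconds = delay_hours * 3600.0
    return drift_rate_m_per_s * seconds / 1000.0


AVERAGE_DRIFT_RATE = 0.37  # m/s, from the abstract

# Shortest, average and longest delays from expected release, in hours (from the abstract).
for delay in (4.9, 30.8, 227.6):
    print(delay, "h ->", round(drift_distance_km(AVERAGE_DRIFT_RATE, delay), 1), "km")
```

    Under this assumption the three delays give roughly 6.5, 41 and 303 km, close to the 6.5-303.1 km range and 41.1 km average that the abstract reports.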

  15. Stratal stacking patterns and tectono-sedimentary evolution of hyperextended magma-poor rifted margins

    NASA Astrophysics Data System (ADS)

    Ribes, C.; Gillard, M.; Epin, M. E.; Ghienne, J. F.; Manatschal, G.; Karner, G. D.; Johnson, C. A.

    2016-12-01

    Research on the formation and evolution of deep-water rifted margins has undergone a major paradigm shift in recent years. An increasing number of studies of present-day and fossil rifted margins allow us to identify and characterize the structural architecture of the most distal parts of rifted margins, the so-called hyperextended, magma-poor rifted margins. However, at present, little is known about the depositional environments, sedimentary facies, stacking patterns, subsidence and thermal history within these domains. In this context, characterizing the stratal stacking patterns and understanding their spatial and temporal evolution is a new challenge. The major difficulty comes from the fact that the observed stratigraphic geometries and facies relationships are a result of the complex interplay between sediment supply and available accommodation, which is controlled not only by the regional generation of accommodation, but also by local tectono-magmatic processes. These parameters are poorly constrained, or not even sufficiently known, in these tectonic settings. Indeed, the complex structural evolution of hyperextended magma-poor rifted margins, including the development of poly-phase in-sequence and out-of-sequence extensional detachment faults and associated mantle exhumation and magmatic activity, can generate complex accommodation patterns over a highly structured top basement. The presentation summarizes early results concerning the controlling parameters on ultra-deep water stratigraphic stacking patterns and provides a conceptual framework. This observation-driven approach combines fieldwork from fossil Alpine Tethys margins exposed in the Alps and the analysis of seismic reflection data from present-day deep water rifted margins such as the Australian-Antarctic, East India and Iberia-Newfoundland margins.

  16. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life

    PubMed Central

    Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.

    2018-01-01

    Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. The top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process.
Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369

  17. Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use.

    PubMed

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J

    2009-06-29

    It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under the limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using the Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject was detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% of subjects and correlated with historical antiretroviral use in 79% of subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping.
The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.

  18. Low-Abundance HIV Drug-Resistant Viral Variants in Treatment-Experienced Persons Correlate with Historical Antiretroviral Use

    PubMed Central

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.

    2009-01-01

    Background: It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under the limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings: Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using the Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject was detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% of subjects and correlated with historical antiretroviral use in 79% of subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance: Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping.
The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031

  19. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    PubMed

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. The top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process.
Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset.
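
    The contaminant-removal step described in this record can be sketched as a simple name-based filter. This is a minimal illustration, not the authors' pipeline: the function names, the per-sample dictionary layout, and the placeholder taxa (taxon_A, taxon_B) are assumptions made here, and only the six genus names come from the abstract. A genus-name filter cannot tell a reagent-derived lineage from an indigenous one of the same genus, so it should be read as a starting point only.

```python
# Genera named in the abstract as abundant or frequent putative contaminants.
CONTAMINANT_GENERA = {
    "Propionibacterium", "Aquabacterium", "Ralstonia",
    "Acinetobacter", "Pseudomonas", "Sphingomonas",
}


def remove_contaminants(genus_counts, contaminants=CONTAMINANT_GENERA):
    """Drop contaminant genera from one sample.

    genus_counts: dict mapping genus name -> read count (hypothetical layout).
    Returns (filtered counts, fraction of reads removed).
    """
    total = sum(genus_counts.values())
    kept = {g: n for g, n in genus_counts.items() if g not in contaminants}
    removed = total - sum(kept.values())
    fraction_removed = removed / total if total else 0.0
    return kept, fraction_removed


def relative_abundance(genus_counts):
    """Convert read counts to fractions that sum to 1 (empty dict if no reads)."""
    total = sum(genus_counts.values())
    return {g: n / total for g, n in genus_counts.items()} if total else {}


# Hypothetical sample: the counts are invented for illustration only.
sample = {"Ralstonia": 270, "taxon_A": 500, "taxon_B": 230}
kept, fraction_removed = remove_contaminants(sample)
abundances = relative_abundance(kept)
```

    Relative abundances are recomputed after filtering because, as the abstract notes, contaminant reads skew the relative abundances of every other taxon in the sample.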

  20. MRI-based dynamic tracking of an untethered ferromagnetic microcapsule navigating in liquid

    NASA Astrophysics Data System (ADS)

    Dahmen, Christian; Belharet, Karim; Folio, David; Ferreira, Antoine; Fatikow, Sergej

    2016-04-01

    The propulsion of ferromagnetic objects by means of MRI gradients is a promising approach to enable new forms of therapy. In this work, necessary techniques are presented to make this approach work. This includes path planning algorithms working on MRI data, ferromagnetic artifact imaging and a tracking algorithm which delivers position feedback for the ferromagnetic objects, and a propulsion sequence to enable interleaved magnetic propulsion and imaging. Using a dedicated software environment, integrating path-planning methods and real-time tracking, a clinical MRI system is adapted to provide this new functionality for controlled interventional targeted therapeutic applications. Through MRI-based sensing analysis, this article aims to propose a framework to plan a robust pathway to enhance the navigation ability to reach deep locations in the human body. The proposed approaches are validated with different experiments.

  1. Sedimentation and basin-fill history of the Neogene clastic succession exposed in the southeastern fold belt of the Bengal Basin, Bangladesh: a high-resolution sequence stratigraphic approach

    NASA Astrophysics Data System (ADS)

    Royhan Gani, M.; Mustafa Alam, M.

    2003-02-01

    The Tertiary basin-fill history of the Bengal Basin suffers from oversimplification. The interpretation of the sedimentary history of the basin should be consistent with the evolution of its three geo-tectonic provinces, namely, western, northeastern and eastern. Each province has its own basin generation and sediment-fill history related mainly to the Indo-Burmese and subordinately to the Indo-Tibetan plate convergence. This paper is mainly concerned with facies and facies sequence analysis of the Neogene clastic succession within the subduction-related active margin setting (oblique convergence) in the southeastern fold belt of the Bengal Basin. Detailed fieldwork was carried out in the Sitapahar anticline of the Rangamati area and the Mirinja anticline of the Lama area. The study shows that the exposed Neogene succession represents an overall basinward progradation from deep marine through shallow marine to continental-fluvial environments. Based on regionally correlatable erosion surfaces the entire succession (3000+ m thick) has been grouped into three composite sequences C, B and A, from oldest to youngest. Composite sequence C begins with deep-water base-of-slope clastics overlain by thick slope mud that passes upward into shallow marine and nearshore clastics. Composite sequence B characteristically depicts tide-dominated open-marine to coastal depositional systems with evidence of cyclic marine regression and transgression. Repetitive occurrence of incised channel, tidal inlet, tidal ridge/shoal, tidal flat and other tidal deposits is separated by shelfal mudstone. Most of the sandbodies contain a full spectrum of tide-generated structures (e.g. herringbone cross-bedding, bundle structure, mud couplet, bipolar cross-lamination with reactivation surfaces, 'tidal' bedding). Storm activities appear to have played a subordinate role in the mid and inner shelf region. 
Rhizocorallium, Rosselia, Planolites and Zoophycos are the dominant ichnofacies within the shelfal mudstone. This paralic sedimentation of Neogene succession in the study area can serve as a good point of reference for tide-dominated regressive shelf depositional systems. The top of the composite sequence B is marked by a pronounced erosion surface indicating the final phase of marine regression followed by the gradual establishment of continental-fluvial depositional systems represented by composite sequence A. In this composite sequence, stacked channel bars of low-sinuosity braided rivers gradually pass upsequence into high-sinuosity meandering river deposits. A sequence stratigraphic approach has been adopted to interpret the basin-fill history with respect to relative sea-level changes; and to subdivide the rock record into several sequences and units (systems tracts and parasequences) based on identified bounding discontinuities, such as transgressive erosion surface (TES), regressive erosion surface (RES), marine flooding surface (MFS), and incised valley floor (IVF). This approach provides new insight for both exploration and exploitation strategy for hydrocarbon plays that may prove vital to the oil companies engaged in exploration activities in the Bengal Basin. It is strongly recommended here that the traditional lithostratigraphic classification of this part of the basin, which is based on the Assam stratigraphy, be abandoned or at least revised. A tentative allostratigraphic scheme is presented, and it is suggested that to formalize the scheme further study, both surface and subsurface, is needed.

  2. Graphical classification of DNA sequences of HLA alleles by deep learning.

    PubMed

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.

  3. 3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.

    PubMed

    Goldfarb, Katherine C; Cech, Thomas R

    2013-09-21

    Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.

  4. Multiplicity and molecular epidemiology of Plasmodium vivax and Plasmodium falciparum infections in East Africa.

    PubMed

    Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun

    2018-05-02

    Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. 
The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections was similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.
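
    The two summary statistics reported in this record, average MOI and the share of multiclonal infections, can be illustrated as follows. This is a sketch under stated assumptions rather than the study's method: MOI is taken here as the number of distinct haplotypes called in a sample, the function name and input layout are hypothetical, and the haplotype labels are invented. A real analysis would also need a read-frequency threshold to separate true minor haplotypes from sequencing error, which is not modelled here.

```python
from statistics import mean


def moi_summary(haplotypes_by_sample):
    """Summarise multiplicity of infection (MOI) across samples.

    haplotypes_by_sample: dict mapping sample id -> list of haplotype labels
    called in that sample (hypothetical layout; duplicates are collapsed).
    """
    # Assumption: MOI = number of distinct haplotypes detected in a sample.
    moi = {s: len(set(h)) for s, h in haplotypes_by_sample.items()}
    multiclonal = sum(1 for m in moi.values() if m > 1)
    return {
        "moi": moi,
        "mean_moi": mean(moi.values()),
        "fraction_multiclonal": multiclonal / len(moi),
    }


# Invented example data: four samples with one to three distinct haplotypes.
calls = {
    "s1": ["h1", "h2"],
    "s2": ["h1"],
    "s3": ["h2", "h3", "h4"],
    "s4": ["h5", "h5"],
}
summary = moi_summary(calls)
```

    For the invented data above, the per-sample MOI values are 2, 1, 3 and 1, giving a mean MOI of 1.75 and a multiclonal fraction of 0.5.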

  5. The deep, hot biosphere: Twenty-five years of retrospection

    PubMed Central

    Colman, Daniel R.; Poudel, Saroj; Stamps, Blake W.; Boyd, Eric S.; Spear, John R.

    2017-01-01

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a “deep, hot biosphere” in the Earth’s crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth’s subsurface in the form of a deep subsurface microbiome initiative. PMID:28674200

  6. Phylogenetic and enzymatic diversity of deep subseafloor aerobic microorganisms in organics- and methane-rich sediments off Shimokita Peninsula.

    PubMed

    Kobayashi, Tohru; Koide, Osamu; Mori, Kozue; Shimamura, Shigeru; Matsuura, Takae; Miura, Takeshi; Takaki, Yoshihiro; Morono, Yuki; Nunoura, Takuro; Imachi, Hiroyuki; Inagaki, Fumio; Takai, Ken; Horikoshi, Koki

    2008-07-01

    "A meta-enzyme approach" is proposed as an ecological enzymatic method to explore the potential functions of microbial communities in extreme environments such as the deep marine subsurface. We evaluated a variety of extra-cellular enzyme activities of sediment slurries and isolates from a deep subseafloor sediment core. Using the new deep-sea drilling vessel "Chikyu", we obtained 365 m of core sediments that contained approximately 2% organic matter and considerable amounts of methane from offshore the Shimokita Peninsula in Japan at a water depth of 1,180 m. In the extra-sediment fraction of the slurry samples, phosphatase, esterase, and catalase activities were detected consistently throughout the core sediments down to the deepest slurry sample from 342.5 m below seafloor (mbsf). Detectable enzyme activities predicted the existence of a sizable population of viable aerobic microorganisms even in deep subseafloor habitats. The subsequent quantitative cultivation using solid media represented remarkably high numbers of aerobic, heterotrophic microbial populations (e.g., maximally 4.4 × 10^7 cells cm^-3 at 342.5 mbsf). Analysis of 16S rRNA gene sequences revealed that the predominant cultivated microbial components were affiliated with the genera Bacillus, Shewanella, Pseudoalteromonas, Halomonas, Pseudomonas, Paracoccus, Rhodococcus, Microbacterium, and Flexibacteraceae. Many of the predominant and scarce isolates produced a variety of extra-cellular enzymes such as proteases, amylases, lipases, chitinases, phosphatases, and deoxyribonucleases. Our results indicate that microbes in the deep subseafloor environment off Shimokita are metabolically active and that the cultivable populations may have a great potential in biotechnology.

  7. Fragment analysis represents a suitable approach for the detection of hotspot c.7541_7542delCT NOTCH1 mutation in chronic lymphocytic leukemia.

    PubMed

    Vavrova, Eva; Kantorova, Barbara; Vonkova, Barbara; Kabathova, Jitka; Skuhrova-Francova, Hana; Diviskova, Eva; Letocha, Ondrej; Kotaskova, Jana; Brychtova, Yvona; Doubek, Michael; Mayer, Jiri; Pospisilova, Sarka

    2017-09-01

    The hotspot c.7541_7542delCT NOTCH1 mutation has been proven to have a negative clinical impact in chronic lymphocytic leukemia (CLL). However, an optimal method for its detection has not yet been specified. The aim of our study was to examine the presence of the NOTCH1 mutation in CLL using three commonly used molecular methods. Sanger sequencing, fragment analysis and allele-specific PCR were compared in the detection of the c.7541_7542delCT NOTCH1 mutation in 201 CLL patients. In 7 patients with inconclusive mutational analysis results, the presence of the NOTCH1 mutation was also confirmed using ultra-deep next generation sequencing. The NOTCH1 mutation was detected in 15% (30/201) of examined patients. Only fragment analysis was able to identify all 30 NOTCH1-mutated patients. Sanger sequencing and allele-specific PCR showed a lower detection efficiency, determining 93% (28/30) and 80% (24/30) of the present NOTCH1 mutations, respectively. Considering these three most commonly used methodologies for c.7541_7542delCT NOTCH1 mutation screening in CLL, we defined fragment analysis as the most suitable approach for detecting the hotspot NOTCH1 mutation. Copyright © 2017 Elsevier Ltd. All rights reserved.
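
    The percentages quoted in this record follow directly from the counts given in the abstract, as the short check below shows. The counts (201 patients, 30 mutated, 28 and 24 detected) are taken from the abstract; the function and variable names are illustrative choices made here.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


COHORT_SIZE = 201      # CLL patients examined
MUTATED = 30           # patients carrying the NOTCH1 mutation

# Number of the 30 mutated patients that each method identified.
detected_by_method = {
    "fragment analysis": 30,
    "Sanger sequencing": 28,
    "allele-specific PCR": 24,
}

prevalence = percent(MUTATED, COHORT_SIZE)
detection = {m: percent(n, MUTATED) for m, n in detected_by_method.items()}

for method, rate in detection.items():
    print(f"{method}: {rate:.0f}% of NOTCH1-mutated patients detected")
print(f"prevalence: {prevalence:.0f}% ({MUTATED}/{COHORT_SIZE})")
```

    Rounded to whole percentages, these give the 15%, 100%, 93% and 80% figures reported above.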

  8. Deep sequencing and genome-wide analysis reveals the expansion of MicroRNA genes in the gall midge Mayetiola destructor

    PubMed Central

    2013-01-01

    Background: MicroRNAs (miRNAs) are small non-coding RNAs that play critical roles in regulating post transcriptional gene expression. Gall midges encompass a large group of insects that are of economic importance and also possess fascinating biological traits. The gall midge Mayetiola destructor, commonly known as the Hessian fly, is a destructive pest of wheat and model organism for studying gall midge biology and insect – host plant interactions. Results: In this study, we systematically analyzed miRNAs from the Hessian fly. Deep-sequencing a Hessian fly larval transcriptome led to the identification of 89 miRNA species that are either identical or very similar to known miRNAs from other insects, and 184 novel miRNAs that have not been reported from other species. A genome-wide search through a draft Hessian fly genome sequence identified a total of 611 putative miRNA-encoding genes based on sequence similarity and the existence of a stem-loop structure for miRNA precursors. Analysis of the 611 putative genes revealed a striking feature: the dramatic expansion of several miRNA gene families. The largest family contained 91 genes that encoded 20 different miRNAs. Microarray analyses revealed that the expression of miRNA genes was strictly regulated during Hessian fly larval development and that the abundance of many miRNA genes was affected by host genotypes. Conclusion: The identification of a large number of miRNAs for the first time from a gall midge provides a foundation for further studies of miRNA functions in gall midge biology and behavior. The dramatic expansion of identical or similar miRNAs provides a unique system to study functional relations among miRNA iso-genes as well as changes in sequence specificity due to small changes in miRNAs and in their mRNA targets. These results may also facilitate the identification of miRNA genes for potential pest control through transgenic approaches. PMID:23496979

  9. Uncovering Small RNA-Mediated Responses to Cold Stress in a Wheat Thermosensitive Genic Male-Sterile Line by Deep Sequencing

    PubMed Central

    Tang, Zhonghui; Zhang, Liping; Xu, Chenguang; Yuan, Shaohua; Zhang, Fengting; Zheng, Yonglian; Zhao, Changping

    2012-01-01

    The male sterility of thermosensitive genic male sterile (TGMS) lines of wheat (Triticum aestivum) is strictly controlled by temperature. The early phase of anther development is especially susceptible to cold stress. MicroRNAs (miRNAs) play an important role in plant development and in responses to environmental stress. In this study, deep sequencing of small RNA (smRNA) libraries obtained from spike tissues of the TGMS line under cold and control conditions identified a total of 78 unique miRNA sequences from 30 families and trans-acting small interfering RNAs (tasiRNAs) derived from two TAS3 genes. To identify smRNA targets in the wheat TGMS line, we applied the degradome sequencing method, which globally and directly identifies the remnants of smRNA-directed target cleavage. We identified 26 targets of 16 miRNA families and three targets of tasiRNAs. Comparing smRNA sequencing data sets and TaqMan quantitative polymerase chain reaction results, we identified six miRNAs and one tasiRNA (tasiRNA-ARF [for Auxin-Responsive Factor]) as cold stress-responsive smRNAs in spike tissues of the TGMS line. We also determined the expression profiles of target genes that encode transcription factors in response to cold stress. Interestingly, the expression of cold stress-responsive smRNAs integrated in the auxin-signaling pathway and their target genes was largely noncorrelated. We investigated the tissue-specific expression of smRNAs using a tissue microarray approach. Our data indicated that miR167 and tasiRNA-ARF play roles in regulating the auxin-signaling pathway and possibly in the developmental response to cold stress. These data provide evidence that smRNA regulatory pathways are linked with male sterility in the TGMS line during cold stress. PMID:22508932

  10. SpliceRover: Interpretable Convolutional Neural Networks for Improved Splice Site Prediction.

    PubMed

    Zuallaert, Jasper; Godin, Fréderic; Kim, Mijung; Soete, Arne; Saeys, Yvan; De Neve, Wesley

    2018-06-21

    During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone in a genome annotation system. In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state-of-the-art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show it consistently outperforms already existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their "black box" nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g., several types of exclusion patterns near splice sites). SpliceRover is available as a web service. The prediction tool and instructions can be found at http://bioit2.irc.ugent.be/splicerover/. Supplementary materials are available at Bioinformatics online.

  11. 33 CFR 207.640 - Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 33 Navigation and Navigable Waters 3 2014-07-01 2014-07-01 false Sacramento Deep Water Ship... REGULATIONS § 207.640 Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and navigation. (a) Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use...

  12. 33 CFR 207.640 - Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 33 Navigation and Navigable Waters 3 2013-07-01 2013-07-01 false Sacramento Deep Water Ship... REGULATIONS § 207.640 Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and navigation. (a) Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use...

  13. 33 CFR 207.640 - Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 33 Navigation and Navigable Waters 3 2012-07-01 2012-07-01 false Sacramento Deep Water Ship... REGULATIONS § 207.640 Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use, administration, and navigation. (a) Sacramento Deep Water Ship Channel Barge Lock and Approach Canals; use...

  14. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  15. First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...

  16. Canada's Deep Geological Repository For Used Nuclear Fuel -The Geoscientific Site Evaluation Process

    NASA Astrophysics Data System (ADS)

    Hirschorn, S.; Ben Belfadhel, M.; Blyth, A.; DesRoches, A. J.; McKelvie, J. R. M.; Parmenter, A.; Sanchez-Rico Castejon, M.; Urrutia-Bustos, A.; Vorauer, A.

    2014-12-01

    The Nuclear Waste Management Organization (NWMO) is responsible for implementing Adaptive Phased Management, the approach selected by the Government of Canada for long-term management of used nuclear fuel generated by Canadian nuclear reactors. In May 2010, the NWMO published and initiated a nine-step site selection process to find an informed and willing community to host a deep geological repository for Canada's used nuclear fuel. The site selection process is designed to address a broad range of technical and social, economic and cultural factors. The suitability of candidate areas will be assessed in a stepwise manner over a period of many years and include three main steps: Initial Screenings; Preliminary Assessments; and Detailed Site Characterizations. The Preliminary Assessment is conducted in two phases. NWMO has completed Phase 1 preliminary assessments for the first eight communities that entered into this step. While the Phase 1 desktop geoscientific assessments showed that each of the eight communities contains general areas that have the potential to satisfy the geoscientific safety requirements for hosting a deep geological repository, the assessment identified varying degrees of geoscientific complexity and uncertainty between communities, reflecting their different geological settings and structural histories. Phase 2 activities will include a sequence of high-resolution airborne geophysical surveys and focused geological field mapping to ground-truth lithology and structural features, followed by limited deep borehole drilling and testing. These activities will further evaluate the site's ability to meet the safety functions that a site would need to ultimately satisfy in order to be considered suitable. This paper provides an update on the site evaluation process and describes the approach, methods and criteria that are being used to conduct the geoscientific Preliminary Assessments.

  17. Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection

    PubMed Central

    Dialdestoro, Kevin; Sibbesen, Jonas Andreas; Maretty, Lasse; Raghwani, Jayna; Gall, Astrid; Kellam, Paul; Pybus, Oliver G.; Hein, Jotun; Jenkins, Paul A.

    2016-01-01

    Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput “deep” sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference using HIV deep sequencing data, using an approach based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available. PMID:26857628

  18. A Comprehensive Phylogenetic Analysis of the Scleractinia (Cnidaria, Anthozoa) Based on Mitochondrial CO1 Sequence Data

    PubMed Central

    Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.

    2010-01-01

    Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. 
    Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613

  19. Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.

    PubMed

    Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A

    2017-11-01

    To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

  20. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  1. Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network

    NASA Astrophysics Data System (ADS)

    Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke

    2018-06-01

    Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. For this purpose, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, DRNN is constructed by the stacks of the recurrent hidden layer to automatically extract the features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.

  2. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Deep-towed high resolution seismic imaging II: Determination of P-wave velocity distribution

    NASA Astrophysics Data System (ADS)

    Marsset, B.; Ker, S.; Thomas, Y.; Colin, F.

    2018-02-01

    The acquisition of high resolution seismic data in deep waters requires the development of deep towed seismic sources and receivers able to deal with the high hydrostatic pressure environment. The low frequency piezoelectric transducer of the SYSIF (SYstème Sismique Fond) deep towed seismic device complies with the former requirement, taking advantage of the coupling of a mechanical resonance (Janus driver) and a fluid resonance (Helmholtz cavity) to produce a large frequency bandwidth acoustic signal (220-1050 Hz). The ability to perform deep towed multichannel seismic imaging with SYSIF was demonstrated in 2014, but the ability to determine the P-wave velocity distribution was not achieved. P-wave velocity analysis relies on the ratio between the source-receiver offset range and the depth of the seismic reflectors, so towing the seismic source and receivers closer to the sea bed provides a better geometry for P-wave velocity determination. However, technical issues related to the acoustic source directivity arise for this approach in the particular framework of piezoelectric sources. A signal processing sequence is therefore added to the initial processing flow. Data acquisition took place during the GHASS (Gas Hydrates, fluid Activities and Sediment deformations in the western Black Sea) cruise in the Romanian waters of the Black Sea. The results of the imaging processing are presented for two seismic data sets acquired over gas hydrates and gas bearing sediments. The improvement in the final seismic resolution demonstrates the validity of the velocity model.

  4. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  5. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE PAGES

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning; ...

    2015-03-11

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  6. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  7. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Insights about minority HIV-1 strains in transmitted drug resistance mutation dynamics and disease progression.

    PubMed

    Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie

    2018-04-19

    The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.

  9. GenomeGems: evaluation of genetic variability from deep sequencing data

    PubMed Central

    2012-01-01

    Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the large number of sequences generated so that mutations that might be biologically relevant are easily identified is a difficult task. Yet only a limited number of accessible automatic tools exist for this task. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. Here we found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

  11. Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.

    PubMed

    Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M

    2012-06-01

    Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women, with transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or infertile underwent an MR 3.0T examination with the following protocol: T2 weighted FRFSE HR sequences, T2 weighted FRFSE HR CUBE 3D sequences, T1 w FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, adhesions presence, fluid effusion in Douglas pouch, uterus and kidney pathologies or anomalies associated and sacral nervous routes were considered by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI diagnosed deep endometriosis in 22/46 patients and endometriomas not associated with deep implants in 9/46 patients; 15/46 patients were negative for endometriosis. Of the 22 patients with deep endometriosis, 11 had an ovarian endometriosis cyst. We obtained high sensitivity (96.97%), specificity (100.00%), positive predictive value (PPV, 100.00%) and negative predictive value (NPV, 92.86%). Pelvic MRI performed with 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with a good pre-surgery mapping of the lesions involving both bowels and bladder surface and recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
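    The four accuracy measures quoted in the abstract above are standard ratios over a 2x2 table of test result against the gold standard. The abstract does not give that table, so the counts below are hypothetical, chosen only because they reproduce the quoted percentages; they should not be read as the study's data or as patient-level figures.

```python
# Illustrative sketch: standard diagnostic accuracy measures from a 2x2 table.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts (assumption, not from the source).
metrics = diagnostic_metrics(tp=32, fp=0, fn=1, tn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```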

  12. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  13. Mixed heterolobosean and novel gregarine lineage genes from culture ATCC 50646: Long-branch artefacts, not lateral gene transfer, distort α-tubulin phylogeny.

    PubMed

    Cavalier-Smith, Thomas

    2015-04-01

    Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.

  14. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
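
    The N-gram step above can be illustrated with a minimal sketch. It assumes overlapping n-grams over the 20 standard amino acids, normalized to frequencies; the function name `ngram_vector` and the normalization are illustrative assumptions, not the authors' implementation (which builds vectors from MSA-derived feature sequences).

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def ngram_vector(seq, n=2):
    """Return the frequency of every possible n-gram, in a fixed vocabulary order.

    Hypothetical helper: n-grams containing characters outside AMINO_ACIDS
    (for example alignment gaps) are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


vector = ngram_vector("ACAC", n=2)  # bigrams: AC, CA, AC
```

    A fixed vocabulary order keeps vectors from different sequences comparable, which is what a downstream network would need.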

  15. Targeted Enrichment of Large Gene Families for Phylogenetic Inference: Phylogeny and Molecular Evolution of Photosynthesis Genes in the Portullugo Clade (Caryophyllales).

    PubMed

    Moore, Abigail J; Vos, Jurriaan M De; Hancock, Lillian P; Goolsby, Eric; Edwards, Erika J

    2018-05-01

    Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the "portullugo" (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ~50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.

  16. A deep learning approach for real time prostate segmentation in freehand ultrasound guided biopsy.

    PubMed

    Anas, Emran Mohammad Abu; Mousavi, Parvin; Abolmaesumi, Purang

    2018-06-01

    Targeted prostate biopsy, incorporating multi-parametric magnetic resonance imaging (mp-MRI) and its registration with ultrasound, is currently the state-of-the-art in prostate cancer diagnosis. The registration process in most targeted biopsy systems today relies heavily on accurate segmentation of ultrasound images. Automatic or semi-automatic segmentation is typically performed offline prior to the start of the biopsy procedure. In this paper, we present a deep neural network based real-time prostate segmentation technique during the biopsy procedure, hence paving the way for dynamic registration of mp-MRI and ultrasound data. In addition to using convolutional networks for extracting spatial features, the proposed approach employs recurrent networks to exploit the temporal information among a series of ultrasound images. One of the key contributions in the architecture is to use residual convolution in the recurrent networks to improve optimization. We also exploit recurrent connections within and across different layers of the deep networks to maximize the utilization of the temporal information. Furthermore, we perform dense and sparse sampling of the input ultrasound sequence to make the network robust to ultrasound artifacts. Our architecture is trained on 2,238 labeled transrectal ultrasound images, with an additional 637 and 1,017 unseen images used for validation and testing, respectively. We obtain a mean Dice similarity coefficient of 93%, a mean surface distance error of 1.10 mm and a mean Hausdorff distance error of 3.0 mm. A comparison of the reported results with those of a state-of-the-art technique indicates statistically significant improvement achieved by the proposed approach. Copyright © 2018 Elsevier B.V. All rights reserved.
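
    For readers unfamiliar with the Dice similarity coefficient reported above, the following minimal sketch computes it for two binary masks as 2|A ∩ B| / (|A| + |B|). The function name and the nested-list mask representation are illustrative assumptions, not part of the published method.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two same-shape binary masks (nested lists of 0/1)."""
    intersection = size_pred = size_truth = 0
    for row_pred, row_truth in zip(pred, truth):
        for p, t in zip(row_pred, row_truth):
            intersection += 1 if (p and t) else 0
            size_pred += 1 if p else 0
            size_truth += 1 if t else 0
    if size_pred + size_truth == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * intersection / (size_pred + size_truth)


predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
score = dice_coefficient(predicted, reference)  # 2 * 1 / (2 + 1)
```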

  17. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.

    PubMed

    Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry

    2018-03-01

    We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.

  18. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
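
    Steps (i) and (ii) above can be sketched roughly as follows. This is only an illustration of the idea, not DSAP's code: the function names, the exact-match adaptor search, and the adaptor string in the usage lines are all assumptions, and the homology-matching steps (iii) and (iv) are omitted.

```python
from collections import defaultdict


def clean_tag(tag, adaptor):
    """Step (i), simplified: cut the tag at the first exact adaptor match and
    discard empty or single-nucleotide (poly-A/T/C/G/N) results."""
    position = tag.find(adaptor)
    if position != -1:
        tag = tag[:position]
    if not tag or len(set(tag)) == 1:
        return None
    return tag


def cluster_tags(lines, adaptor):
    """Step (ii), simplified: merge identical cleaned tags and sum their copy numbers.

    Each input line is assumed to be "<tag>\t<count>".
    """
    clusters = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tag, count = line.split("\t")
        cleaned = clean_tag(tag.upper(), adaptor)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return dict(clusters)


# Hypothetical input and adaptor, for illustration only.
records = ["ACGTTGCATCGTATG\t5", "ACGTTGCA\t3", "AAAAAAAA\t10"]
clusters = cluster_tags(records, adaptor="TCGTATG")
```

    Real adaptor trimming usually tolerates mismatches and partial adaptors at the read end; the exact-match search here only keeps the sketch short.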

  19. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  20. The Effects of Discipline on Deep Approaches to Student Learning and College Outcomes

    ERIC Educational Resources Information Center

    Nelson Laird, Thomas F.; Shoup, Rick; Kuh, George D.; Schwarz, Michael J.

    2008-01-01

    "Deep learning" represents student engagement in approaches to learning that emphasize integration, synthesis, and reflection. Because learning is a shared responsibility between students and faculty, it is important to determine whether faculty members emphasize deep approaches to learning and to assess how much students employ these approaches.…

  1. Comparison of Sanger and next generation sequencing performance for genotyping Cryptosporidium isolates at the 18S rRNA and actin loci.

    PubMed

    Paparini, Andrea; Gofton, Alexander; Yang, Rongchang; White, Nicole; Bunce, Michael; Ryan, Una M

    2015-01-01

    Cryptosporidium is an important enteric pathogen that infects a wide range of humans and animals. Rapid and reliable detection and characterisation methods are essential for understanding the transmission dynamics of the parasite. Sanger sequencing, and high-throughput sequencing (HTS) on an Ion Torrent platform, were compared with each other for their sensitivity and accuracy in detecting and characterising 25 Cryptosporidium-positive human and animal faecal samples. Ion Torrent reads (n = 123,857) were obtained at both 18S rRNA and actin loci for 21 of the 25 samples. Of these, one isolate at the actin locus (Cattle 05) and three at the 18S rRNA locus (HTS 10, HTS 11 and HTS 12), suffered PCR drop-out (i.e. PCR failures) when using fusion-tagged PCR. Sanger sequences were obtained for both loci for 23 of the 25 samples and showed good agreement with Ion Torrent-based genotyping. Two samples both from pythons (SK 02 and SK 05) produced mixed 18S and actin chromatograms by Sanger sequencing but were clearly identified by Ion Torrent sequencing as C. muris. One isolate (SK 03) was typed as C. muris by Sanger sequencing but was identified as a mixed C. muris and C. tyzzeri infection by HTS. 18S rRNA Type B sequences were identified in 4/6 C. parvum isolates when deep sequenced but were undetected in Sanger sequencing. Sanger was cheaper than Ion Torrent when sequencing a small number of samples, but when larger numbers of samples are considered (n = 60), the costs were comparable. Fusion-tagged amplicon based approaches are a powerful way of approaching mixtures, the only draw-back being the loss of PCR efficiency on low-template samples when using primers coupled to MID tags and adaptors. Taken together these data show that HTS has excellent potential for revealing the "true" composition of species/types in a Cryptosporidium infection, but that HTS workflows need to be carefully developed to ensure sensitivity, accuracy and contamination are controlled.
    Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus

    PubMed Central

    Kinoti, Wycliff M.; Constable, Fiona E.; Nancarrow, Narelle; Plummer, Kim M.; Rodoni, Brendan

    2017-01-01

    The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus, occurring in 48 of the 61 Ilarvirus-positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus-like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus-like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus-like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear. 
    This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the need for a standardized approach to accurately determine what constitutes an active, viable virus infection after detection by molecular based methods. PMID:28713347

  3. Generic Amplicon Deep Sequencing to Determine Ilarvirus Species Diversity in Australian Prunus.

    PubMed

    Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan

    2017-01-01

    The distribution of Ilarvirus species populations amongst 61 Australian Prunus trees was determined by next generation sequencing (NGS) of amplicons generated using a genus-based generic RT-PCR targeting a conserved region of the Ilarvirus RNA2 component that encodes the RNA dependent RNA polymerase (RdRp) gene. Presence of Ilarvirus sequences in each positive sample was further validated by Sanger sequencing of cloned amplicons of regions of each of RNA1, RNA2 and/or RNA3 that were generated by species specific PCRs and by metagenomic NGS. Prunus necrotic ringspot virus (PNRSV) was the most frequently detected Ilarvirus, occurring in 48 of the 61 Ilarvirus-positive trees and Prune dwarf virus (PDV) and Apple mosaic virus (ApMV) were detected in three trees and one tree, respectively. American plum line pattern virus (APLPV) was detected in three trees and represents the first report of APLPV detection in Australia. Two novel and distinct groups of Ilarvirus-like RNA2 amplicon sequences were also identified in several trees by the generic amplicon NGS approach. The high read depth from the amplicon NGS of the generic PCR products allowed the detection of distinct RNA2 RdRp sequence variant populations of PNRSV, PDV, ApMV, APLPV and the two novel Ilarvirus-like sequences. Mixed infections of ilarviruses were also detected in seven Prunus trees. Sanger sequencing of specific RNA1, RNA2, and/or RNA3 genome segments of each virus and total nucleic acid metagenomics NGS confirmed the presence of PNRSV, PDV, ApMV and APLPV detected by RNA2 generic amplicon NGS. However, the two novel groups of Ilarvirus-like RNA2 amplicon sequences detected by the generic amplicon NGS could not be associated to the presence of sequence from RNA1 or RNA3 genome segments or full Ilarvirus genomes, and their origin is unclear.
    This work highlights the sensitivity of genus-specific amplicon NGS in detection of virus sequences and their distinct populations in multiple samples, and the need for a standardized approach to accurately determine what constitutes an active, viable virus infection after detection by molecular based methods.

  4. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. 
    Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
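
    The indexed sparsity reduction (ISR) idea described above, replacing a sparse mutation vector by the indexes of its non-zero elements, can be sketched as follows. The 1-based indexing, the fixed output length, and the zero padding are assumptions made for illustration; the abstract does not specify them.

```python
def indexed_sparsity_reduction(mutations, length, pad=0):
    """Return the 1-based indexes of non-zero entries, truncated or padded to `length`.

    Hypothetical helper: `pad` marks unused slots, so 0 is never a valid gene index.
    """
    indexes = [i + 1 for i, value in enumerate(mutations) if value != 0]
    indexes = indexes[:length]
    return indexes + [pad] * (length - len(indexes))


# A sample with mutations in the 2nd and 5th of six genes.
compact = indexed_sparsity_reduction([0, 1, 0, 0, 1, 0], length=4)
```

    The compact form has a fixed width regardless of how many genes are screened, which is one plausible reason such an encoding would suit a dense network input.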

  5. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

    PubMed

    Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen

    2015-11-10

    Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets.
    Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
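
    The three calling modes described above differ only in how the estimated SNV probability and the variant frequency are thresholded. The sketch below shows that decision logic under stated assumptions: the function names, the inclusive comparisons, and the nearest-rank percentile are illustrative choices, and the logistic regression that produces the probability is not reproduced.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def call_snv(probability, frequency, error_frequencies, mode="D"):
    """Decide whether a candidate SNV is called.

    mode "D":      probability cutoff 0.5
    mode "HS":     probability cutoff 0.0001
    mode "HS-P80": as "HS", but overruled when the frequency falls below the
                   80th percentile of the error-frequency distribution
    """
    if mode == "D":
        return probability >= 0.5
    called = probability >= 0.0001
    if mode == "HS":
        return called
    if mode == "HS-P80":
        return called and frequency >= percentile(error_frequencies, 80)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical error frequencies, for illustration only.
errors = [0.001, 0.002, 0.003, 0.004, 0.005]
decision = call_snv(0.01, 0.0045, errors, mode="HS-P80")
```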

  6. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  7. Evidence for thermal convection in the deep carbonate aquifer of the eastern sector of the Po Plain, Italy

    NASA Astrophysics Data System (ADS)

    Pasquale, V.; Chiozzi, P.; Verdoya, M.

    2013-05-01

    Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m⁻¹), while it is larger (53 mK m⁻¹) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m⁻¹). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 × 10⁻¹⁵ m². The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.
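
    A Rayleigh number analysis of this kind can be sketched with the textbook criterion for a horizontal porous layer, Ra = g α ρ (ρc) k H ΔT / (μ λ), with convection possible above Ra ≈ 4π². The abstract does not state which formulation or parameter values the authors used, so the formula choice, the function names, and every default value below are assumptions for illustration; the sketch does not attempt to reproduce the reported 3 × 10⁻¹⁵ m² threshold.

```python
import math

# Critical value for a horizontal porous layer with impermeable, isothermal
# boundaries (textbook Horton-Rogers-Lapwood result).
RA_CRITICAL = 4 * math.pi ** 2


def rayleigh_number(permeability, thickness, delta_t, alpha=5e-4, rho=1000.0,
                    heat_capacity=4200.0, viscosity=3e-4, conductivity=2.5, g=9.81):
    """Porous-medium Rayleigh number (dimensionless).

    All defaults are illustrative fluid and rock properties in SI units:
    alpha (1/K), rho (kg/m^3), heat_capacity (J/(kg K)), viscosity (Pa s),
    conductivity of the saturated medium (W/(m K)).
    """
    buoyancy = g * alpha * rho * (rho * heat_capacity) * permeability * thickness * delta_t
    return buoyancy / (viscosity * conductivity)


def minimum_permeability(thickness, delta_t, **properties):
    """Permeability (m^2) at which the Rayleigh number reaches RA_CRITICAL."""
    ra_per_unit_k = rayleigh_number(1.0, thickness, delta_t, **properties)
    return RA_CRITICAL / ra_per_unit_k


# Hypothetical aquifer: 2 km thick with a 28 K temperature difference across it.
k_min = minimum_permeability(thickness=2000.0, delta_t=28.0)
```

    Because the Rayleigh number is linear in permeability, the threshold follows from a single evaluation at unit permeability.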

  8. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  9. A novel process of viral vector barcoding and library preparation enables high-diversity library generation and recombination-free paired-end sequencing

    PubMed Central

    Davidsson, Marcus; Diaz-Fernandez, Paula; Schwich, Oliver D.; Torroba, Marcos; Wang, Gang; Björklund, Tomas

    2016-01-01

    Detailed characterization and mapping of oligonucleotide function in vivo is generally a very time consuming effort that only allows for hypothesis driven subsampling of the full sequence to be analysed. Recent advances in deep sequencing together with highly efficient parallel oligonucleotide synthesis and cloning techniques have, however, opened up for entirely new ways to map genetic function in vivo. Here we present a novel, optimized protocol for the generation of universally applicable, barcode labelled, plasmid libraries. The libraries are designed to enable the production of viral vector preparations assessing coding or non-coding RNA function in vivo. When generating high diversity libraries, it is a challenge to achieve efficient cloning, unambiguous barcoding and detailed characterization using low-cost sequencing technologies. With the presented protocol, diversity of above 3 million uniquely barcoded adeno-associated viral (AAV) plasmids can be achieved in a single reaction through a process achievable in any molecular biology laboratory. This approach opens up for a multitude of in vivo assessments from the evaluation of enhancer and promoter regions to the optimization of genome editing. The generated plasmid libraries are also useful for validation of sequencing clustering algorithms and we here validate the newly presented message passing clustering process named Starcode. PMID:27874090

  10. Genetic risk prediction using a spatial autoregressive model with adaptive lasso.

    PubMed

    Wen, Yalu; Shen, Xiaoxi; Lu, Qing

    2018-05-31

    With rapidly evolving high-throughput technologies, studies are being initiated to accelerate the process toward precision medicine. The collection of the vast amounts of sequencing data provides us with great opportunities to systematically study the role of a deep catalog of sequencing variants in risk prediction. Nevertheless, the massive amount of noise signals and low frequencies of rare variants in sequencing data pose great analytical challenges on risk prediction modeling. Motivated by the development in spatial statistics, we propose a spatial autoregressive model with adaptive lasso (SARAL) for risk prediction modeling using high-dimensional sequencing data. The SARAL is a set-based approach, and thus, it reduces the data dimension and accumulates genetic effects within a single-nucleotide variant (SNV) set. Moreover, it allows different SNV sets having various magnitudes and directions of effect sizes, which reflects the nature of complex diseases. With the adaptive lasso implemented, SARAL can shrink the effects of noise SNV sets to be zero and, thus, further improve prediction accuracy. Through simulation studies, we demonstrate that, overall, SARAL is comparable to, if not better than, the genomic best linear unbiased prediction method. The method is further illustrated by an application to the sequencing data from the Alzheimer's Disease Neuroimaging Initiative. Copyright © 2018 John Wiley & Sons, Ltd.

  11. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    PubMed

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  12. Teaching neuroanatomy using computer-aided learning: What makes for successful outcomes?

    PubMed

    Svirko, Elena; Mellanby, Jane

    2017-11-01

    Computer-aided learning (CAL) is an integral part of many medical courses. The neuroscience course at Oxford University for medical students includes a CAL course of neuroanatomy. CAL is particularly suited to this since neuroanatomy requires much detailed three-dimensional visualization, which can be presented on screen. The CAL course was evaluated using the concept of approach to learning. The aims of university teaching are congruent with the deep approach (seeking meaning and relating new information to previous knowledge) rather than with the surface approach of concentrating on rote learning of detail. Seven cohorts of medical students (N = 869) filled in an approach to learning scale and a questionnaire investigating their engagement with the CAL course. The students' scores on CAL-course-based neuroanatomy assessment and later university examinations were obtained. Although the students reported less use of the deep approach for the neuroanatomy CAL course than for the rest of their neuroanatomy course (mean = 24.99 vs. 31.49, P < 0.001), deep approach for CAL was positively correlated with neuroanatomy assessment performance (r = 0.12, P < 0.001). Time spent on the CAL course, enjoyment of it, the amount of CAL videos watched and quizzes completed were each significantly positively related to deep approach. The relationship between deep approach and enjoyment was particularly notable (25.5% shared variance). Reported relationships between deep approach and academic performance support the desirability of deep approach in university students. It is proposed that enjoyment of the course and the deep approach could be increased by incorporation of more clinical material, which is what the students liked most. Anat Sci Educ 10: 560-569. © 2017 American Association of Anatomists.

  13. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
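
    Grouping reads into OTUs at a 97% identity threshold can be sketched as a greedy centroid clustering. This is only a toy illustration under stated assumptions: the reads are taken to be pre-aligned and of equal length, identity is a simple position-by-position match fraction, and the function names are hypothetical. The abstract does not say which clustering tool the authors used.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Assign each read to the first centroid it matches at >= threshold identity.

    Unique sequences are visited from most to least abundant, so common
    sequences become centroids. Returns {centroid sequence: total read count}.
    """
    otus = {}
    for seq, count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            # No existing centroid is close enough: start a new OTU.
            otus[seq] = count
    return otus


base = "A" * 100
near = "A" * 98 + "CC"   # 98% identical to base, joins its OTU
far = "A" * 90 + "C" * 10  # 90% identical to base, forms its own OTU
otus = greedy_otus([base] * 5 + [near] * 2 + [far])
```

    Sorting the resulting counts would give the kind of rank-abundance picture the abstract describes, with a few large OTUs and many small ones.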

  14. Comparison of magnetic resonance imaging sequences for depicting the subthalamic nucleus for deep brain stimulation.

    PubMed

    Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi

    2015-01-01

    Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. To ensure correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores were significantly higher on SWAN images than on other images (p < 0.01) for STN visualization. The kappa value for SWAN images (0.74) was the highest among the three sequences for visualizing the STN. SWAN is the sequence best suited for identifying the STN at the present time.
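
    A contrast-to-noise ratio between two regions of interest can be computed as below. The abstract does not give the formula the authors used, and CNR definitions vary between studies, so this sketch assumes one common form: the absolute difference of the mean signals divided by the standard deviation of a noise region. All names and sample values are hypothetical.

```python
from statistics import mean, stdev


def contrast_to_noise(roi_a, roi_b, noise):
    """CNR = |mean(roi_a) - mean(roi_b)| / sample standard deviation of the noise region."""
    return abs(mean(roi_a) - mean(roi_b)) / stdev(noise)


# Made-up pixel intensities for two structures and a background region.
cnr = contrast_to_noise([100, 102, 98], [80, 82, 78], [1, 3, 5])
```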

  15. EHR Big Data Deep Phenotyping

    PubMed Central

    Lenert, L.; Lopez-Campos, G.

    2014-01-01

    Summary. Objectives: Given the quickening speed of discovery of variant disease drivers from combined patient genotype and phenotype data, the objective is to provide methodology using big data technology to support the definition of deep phenotypes in medical records. Methods: As the vast stores of genomic information increase with next generation sequencing, the importance of deep phenotyping increases. The growth of genomic data and adoption of Electronic Health Records (EHR) in medicine provide a unique opportunity to integrate phenotype and genotype data into medical records. The method by which collections of clinical findings and other health related data are leveraged to form meaningful phenotypes is an active area of research. Longitudinal data stored in EHRs provide a wealth of information that can be used to construct phenotypes of patients. We focus on a practical problem around data integration for deep phenotype identification within EHR data. We describe the use of big data approaches that enable scalable markup of EHR events, which can be used for semantic and temporal similarity analysis to support the identification of phenotype and genotype relationships. Conclusions: Stead and colleagues’ 2005 concept of using light standards to increase the productivity of software systems by riding on the wave of hardware/processing power is described as a harbinger for designing future healthcare systems. The big data solution, using flexible markup, provides a route to improved utilization of processing power for organizing patient records in genotype and phenotype research. PMID:25123744

  16. Microbial potential for carbon and nutrient cycling in a geogenic supercritical carbon dioxide reservoir: Microbial life in the deep carbonated biosphere

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Freedman, Adam J. E.; Tan, BoonFei; Thompson, Janelle R.

    Microorganisms catalyze carbon cycling and biogeochemical reactions in the deep subsurface and thus may be expected to influence the fate of injected supercritical (sc) CO2 following geological carbon sequestration (GCS). We hypothesized that natural subsurface scCO2 reservoirs, which serve as analogs for the long-term fate of sequestered scCO2, harbor a ‘deep carbonated biosphere’ with carbon cycling potential. We sampled subsurface fluids from scCO2-water separators at a natural scCO2 reservoir at McElmo Dome, Colorado for analysis of 16S rRNA gene diversity and metagenome content. Sequence annotations indicated dominance of Sulfurospirillum, Rhizobium, Desulfovibrio and four members of the Clostridiales family. Genomes extracted from metagenomes using homology and compositional approaches revealed diverse mechanisms for growth and nutrient cycling, including pathways for CO2 and N2 fixation, anaerobic respiration, sulfur oxidation, fermentation and potential for metabolic syntrophy. Differences in biogeochemical potential between two production well communities were consistent with differences in fluid chemical profiles, suggesting a potential link between microbial activity and geochemistry. In conclusion, the existence of a microbial ecosystem associated with the McElmo Dome scCO2 reservoir indicates that potential impacts of the deep biosphere on CO2 fate and transport should be taken into consideration as a component of GCS planning and modelling.

  17. Microbial potential for carbon and nutrient cycling in a geogenic supercritical carbon dioxide reservoir: Microbial life in the deep carbonated biosphere

    DOE PAGES

    Freedman, Adam J. E.; Tan, BoonFei; Thompson, Janelle R.

    2017-05-02

    Microorganisms catalyze carbon cycling and biogeochemical reactions in the deep subsurface and thus may be expected to influence the fate of injected supercritical (sc) CO2 following geological carbon sequestration (GCS). We hypothesized that natural subsurface scCO2 reservoirs, which serve as analogs for the long-term fate of sequestered scCO2, harbor a ‘deep carbonated biosphere’ with carbon cycling potential. We sampled subsurface fluids from scCO2-water separators at a natural scCO2 reservoir at McElmo Dome, Colorado for analysis of 16S rRNA gene diversity and metagenome content. Sequence annotations indicated dominance of Sulfurospirillum, Rhizobium, Desulfovibrio and four members of the Clostridiales family. Genomes extracted from metagenomes using homology and compositional approaches revealed diverse mechanisms for growth and nutrient cycling, including pathways for CO2 and N2 fixation, anaerobic respiration, sulfur oxidation, fermentation and potential for metabolic syntrophy. Differences in biogeochemical potential between two production well communities were consistent with differences in fluid chemical profiles, suggesting a potential link between microbial activity and geochemistry. In conclusion, the existence of a microbial ecosystem associated with the McElmo Dome scCO2 reservoir indicates that potential impacts of the deep biosphere on CO2 fate and transport should be taken into consideration as a component of GCS planning and modelling.

  18. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering has followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information-rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

    PubMed

    Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

    2011-01-20

    Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
    We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
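
    The in silico SSR identification step can be illustrated with a small scanner for tandem repeats. This is a sketch under stated assumptions, not the authors' pipeline: it looks for di- to hexanucleotide motifs, skips homopolymeric motifs, applies the 18 bp minimum length mentioned for primer testing, and does not try to detect or exclude compound repeats. The function name and parameters are hypothetical.

```python
def find_ssrs(seq, min_len=18, motif_sizes=range(2, 7)):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    Shorter motifs are tried first at each position, so ATATAT... is
    reported with motif AT rather than ATAT.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in motif_sizes:
            motif = seq[i:i + k]
            # Skip truncated motifs at the sequence end and homopolymers (e.g. AAA).
            if len(motif) < k or len(set(motif)) == 1:
                continue
            end = i + k
            while seq[end:end + k] == motif:
                end += k
            repeats = (end - i) // k
            if repeats >= 2 and end - i >= min_len:
                hits.append((i, motif, repeats))
                i = end  # continue scanning after this repeat tract
                found = True
                break
        if not found:
            i += 1
    return hits


example = "GGC" + "AT" * 10 + "CCGT" + "A" * 25 + "T" + "ACG" * 6 + "TT"
ssrs = find_ssrs(example)
```

    In the example, the 25-base run of A is ignored as a homopolymer, while the (AT)10 and (ACG)6 tracts are reported with their start positions.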

  20. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]

    PubMed Central

    2011-01-01

    Background: Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results: In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population.
    Conclusion: We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263

  1. Sorting Out the Ocean Crust Deep Biosphere with Single Cell Omics Approaches

    NASA Astrophysics Data System (ADS)

    Orcutt, B.; D'Angelo, T.; Goordial, J.; Jones, R. M.; Carr, S. A.

    2017-12-01

    Although oceanic crust comprises a large habitat for subsurface life, the structure, function, and dynamics of microbial communities living on rocks in the subsurface are poorly understood. Single cell level approaches can overcome limitations of low biomass in subsurface systems. Coupled with incubation experiments with amino acid orthologs, single cell level sorting can reveal high resolution information about identity, functional potential, and growth. Leveraging collaboration with the Single Cell Genomics Center and the Facility for Aquatic Cytometry at Bigelow Laboratory, we present recent results from single cell level sorting and -omics sequencing from several crustal environments, including the Atlantis Massif and the Juan de Fuca Ridge flank. We will also highlight new experiments conducted with samples recovered from the flank of the Mid-Atlantic Ridge.

  2. Culturable prokaryotic diversity of deep, gas hydrate sediments: first use of a continuous high-pressure, anaerobic, enrichment and isolation system for subseafloor sediments (DeepIsoBUG)

    PubMed Central

    Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik

    2009-01-01

    Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787

  3. Targeted parallel sequencing of the Musa species: searching for an alternative model system for polyploidy studies

    USDA-ARS?s Scientific Manuscript database

    Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...

  4. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background: Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results: The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogeneous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions: 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  5. Viruses in diarrhoeic dogs include novel kobuviruses and sapoviruses.

    PubMed

    Li, Linlin; Pesavento, Patricia A; Shan, Tongling; Leutenegger, Christian M; Wang, Chunlin; Delwart, Eric

    2011-11-01

    The close interactions of dogs with humans and surrounding wildlife provide frequent opportunities for cross-species virus transmissions. In order to initiate an unbiased characterization of the eukaryotic viruses in the gut of dogs, this study used deep sequencing of partially purified viral capsid-protected nucleic acids from the faeces of 18 diarrhoeic dogs. Known canine parvoviruses, coronaviruses and rotaviruses were identified, and the genomes of the first reported canine kobuvirus and sapovirus were characterized. Canine kobuvirus, the first sequenced canine picornavirus and the closest genetic relative of the diarrhoea-causing human Aichi virus, was detected at high frequency in the faeces of both healthy and diarrhoeic dogs. Canine sapovirus constituted a novel genogroup within the genus Sapovirus, a group of viruses also associated with human and animal diarrhoea. These results highlight the high frequency of new virus detection possible even in extensively studied animal species using metagenomics approaches, and provide viral genomes for further disease-association studies.

  6. Genetics of impulsive behaviour

    PubMed Central

    Bevilacqua, Laura; Goldman, David

    2013-01-01

    Impulsivity, defined as the tendency to act without foresight, comprises a multitude of constructs and is associated with a variety of psychiatric disorders. Dissecting different aspects of impulsive behaviour and relating these to specific neurobiological circuits would improve our understanding of the etiology of complex behaviours for which impulsivity is key, and advance genetic studies in this behavioural domain. In this review, we will discuss the heritability of some impulsivity constructs and their possible use as endophenotypes (heritable, disease-associated intermediate phenotypes). Several functional genetic variants associated with impulsive behaviour have been identified by the candidate gene approach and re-sequencing, and whole genome strategies can be implemented for discovery of novel rare and common alleles influencing impulsivity. Via deep sequencing an uncommon HTR2B stop codon, common in one population, was discovered, with implications for understanding impulsive behaviour in both humans and rodents and for future gene discovery. PMID:23440466

  7. Medical Sequencing at the extremes of Human Body Mass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

    2006-09-01

    Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96 Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing based approach to detect them.

  8. The Transcriptome of Compatible and Incompatible Interactions of Potato (Solanum tuberosum) with Phytophthora infestans Revealed by DeepSAGE Analysis

    PubMed Central

    Gyetvai, Gabor; Sønderkær, Mads; Göbel, Ulrike; Basekow, Rico; Ballvora, Agim; Imhoff, Maren; Kersten, Birgit; Nielsen, Kåre-Lehman; Gebhardt, Christiane

    2012-01-01

    Late blight, caused by the oomycete Phytophthora infestans, is the most important disease of potato (Solanum tuberosum). Understanding the molecular basis of resistance and susceptibility to late blight is therefore highly relevant for developing resistant cultivars, either by marker-assisted selection or by transgenic approaches. Specific P. infestans races having the Avr1 effector gene trigger a hypersensitive resistance response in potato plants carrying the R1 resistance gene (incompatible interaction) and cause disease in plants lacking R1 (compatible interaction). The transcriptomes of the compatible and incompatible interaction were captured by DeepSAGE analysis of 44 biological samples comprising five genotypes, differing only by the presence or absence of the R1 transgene, three infection time points and three biological replicates. A total of 30,859 unique 21 base pair sequence tags were obtained, one third of which did not match any known potato transcript sequence. Two thirds of the tags were expressed at low frequency (<10 tag counts/million). A total of 20,470 unitags matched to approximately twelve thousand potato transcribed genes. Tag frequencies were compared between compatible and incompatible interactions over the infection time course and between compatible and incompatible genotypes. Transcriptional changes were more numerous in compatible than in incompatible interactions. In contrast to incompatible interactions, transcriptional changes in the compatible interaction were observed predominantly for multigene families encoding defense response genes and genes functional in photosynthesis and CO2 fixation. Numerous transcriptional differences were also observed between near isogenic genotypes prior to infection with P. infestans. Our DeepSAGE transcriptome analysis uncovered novel candidate genes for plant host pathogen interactions, examples of which are discussed with respect to possible function. PMID:22328937

  9. Quantitative Viral Community DNA Analysis Reveals the Dominance of Single-Stranded DNA Viruses in Offshore Upper Bathyal Sediment from Tohoku, Japan

    PubMed Central

    Yoshida, Mitsuhiro; Mochizuki, Tomohiro; Urayama, Syun-Ichi; Yoshida-Takashima, Yukari; Nishi, Shinro; Hirai, Miho; Nomaki, Hidetaka; Takaki, Yoshihiro; Nunoura, Takuro; Takai, Ken

    2018-01-01

    Previous studies on marine environmental virology have primarily focused on double-stranded DNA (dsDNA) viruses; however, it has recently been suggested that single-stranded DNA (ssDNA) viruses are more abundant in marine ecosystems. In this study, we performed a quantitative viral community DNA analysis to estimate the relative abundance and composition of both ssDNA and dsDNA viruses in offshore upper bathyal sediment from Tohoku, Japan (water depth = 500 m). The estimated dsDNA viral abundance ranged from 3 × 10^6 to 5 × 10^6 genome copies per cm^3 sediment, showing values similar to the range of fluorescence-based direct virus counts. In contrast, the estimated ssDNA viral abundance ranged from 1 × 10^8 to 3 × 10^9 genome copies per cm^3 sediment, thus providing an estimation that the ssDNA viral populations represent 96.3–99.8% of the benthic total DNA viral assemblages. In the ssDNA viral metagenome, most of the identified viral sequences were associated with ssDNA viral families such as Circoviridae and Microviridae. The principal component analysis of the ssDNA viral sequence components from the sedimentary ssDNA viral metagenomic libraries found that the different depth viral communities at the study site all exhibited similar profiles compared with deep-sea sediment ones at other reference sites. Our results suggested that deep-sea benthic ssDNA viruses have been significantly underestimated by conventional direct virus counts and that their contributions to deep-sea benthic microbial mortality and geochemical cycles should be further addressed by such a new quantitative approach. PMID:29467725
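
    The share of ssDNA viruses in the total DNA viral assemblage follows from the two abundance estimates as ss / (ss + ds). The sketch below is illustrative only: the function name is hypothetical, and the inputs pair endpoints of the reported ranges, whereas the published 96.3–99.8% figures were presumably computed per sample, so the values here need not match them.

```python
def ssdna_fraction(ssdna_copies, dsdna_copies):
    """Fraction of total DNA viral genome copies that are ssDNA."""
    total = ssdna_copies + dsdna_copies
    if total <= 0:
        raise ValueError("total genome copies must be positive")
    return ssdna_copies / total


# Genome copies per cubic centimetre of sediment, taken from the range endpoints.
low = ssdna_fraction(1e8, 5e6)   # lowest ssDNA estimate against highest dsDNA estimate
high = ssdna_fraction(3e9, 3e6)  # highest ssDNA estimate against lowest dsDNA estimate
```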

  10. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    PubMed

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    microRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina HiSeq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionarily conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways, indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis based on the analysis of An. gambiae. Our results provide the first lead to the elucidation of the miRNA profile in An. sinensis. Unveiling the roles of mosquito miRNAs will undoubtedly lead to a better understanding of mosquito biology and mosquito-pathogen interactions. This work lays the foundation for the further functional study of An. sinensis miRNAs and will facilitate their application in vector control.

  11. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in Twitter posts.

    PubMed

    Cocos, Anne; Fiks, Alexander G; Masino, Aaron J

    2017-07-01

    Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity, thus natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Our best-performing RNN model used pretrained word embeddings created from a large, non-domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
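
    The F-measure reported above is the harmonic mean of precision and recall. The sketch below shows one way an approximate-match score over labeled spans could be computed. The abstract does not define "approximate match", so treating any overlap between a predicted span and a gold span as a match is an assumption, and all names are hypothetical.

```python
def spans_overlap(a, b):
    """True if half-open token spans (start, end) share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def approximate_match_f1(gold, predicted):
    """Return (precision, recall, F1), counting a span as correct if it overlaps a counterpart."""
    matched_pred = sum(any(spans_overlap(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(spans_overlap(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold_spans = [(0, 3), (10, 12), (20, 25)]
predicted_spans = [(1, 3), (11, 14), (30, 32), (40, 41)]
precision, recall, f1 = approximate_match_f1(gold_spans, predicted_spans)
```

    Here two of four predictions overlap a gold span and two of three gold spans are found, so precision is 0.5 and recall is about 0.67.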

  12. Finding a needle in the virus metagenome haystack--micro-metagenome analysis captures a snapshot of the diversity of a bacteriophage armoire.

    PubMed

    Ray, Jessica; Dondrup, Michael; Modha, Sejal; Steen, Ida Helene; Sandaa, Ruth-Anne; Clokie, Martha

    2012-01-01

    Viruses are ubiquitous in the oceans and critical components of marine microbial communities, regulating nutrient transfer to higher trophic levels or to the dissolved organic pool through lysis of host cells. Hydrothermal vent systems are oases of biological activity in the deep oceans, for which knowledge of biodiversity and its impact on global ocean biogeochemical cycling is still in its infancy. In order to gain biological insight into viral communities present in hydrothermal vent systems, we developed a method based on deep-sequencing of pulsed field gel electrophoretic bands representing key viral fractions present in seawater within and surrounding a hydrothermal plume derived from Loki's Castle vent field at the Arctic Mid-Ocean Ridge. The reduction in virus community complexity afforded by this novel approach enabled the near-complete reconstruction of a lambda-like phage genome from the virus fraction of the plume. Phylogenetic examination of distinct gene regions in this lambdoid phage genome unveiled diversity at loci encoding superinfection exclusion- and integrase-like proteins. This suggests the importance of fine-tuning lysogenic conversion as a viral survival strategy, and provides insights into the nature of host-virus and virus-virus interactions within hydrothermal plumes. By reducing the complexity of the viral community through targeted sequencing of prominent dsDNA viral fractions, this method has selectively mimicked virus dominance approaching that hitherto achieved only through culturing, thus enabling bioinformatic analysis to locate a lambdoid viral "needle" within the greater viral community "haystack". Such targeted analyses have great potential for accelerating the extraction of biological knowledge from diverse and poorly understood environmental viral communities.

  13. Robust, Optimal Water Infrastructure Planning Under Deep Uncertainty Using Metamodels

    NASA Astrophysics Data System (ADS)

    Maier, H. R.; Beh, E. H. Y.; Zheng, F.; Dandy, G. C.; Kapelan, Z.

    2015-12-01

    Optimal long-term planning plays an important role in many water infrastructure problems. However, this task is complicated by deep uncertainty about future conditions, such as the impact of population dynamics and climate change. One way to deal with this uncertainty is by means of robustness, which aims to ensure that water infrastructure performs adequately under a range of plausible future conditions. However, as robustness calculations require computationally expensive system models to be run for a large number of scenarios, it is generally computationally intractable to include robustness as an objective in the development of optimal long-term infrastructure plans. In order to overcome this shortcoming, an approach is developed that uses metamodels instead of computationally expensive simulation models in robustness calculations. The approach is demonstrated for the optimal sequencing of water supply augmentation options for the southern portion of the water supply for Adelaide, South Australia. A 100-year planning horizon is subdivided into ten equal decision stages for the purpose of sequencing various water supply augmentation options, including desalination, stormwater harvesting and household rainwater tanks. The objectives include the minimization of average present value of supply augmentation costs, the minimization of average present value of greenhouse gas emissions and the maximization of supply robustness. The uncertain variables are rainfall, per capita water consumption and population. Decision variables are the implementation stages of the different water supply augmentation options. Artificial neural networks are used as metamodels to enable all objectives to be calculated in a computationally efficient manner at each of the decision stages. The results illustrate the importance of identifying optimal staged solutions to ensure robustness and sustainability of water supply into an uncertain long-term future.

  14. Optical Communications Channel Combiner

    NASA Technical Reports Server (NTRS)

    Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy

    2012-01-01

    NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the log-likelihood ratio (LLR) sequences of up to three receivers and then combine these into a single LLR sequence for information decoding. The channel combiner has three channel inputs, each of which takes as input a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlations between the channels' LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword, delivered via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.
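
    The synchronize-then-combine step described above can be sketched in a few lines. This is only an illustration, not the NTRS hardware design: the function names `best_lag` and `combine`, the `max_lag` search window, and the choice to combine aligned LLRs by simple addition (a common rule for independent observations) are all assumptions, and the four-bit quantization and XAUI transport are not modeled.

```python
def best_lag(ref, other, max_lag):
    """Return the lag (in slots) at which `other` best matches `ref`.

    The score for a lag is the cross-correlation sum of
    ref[i] * other[i + lag] over the overlapping samples.
    """
    def corr(lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(len(ref))
                   if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=corr)


def combine(channels, max_lag=8):
    """Align every channel to the first one, then add the aligned LLRs.

    Summation as the combining rule is an assumption of this sketch.
    """
    ref = channels[0]
    lags = [0] + [best_lag(ref, ch, max_lag) for ch in channels[1:]]
    combined = []
    for i in range(len(ref)):
        total = 0
        for ch, lag in zip(channels, lags):
            j = i + lag
            if 0 <= j < len(ch):  # skip samples that fall outside a channel
                total += ch[j]
        combined.append(total)
    return lags, combined


# Toy example: the second receiver sees the same LLR stream two slots late.
rx_a = [0, 1, -2, 5, -3, 4, 0, -1, 2, 0]
rx_b = [0, 0] + rx_a[:-2]
lags, llr = combine([rx_a, rx_b], max_lag=4)
```

    In this toy case the estimated lag for the second channel is two slots, and wherever both channels overlap the combined LLR is twice the single-receiver value.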

  15. The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region, California

    NASA Astrophysics Data System (ADS)

    Zhang, Q.; Wdowinski, S.; Lin, G.

    2011-12-01

    The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depths of 10-17 km. Over the past decade, seismic activity increased in the Anza region, including three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in the Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) with an accurate 1-D velocity model derived from the 3-D model of Lin et al. (2007). In order to separate the aftershock sequences from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depths of 10-17 km and extended over a 70 km long segment of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical patterns, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. References: 1. Lin, G. and Shearer, P.M., 2006, The COMPLOC earthquake location package, Seism. Res. Lett. 77, pp. 440-444. 2. Lin, G., Shearer, P.M., Hauksson, E., and Thurber, C.H., 2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method, J. Geophys. Res. 112, B12306, doi:10.1029/2007JB004977. 3. Wdowinski, S., 2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci., doi:10.1038/NGEO684.
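
    For readers unfamiliar with Omori's law, the sketch below shows the decay curve it describes. The abstract does not say which variant was fitted, so the modified (Omori-Utsu) form n(t) = K / (t + c)^p is assumed here; it reduces to the original law when p = 1. The function names and the parameter values in the example are hypothetical and are not taken from the study.

```python
import math


def omori_rate(t, K, c, p):
    """Aftershock rate at time t after the mainshock: K / (t + c)**p."""
    return K / (t + c) ** p


def expected_aftershocks(t_end, K, c, p):
    """Expected number of aftershocks in [0, t_end], i.e. the integral of the rate."""
    if math.isclose(p, 1.0):
        return K * math.log((t_end + c) / c)
    return K * ((t_end + c) ** (1 - p) - c ** (1 - p)) / (1 - p)


# Hypothetical parameters: K = 100 events/day, c = 1 day, p = 1.
rate_day0 = omori_rate(0.0, 100.0, 1.0, 1.0)
count_10d = expected_aftershocks(10.0, 100.0, 1.0, 1.0)
```

    Events observed in excess of a fitted curve of this kind would then be attributed to background seismicity rather than to the aftershock sequence.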

  16. Population genomics of C. melanopterus using target gene capture data: demographic inferences and conservation perspectives

    PubMed Central

    Maisano Delser, Pierpaolo; Corrigan, Shannon; Hale, Matthew; Li, Chenhong; Veuille, Michel; Planes, Serge; Naylor, Gavin; Mona, Stefano

    2016-01-01

    Population genetics studies on non-model organisms typically involve sampling few markers from multiple individuals. Next-generation sequencing approaches open up the possibility of sampling many more markers from fewer individuals to address the same questions. Here, we applied a target gene capture method to deep sequence ~1000 independent autosomal regions of a non-model organism, the blacktip reef shark (Carcharhinus melanopterus). We devised a sampling scheme based on the predictions of theoretical studies of metapopulations to show that sampling few individuals, but many loci, can be extremely informative to reconstruct the evolutionary history of species. We collected data from a single deme (SID) from Northern Australia and from a scattered sampling representing various locations throughout the Indian Ocean (SCD). We explored the genealogical signature of population dynamics detected from both sampling schemes using an ABC algorithm. We then contrasted these results with those obtained by fitting the data to a non-equilibrium finite island model. Both approaches supported an Nm value ~40, consistent with philopatry in this species. Finally, we demonstrate through simulation that metapopulations exhibit greater resilience to recent changes in effective size compared to unstructured populations. We propose an empirical approach to detect recent bottlenecks based on our sampling scheme. PMID:27651217

  17. Population genomics of C. melanopterus using target gene capture data: demographic inferences and conservation perspectives.

    PubMed

    Maisano Delser, Pierpaolo; Corrigan, Shannon; Hale, Matthew; Li, Chenhong; Veuille, Michel; Planes, Serge; Naylor, Gavin; Mona, Stefano

    2016-09-21

    Population genetics studies on non-model organisms typically involve sampling few markers from multiple individuals. Next-generation sequencing approaches open up the possibility of sampling many more markers from fewer individuals to address the same questions. Here, we applied a target gene capture method to deep sequence ~1000 independent autosomal regions of a non-model organism, the blacktip reef shark (Carcharhinus melanopterus). We devised a sampling scheme based on the predictions of theoretical studies of metapopulations to show that sampling few individuals, but many loci, can be extremely informative to reconstruct the evolutionary history of species. We collected data from a single deme (SID) from Northern Australia and from a scattered sampling representing various locations throughout the Indian Ocean (SCD). We explored the genealogical signature of population dynamics detected from both sampling schemes using an ABC algorithm. We then contrasted these results with those obtained by fitting the data to a non-equilibrium finite island model. Both approaches supported an Nm value ~40, consistent with philopatry in this species. Finally, we demonstrate through simulation that metapopulations exhibit greater resilience to recent changes in effective size compared to unstructured populations. We propose an empirical approach to detect recent bottlenecks based on our sampling scheme.

  18. Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.

    PubMed

    Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges

    2015-11-01

    Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. The diversity of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitive ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  19. Paleosols can promote root growth of the recent vegetation - a case study from the sandy soil-sediment sequence Rakt, the Netherlands

    NASA Astrophysics Data System (ADS)

    Gocke, M. I.; Kessler, F.; van Mourik, J. M.; Jansen, B.; Wiesenberg, G. L. B.

    2015-12-01

    Soil studies commonly comprise the uppermost meter for tracing e.g. soil development. However, the maximum rooting depth of various plants significantly exceeds this depth. We hypothesized that deeper parts of the soil, soil parent material and especially paleosols provide beneficial conditions in terms of e.g. nutrient contents, thus supporting their utilization and exploitation by deep roots. We aimed to decipher the different phases of soil formation in Dutch drift- and coversands. The study site is located at Bedafse Bergen (SE Netherlands) in a 200 year old oak stand. A recent Podzol developed on driftsand covering a Plaggic Anthrosol that established in a relict Podzol on Late Glacial eolian coversand. Root-free soil and sediment samples, collected in 10-15 cm depth increments, were subjected to a multi-proxy physical and geochemical approach. The Plaggic Anthrosol revealed low bulk density and high phosphorus and organic carbon contents, whereas the relict Podzol was characterized by high iron and aluminum contents. Frequencies of fine (≤ 2 mm) and medium roots (2-5 mm) were determined on horizontal levels and the profile wall for a detailed pseudo-three-dimensional insight. On horizontal levels, living roots maximized in the uppermost part of the relict Podzol with ca. 4450 and 220 m-2, significantly exceeding topsoil root abundances. Roots of oak trees thus benefited from the favorable growth conditions in the nutrient-rich Plaggic Anthrosol, whereas increased compactness and high aluminum contents of the relict Podzol caused a strong decrease of roots. The approach demonstrated the benefit of comprehensive root investigation to support and explain pedogenic investigations of soil profiles, as fine roots can be significantly underestimated when quantified at the profile wall. The possible rooting of soil parent material and paleosols long after their burial confirmed recent studies on the potential influence of rooting to overprint sediment-(paleo)soil sequences of various ages, sedimentary and climatic settings. Potential consequences of deep rooting for terrestrial deep carbon stocks, located to a relevant part in paleosols, remain largely unknown and require further investigation.

  20. Paleosols can promote root growth of recent vegetation - a case study from the sandy soil-sediment sequence Rakt, the Netherlands

    NASA Astrophysics Data System (ADS)

    Gocke, Martina I.; Kessler, Fabian; van Mourik, Jan M.; Jansen, Boris; Wiesenberg, Guido L. B.

    2016-10-01

    Soil studies commonly comprise the uppermost meter for tracing, e.g., soil development. However, the maximum rooting depth of various plants significantly exceeds this depth. We hypothesized that deeper parts of the soil, soil parent material and especially paleosols provide beneficial conditions in terms of, e.g., nutrient contents, thus supporting their utilization and exploitation by deep roots. We aimed to decipher the different phases of soil formation in Dutch drift sands and cover sands. The study site is located at Bedafse Bergen (southeastern Netherlands) in a 200-year-old oak stand. A recent Podzol developed on drift sand covering a Plaggic Anthrosol that was piled up on a relict Podzol on Late Glacial eolian cover sand. Root-free soil and sediment samples, collected in 10-15 cm depth increments, were subjected to a multi-proxy physical and geochemical approach. The Plaggic Anthrosol revealed low bulk density and high phosphorus and organic carbon contents, whereas the relict Podzol was characterized by high iron and aluminum contents. Frequencies of fine (diameter ≤ 2 mm) and medium roots (2-5 mm) were determined on horizontal levels and the profile wall for a detailed pseudo-three-dimensional insight. On horizontal levels, living roots were most abundant in the uppermost part of the relict Podzol with ca. 4450 and 220 m-2, significantly exceeding topsoil root abundances. Roots of oak trees thus benefited from the favorable growth conditions in the nutrient-rich Plaggic Anthrosol, whereas increased compactness and high aluminum contents of the relict Podzol caused a strong decrease of roots. The approach demonstrated the benefit of comprehensive root investigation to support interpretation of soil profiles, as fine roots can be significantly underestimated when quantified at the profile wall. The possible rooting of soil parent material and paleosols long after their burial confirmed recent studies on the potential influence of rooting to overprint sediment-(paleo)soil sequences of various ages, sedimentary and climatic settings. Potential consequences of deep rooting for terrestrial deep carbon stocks, located to a relevant part in paleosols, remain largely unknown and require further investigation.

  1. Expanding anchored hybrid enrichment to resolve both deep and shallow relationships within the spider tree of life.

    PubMed

    Hamilton, Chris A; Lemmon, Alan R; Lemmon, Emily Moriarty; Bond, Jason E

    2016-10-13

    Despite considerable effort, progress in spider molecular systematics has lagged behind many other comparable arthropod groups, thereby hindering family-level resolution, classification, and testing of important macroevolutionary hypotheses. Recently, alternative targeted sequence capture techniques have provided molecular systematics a powerful tool for resolving relationships across the Tree of Life. One of these approaches, Anchored Hybrid Enrichment (AHE), is designed to recover hundreds of unique orthologous loci from across the genome, for resolving both shallow and deep-scale evolutionary relationships within non-model systems. Herein we present a modification of the AHE approach that expands its use for application in spiders, with a particular emphasis on the infraorder Mygalomorphae. Our aim was to design a set of probes that effectively capture loci informative at a diversity of phylogenetic timescales. Following identification of putative arthropod-wide loci, we utilized homologous transcriptome sequences from 17 species across all spiders to identify exon boundaries. Conserved regions with variable flanking regions were then sought across the tick genome, three published araneomorph spider genomes, and raw genomic reads of two mygalomorph taxa. Following development of the 585 target loci in the Spider Probe Kit, we applied AHE across three taxonomic depths to evaluate performance: deep-level spider family relationships (33 taxa, 327 loci); family and generic relationships within the mygalomorph family Euctenizidae (25 taxa, 403 loci); and species relationships in the North American tarantula genus Aphonopelma (83 taxa, 581 loci). At the deepest level, all three major spider lineages (the Mesothelae, Mygalomorphae, and Araneomorphae) were supported with high bootstrap support. Strong support was also found throughout the Euctenizidae, including generic relationships within the family and species relationships within the genus Aptostichus. As in the Euctenizidae, virtually identical topologies were inferred with high support throughout Aphonopelma. The Spider Probe Kit, the first implementation of AHE methodology in Class Arachnida, holds great promise for gathering the types and quantities of molecular data needed to accelerate an understanding of the spider Tree of Life by providing a mechanism whereby different researchers can confidently and effectively use the same loci for independent projects, yet allowing synthesis of data across independent research groups.

  2. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    PubMed

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    Since the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs. Contact: stelo@cs.ucr.edu or timothy.close@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
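
    The 'slice, decode independently, merge' idea can be illustrated with a short sketch. It is not the authors' released scripts: the `decode` callable is a hypothetical stand-in for their read-to-BAC decoder, the fixed `slice_size` stands in for the "optimal size" the abstract mentions, and resolving conflicts by majority vote is an assumption made only for illustration.

```python
from collections import Counter, defaultdict


def slice_reads(reads, slice_size):
    """Split the read list into consecutive slices of at most slice_size reads."""
    return [reads[i:i + slice_size] for i in range(0, len(reads), slice_size)]


def decode_in_slices(reads, slice_size, decode):
    """Decode each slice on its own, then merge the per-slice assignments.

    `decode` takes a list of reads and returns a dict mapping a key
    (for example a read sequence) to an assigned label (for example a BAC id).
    Keys assigned in several slices are resolved by majority vote, which is
    an assumption of this sketch.
    """
    votes = defaultdict(Counter)
    for chunk in slice_reads(reads, slice_size):
        for key, label in decode(chunk).items():
            votes[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


# Toy decoder: pretends the first base of a read identifies its clone.
def toy_decode(chunk):
    return {read: read[0] for read in chunk}


reads = ["AAC", "GTT", "AAC", "CGA", "GTT"]
assignment = decode_in_slices(reads, slice_size=2, decode=toy_decode)
```

    Because each slice is decoded on its own, a real decoder would see a bounded number of erroneous reads per call, which is the effect the abstract attributes to slicing.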

  3. Unique microbial community in drilling fluids from Chinese continental scientific drilling

    USGS Publications Warehouse

    Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.

    2006-01-01

    Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.

  4. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  5. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  6. Usefulness of DWI in preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma: a systematic review and meta-analysis

    PubMed Central

    2014-01-01

    Background: The objective of this study was to perform a systematic review and a meta-analysis in order to estimate the diagnostic accuracy of diffusion weighted imaging (DWI) in the preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma. Methods: Studies evaluating DWI for the detection of deep myometrial invasion in patients with endometrial carcinoma were systematically searched for in MEDLINE, EMBASE, and the Cochrane Library from January 1995 to January 2014. Methodologic quality was assessed by using the Quality Assessment of Diagnostic Accuracy Studies tool. Bivariate random-effects meta-analytic methods were used to obtain pooled estimates of sensitivity, specificity, diagnostic odds ratio (DOR) and receiver operating characteristic (ROC) curves. The study also evaluated the clinical utility of DWI in preoperative assessment of deep myometrial invasion. Results: Seven studies enrolling a total of 320 individuals met the study inclusion criteria. The summary area under the ROC curve was 0.91. There was no evidence of publication bias (P = 0.90, bias coefficient analysis). Sensitivity and specificity of DWI for detection of deep myometrial invasion across all studies were 0.90 and 0.89, respectively. Positive and negative likelihood ratios with DWI were 8 and 0.11, respectively. In patients with high pre-test probabilities, DWI enabled confirmation of deep myometrial invasion; in patients with low pre-test probabilities, DWI enabled exclusion of deep myometrial invasion. In the worst case scenario (pre-test probability, 50%), post-test probabilities were 89% and 10% for positive and negative DWI results, respectively. Conclusion: DWI has high sensitivity and specificity for detecting deep myometrial invasion and, more importantly, can reliably rule out deep myometrial invasion. Therefore, it would be worthwhile to add a DWI sequence to the standard MRI protocols in preoperative evaluation of endometrial cancer in order to detect deep myometrial invasion, which along with other poor prognostic factors like age, tumor grade, and LVSI would be useful in stratifying high risk groups, thereby helping in the tailoring of the surgical approach in patients with low risk of endometrial carcinoma. PMID:25608571
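
    The post-test probabilities quoted in this abstract follow from the standard odds form of Bayes' rule, and the arithmetic can be checked with a small sketch. The function names are illustrative; the inputs (sensitivity 0.90, specificity 0.89, likelihood ratios 8 and 0.11, pre-test probability 50%) are the values reported above.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(0.90, 0.89)   # roughly 8.2 and 0.11
after_positive = post_test_probability(0.50, 8)      # 8/9, about 89%
after_negative = post_test_probability(0.50, 0.11)   # 0.11/1.11, about 10%
```

    With a 50% pre-test probability the pre-test odds are 1, so the post-test odds equal the likelihood ratio itself, which is why the reported 89% and 10% follow directly from LR+ = 8 and LR- = 0.11.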

  7. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
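
    The "top L/5 long-range" precision used as the headline metric here can be sketched as follows. The abstract does not spell out the definition, so the usual CASP convention is assumed: L is the sequence length, long-range means a sequence separation of at least 24 residues, and predictions are ranked by predicted probability. The function name and the toy data are hypothetical.

```python
def top_l_over_k_precision(predictions, true_contacts, seq_len, k=5, min_sep=24):
    """Precision of the top L/k predicted contacts at separation >= min_sep.

    predictions:   iterable of (i, j, probability) residue pairs
    true_contacts: set of (i, j) pairs with i < j that are real contacts
    """
    long_range = [(i, j, p) for i, j, p in predictions if abs(i - j) >= min_sep]
    long_range.sort(key=lambda item: item[2], reverse=True)
    top = long_range[:max(1, seq_len // k)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


# Toy protein of length 10; separation threshold lowered to 3 for illustration.
preds = [(1, 8, 0.9), (2, 9, 0.8), (1, 2, 0.99), (3, 10, 0.1)]
precision = top_l_over_k_precision(preds, {(1, 8)}, seq_len=10, k=5, min_sep=3)
```

    In the toy case the short-range pair (1, 2) is excluded despite its high score, the top L/5 = 2 long-range pairs are (1, 8) and (2, 9), and one of them is a true contact, giving a precision of 0.5.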

  8. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

    PubMed

    Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O

    2010-10-01

    Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.

  9. The mitochondrial genome of Ifremeria nautilei and the phylogenetic position of the enigmatic deep-sea Abyssochrysoidea (Mollusca: Gastropoda).

    PubMed

    Osca, David; Templado, José; Zardoya, Rafael

    2014-09-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double-stranded circular molecule is 15,664 bp in length and encodes the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to the genome organization of caenogastropods and differs only in the relative position of the trnW gene. The deduced amino acid sequences of the mt protein-coding genes of the Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods, and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea to the Late Jurassic-Early Cretaceous, indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.

  10. VDJ-Seq: Deep Sequencing Analysis of Rearranged Immunoglobulin Heavy Chain Gene to Reveal Clonal Evolution Patterns of B Cell Lymphoma.

    PubMed

    Jiang, Yanwen; Nie, Kui; Redmond, David; Melnick, Ari M; Tam, Wayne; Elemento, Olivier

    2015-12-28

    Understanding tumor clonality is critical to understanding the mechanisms involved in tumorigenesis and disease progression. In addition, understanding the changes in clonal composition that occur within a tumor in response to certain micro-environments or treatments may lead to the design of more sophisticated and effective approaches to eradicate tumor cells. However, tracking tumor clonal sub-populations has been challenging due to the lack of distinguishable markers. To address this problem, a VDJ-seq protocol was created to trace the clonal evolution patterns of diffuse large B cell lymphoma (DLBCL) relapse by exploiting VDJ recombination and somatic hypermutation (SHM), two unique features of B cell lymphomas. In this protocol, next-generation sequencing (NGS) libraries with indexing potential were constructed from the amplified rearranged immunoglobulin heavy chain (IgH) VDJ region from pairs of primary diagnosis and relapse DLBCL samples. On average, more than half a million VDJ sequences per sample were obtained after sequencing, which contain both VDJ rearrangement and SHM information. In addition, customized bioinformatics pipelines were developed to fully utilize sequence information for the characterization of the IgH-VDJ repertoire within these samples. Furthermore, the pipeline allows the reconstruction and comparison of the clonal architecture of individual tumors, which enables the examination of the clonal heterogeneity within the diagnosis tumors and the deduction of clonal evolution patterns between diagnosis and relapse tumor pairs. When applying this analysis to several diagnosis-relapse pairs, we uncovered key evidence that multiple distinctive tumor evolutionary patterns could lead to DLBCL relapse. Additionally, this approach can be expanded into other clinical aspects, such as identification of minimal residual disease, monitoring relapse progress and treatment response, and investigation of immune repertoires in non-lymphoma contexts.

  11. Integrated digital error suppression for improved detection of circulating tumor DNA

    PubMed Central

    Kurtz, David M.; Chabon, Jacob J.; Scherer, Florian; Stehr, Henning; Liu, Chih Long; Bratman, Scott V.; Say, Carmen; Zhou, Li; Carter, Justin N.; West, Robert B.; Sledge, George W.; Shrager, Joseph B.; Loo, Billy W.; Neal, Joel W.; Wakelee, Heather A.; Diehn, Maximilian; Alizadeh, Ash A.

    2016-01-01

    High-throughput sequencing of circulating tumor DNA (ctDNA) promises to facilitate personalized cancer therapy. However, low quantities of cell-free DNA (cfDNA) in the blood and sequencing artifacts currently limit analytical sensitivity. To overcome these limitations, we introduce an approach for integrated digital error suppression (iDES). Our method combines in silico elimination of highly stereotypical background artifacts with a molecular barcoding strategy for the efficient recovery of cfDNA molecules. Individually, these two methods each improve the sensitivity of cancer personalized profiling by deep sequencing (CAPP-Seq) by ~3-fold, and synergize when combined to yield ~15-fold improvements. As a result, iDES-enhanced CAPP-Seq facilitates noninvasive variant detection across hundreds of kilobases. Applied to clinical non-small cell lung cancer (NSCLC) samples, our method enabled biopsy-free profiling of EGFR kinase domain mutations with 92% sensitivity and 96% specificity and detection of ctDNA down to 4 in 10^5 cfDNA molecules. We anticipate that iDES will aid the noninvasive genotyping and detection of ctDNA in research and clinical settings. PMID:27018799

  12. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    PubMed Central

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-01-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055

  13. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

    PubMed

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-08-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

  14. Do Students Develop towards More Deep Approaches to Learning during Studies? A Systematic Review on the Development of Students' Deep and Surface Approaches to Learning in Higher Education

    ERIC Educational Resources Information Center

    Asikainen, Henna; Gijbels, David

    2017-01-01

    The focus of the present paper is on the contribution of the research in the student approaches to learning tradition. Several studies in this field have started from the assumption that students' approaches to learning develop towards more deep approaches to learning in higher education. This paper reports on a systematic review of longitudinal…

  15. Performance evaluation of 2D and 3D deep learning approaches for automatic segmentation of multiple organs on CT images

    NASA Astrophysics Data System (ADS)

    Zhou, Xiangrong; Yamada, Kazuma; Kojima, Takuya; Takayama, Ryosuke; Wang, Song; Zhou, Xinxin; Hara, Takeshi; Fujita, Hiroshi

    2018-02-01

    The purpose of this study is to evaluate and compare the performance of modern deep learning techniques for automatically recognizing and segmenting multiple organ regions on 3D CT images. CT image segmentation is one of the important tasks in medical image analysis and is still very challenging. Deep learning approaches have demonstrated the capability of scene recognition and semantic segmentation on natural images and have been used to address segmentation problems of medical images. Although several works showed promising results of CT image segmentation by using deep learning approaches, there is no comprehensive evaluation of the segmentation performance of deep learning on segmenting multiple organs on different portions of CT scans. In this paper, we evaluated and compared the segmentation performance of two different deep learning approaches that used 2D and 3D deep convolutional neural networks (CNNs) without and with a pre-processing step. A conventional approach that represents the state-of-the-art performance of CT image segmentation without deep learning was also used for comparison. A dataset that includes 240 CT images scanned on different portions of human bodies was used for performance evaluation. Up to 17 types of organ regions in each CT scan were segmented automatically and compared to the human annotations by using the ratio of intersection over union (IU) as the criterion. The experimental results demonstrated that the IUs of the segmentation results had mean values of 79% and 67%, averaged over 17 types of organs segmented by a 3D and a 2D deep CNN, respectively. All the results of the deep learning approaches showed better accuracy and robustness than the conventional segmentation method that used probabilistic atlas and graph-cut methods. The effectiveness and the usefulness of deep learning approaches were demonstrated for solving the multiple-organ segmentation problem on 3D CT images.
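
    The intersection-over-union (IU) criterion used in this abstract can be sketched as follows. This is a minimal illustration that treats a segmentation as a flat sequence of per-voxel organ labels; the function names and the choice to average over labels present in either mask are assumptions, and the abstract does not say exactly how the authors pooled scores across scans.

```python
def iou_per_label(pred, truth, labels):
    """Return {label: IU} for two equal-length sequences of voxel labels."""
    scores = {}
    for label in labels:
        # Voxels assigned this label by both the prediction and the annotation.
        intersection = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
        # Voxels assigned this label by either of them.
        union = sum(1 for p, t in zip(pred, truth) if p == label or t == label)
        if union:  # skip labels absent from both masks
            scores[label] = intersection / union
    return scores


def mean_iou(pred, truth, labels):
    """Average the per-label IU over the labels that occur in either mask."""
    scores = iou_per_label(pred, truth, labels)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

    For example, mean_iou([1, 1, 2, 2, 0, 0], [1, 2, 2, 2, 0, 1], labels=[1, 2]) averages an IU of 1/3 for label 1 and 2/3 for label 2.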

  16. Characterising Atlantic deep waters during the extreme warmth of the early Eocene 'greenhouse'.

    NASA Astrophysics Data System (ADS)

    Cameron, A.; Sexton, P. F.; Anand, P.; Huck, C. E.; Fehr, M.; Dickson, A.; Scher, H. D.; van de Flierdt, T.; Westerhold, T.; Roehl, U.

    2014-12-01

    The meridional overturning circulation (MOC) is a planetary-scale oceanic flow that is of direct importance to the climate system because it transports heat, salt and nutrients to high latitudes and regulates the exchange of CO2 with the atmosphere. The Atlantic Ocean plays a strong role in the modern-day MOC; however, it is unclear what role it may have played during extreme climate conditions such as those found in the early Eocene 'greenhouse'. In order to resolve the Atlantic's role in the MOC during the early/middle Eocene, we present a multi-proxy approach to investigate changes in ocean circulation, water mass geometry, sediment supply to the deep oceans and the physical strength of deep waters from four different IODP drill sites. Neodymium isotopes (ɛNd), REE profiles and cerium anomalies measured in fossilised fish teeth help to characterise geochemical changes to water masses throughout the Atlantic, whilst bulk sediment ɛNd and XRF core-scan data document changes in sediment supply to the region. Sortable silt data provide a physical constraint on the strength of deep-water movements during the extreme climatic conditions of the early Eocene. We utilise expanded and continuous sequences from two sites in the northwest Atlantic spanning the early to middle Eocene, recently recovered on IODP Exp. 342 (1403, 1409), that are located on the Newfoundland Ridge, directly in the flow path of today's Deep Western Boundary Current. We also present data from equatorial Demerara Rise (IODP site 1258) and from further north at the mouth of the Labrador Sea (ODP Site 647).

  17. iSS-PC: Identifying Splicing Sites via Physical-Chemical Properties Using Deep Sparse Auto-Encoder.

    PubMed

    Xu, Zhao-Chun; Wang, Peng; Qiu, Wang-Ren; Xiao, Xuan

    2017-08-15

    Gene splicing is one of the most significant biological processes in eukaryotic gene expression, such as RNA splicing, which can cause a pre-mRNA to produce one or more mature messenger RNAs containing the coded information with multiple biological functions. Thus, identifying splicing sites in DNA/RNA sequences is significant for both bio-medical research and the discovery of new drugs. However, identification based only on experimental techniques is expensive and time-consuming, so new computational methods are needed. To identify the splice donor sites and splice acceptor sites accurately and quickly, a deep sparse auto-encoder model with two hidden layers, called iSS-PC, was constructed based on minimum error law, in which we incorporated twelve physical-chemical properties of the dinucleotides within DNA into PseDNC to formulate given sequence samples via a battery of cross-covariance and auto-covariance transformations. In this paper, five-fold cross-validation test results based on the same benchmark data-sets indicated that the new predictor remarkably outperformed the existing prediction methods in this field. Furthermore, it is expected that many other related problems can also be studied by this approach. To implement classification accurately and quickly, an easy-to-use web-server for identifying splicing sites has been established for free access at: http://www.jci-bioinfo.cn/iSS-PC.

  18. Position-aware deep multi-task learning for drug-drug interaction extraction.

    PubMed

    Zhou, Deyu; Miao, Lei; He, Yulan

    2018-05-01

    A drug-drug interaction (DDI) is a situation in which a drug affects the activity of another drug synergistically or antagonistically when being administered together. The information of DDIs is crucial for healthcare professionals to prevent adverse drug events. Although some known DDIs can be found in purposely-built databases such as DrugBank, most information is still buried in scientific publications. Therefore, automatically extracting DDIs from biomedical texts is sorely needed. In this paper, we propose a novel position-aware deep multi-task learning approach for extracting DDIs from biomedical texts. In particular, sentences are represented as a sequence of word embeddings and position embeddings. An attention-based bidirectional long short-term memory (BiLSTM) network is used to encode each sentence. The relative position information of words with the target drugs in text is combined with the hidden states of BiLSTM to generate the position-aware attention weights. Moreover, the tasks of predicting whether or not two drugs interact with each other and further distinguishing the types of interactions are learned jointly in multi-task learning framework. The proposed approach has been evaluated on the DDIExtraction challenge 2013 corpus and the results show that with the position-aware attention only, our proposed approach outperforms the state-of-the-art method by 0.99% for binary DDI classification, and with both position-aware attention and multi-task learning, our approach achieves a micro F-score of 72.99% on interaction type identification, outperforming the state-of-the-art approach by 1.51%, which demonstrates the effectiveness of the proposed approach. Copyright © 2018 Elsevier B.V. All rights reserved.
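
    For readers unfamiliar with the micro F-score quoted in this abstract, the sketch below shows one common way to compute it for interaction-type classification: true positives, false positives and false negatives are pooled over all interaction types before precision and recall are taken, and the "no interaction" class is excluded from the counts. The function name, the label strings, and the exclusion rule are assumptions for illustration; the abstract does not spell out the scoring script used for the DDIExtraction 2013 corpus.

```python
def micro_f1(gold, pred, negative="none"):
    """Micro-averaged F1 over all non-negative classes.

    gold, pred -- equal-length sequences of class labels per candidate pair
    negative   -- label meaning "no interaction" (hypothetical name)
    """
    pairs = list(zip(gold, pred))
    # Correctly predicted interaction types.
    tp = sum(1 for g, p in pairs if g == p and g != negative)
    # Predicted an interaction type that does not match the gold label.
    fp = sum(1 for g, p in pairs if p != negative and g != p)
    # Gold interaction that was missed or given the wrong type.
    fn = sum(1 for g, p in pairs if g != negative and g != p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

    Note that a pair labelled with the wrong interaction type counts as both a false positive and a false negative under this convention.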

  19. Factors Contributing to Changes in a Deep Approach to Learning in Different Learning Environments

    ERIC Educational Resources Information Center

    Postareff, Liisa; Parpala, Anna; Lindblom-Ylänne, Sari

    2015-01-01

    The study explored factors explaining changes in a deep approach to learning. The data consisted of interviews with 12 students from four Bachelor-level courses representing different disciplines. We analysed and compared descriptions of students whose deep approach either increased, decreased or remained relatively unchanged during their courses.…

  20. Dental students' perception of their approaches to learning in a PBL programme.

    PubMed

    Haghparast, H; Ghorbani, A; Rohlin, M

    2017-08-01

    To compare dental students' perceptions of their learning approaches between different years of a problem-based learning (PBL) programme. The hypothesis was that in a comparison between senior and junior students, the senior students would perceive themselves as having a higher level of deep learning approach and a lower level of surface learning approach than junior students would. This hypothesis was based on the fact that senior students have longer experience of a student-centred educational context, which is supposed to underpin student learning. Students of three cohorts (first year, third year and fifth year) of a PBL-based dental programme were asked to respond to a questionnaire (R-SPQ-2F) developed to analyse students' learning approaches, that is deep approach and surface approach, using four subscales including deep strategy, surface strategy, deep motive and surface motive. The results of the three cohorts were compared using a one-way analysis of variance (ANOVA). A P-value was set at <0.05 for statistical significance. The fifth-year students demonstrated a lower surface approach than the first-year students (P = 0.020). There was a significant decrease in surface strategy from the first to the fifth year (P = 0.003). No differences were found concerning deep approach or its subscales (deep strategy and deep motive) between the mean scores of the three cohorts. The results did not show the expected increased depth in learning approaches over the programme years. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  1. A ruby-colored Pseudobaeospora species is described as new from material collected on the island of Hawaii.

    PubMed

    Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A

    2014-01-01

    Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.

  2. Deep-Sea Microbes: Linking Biogeochemical Rates to -Omics Approaches

    NASA Astrophysics Data System (ADS)

    Herndl, G. J.; Sintes, E.; Bayer, B.; Bergauer, K.; Amano, C.; Hansman, R.; Garcia, J.; Reinthaler, T.

    2016-02-01

    Over the past decade, substantial progress has been made in determining deep ocean microbial activity and resolving some of the enigmas in understanding the deep ocean carbon flux. Also, metagenomics approaches have shed light on the dark ocean's microbes, but linking -omics approaches to biogeochemical rate measurements is generally rare in microbial oceanography and even more so for the deep ocean. In this presentation, we will show by combining metagenomics, -proteomics and biogeochemical rate measurements on the bulk and single-cell level that deep-sea microbes exhibit characteristics of generalists with a large genome repertoire, versatile in utilizing substrate as revealed by metaproteomics. This is in striking contrast with the apparently rather uniform dissolved organic matter pool in the deep ocean. Combining the different -omics approaches with metabolic rate measurements, we will highlight some major inconsistencies and enigmas in our understanding of the carbon cycling and microbial food web structure in the dark ocean.

  3. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background: Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results: RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion: This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  4. Clonal evolution of acute myeloid leukemia highlighted by latest genome sequencing studies.

    PubMed

    Zhang, Xuehong; Lv, Dekang; Zhang, Yu; Liu, Quentin; Li, Zhiguang

    2016-09-06

    Decades might be required for an initiated cell to become a fully fledged, metastasized tumor. DNA mutations are accumulated during this process, including background mutations that emerge stochastically, as well as driver mutations that selectively occur in a handful of cancer genes and confer on the cell a growth advantage over its neighbors. A clone of tumor cells could be superseded by another clone that acquires new mutations and grows more aggressively. Tumor evolutionary patterns have been studied for years using conventional approaches that focus on the investigation of a single gene or a couple of genes. The latest deep sequencing technology enables a global view of tumor evolution by deciphering almost all genome aberrations in a tumor. Tumor clones and the fate of each clone during tumor evolution can be depicted with the help of the concept of variant allele frequency. Here, we summarize the new insights into cancer evolutionary progression in acute myeloid leukemia. Cancer evolution is currently thought to start from a clone that has accumulated the requisite somatically-acquired genetic aberrations through a series of increasingly disordered clinical and pathological phases, eventually leading to malignant transformation [1-3]. The observations in invasive colorectal cancer that usually emerges from an antecedent benign adenomatous polyp and in cervical cancer that proceeds through intraepithelial neoplasia support the idea of stepwise or linear cancerous progression [3-5]. Genetically, such progression is achieved by successive waves of clonal expansion during which cells acquire novel genomic alterations including single nucleotide variants (SNVs), small insertions and deletions (indels), and/or copy number variations (CNVs) [6]. The latest improvement in sequencing technology has allowed the deciphering of the whole exome or genome in different types of tumor and normal tissue pairs, providing a detailed catalogue of genome aberrations during tumor initiation and progression, which has been reviewed in several papers [7-10]. Here, we focus on demonstrating the cancer clonal evolution pattern revealed by recent deep sequencing studies of samples from acute myeloid leukemia (AML) patients.
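
    The variant allele frequency (VAF) mentioned in this abstract is the share of reads at a site that carry the variant allele. The sketch below computes it from read counts and, under a deliberately simple assumption (a heterozygous mutation in a copy-neutral diploid region of a pure tumor sample), converts it into the fraction of cells carrying the mutation. Both function names and that simplifying model are illustrative assumptions and are not taken from the reviewed studies.

```python
def variant_allele_frequency(alt_reads, ref_reads):
    """Fraction of reads at a site that support the variant allele."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def mutated_cell_fraction(vaf):
    """Estimated fraction of cells carrying the mutation.

    Assumes a heterozygous variant in a diploid, copy-neutral region with
    no normal-cell contamination, so each mutated cell contributes one
    variant copy out of two. Capped at 1.0 to absorb sampling noise.
    """
    return min(1.0, 2.0 * vaf)
```

    Under these assumptions, a mutation seen in 30 of 100 reads (VAF 0.3) would be present in roughly 60% of cells; tracking how such fractions shift between time points is the kind of analysis used to follow clones through tumor evolution.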

  5. On the age and mass function of the globular cluster M 4: A different interpretation of recent deep HST observations

    NASA Astrophysics Data System (ADS)

    De Marchi, G.; Paresce, F.; Straniero, O.; Prada Moroni, P. G.

    2004-03-01

    Very deep images of the Galactic globular cluster M 4 (NGC 6121) through the F606W and F814W filters were taken in 2001 with the WFPC2 on board the HST. A first published analysis of this data set (Richer et al. 2002) produced the result that the age of M 4 is 12.7 ± 0.7 Gyr (Hansen et al. 2002), thus setting a robust lower limit to the age of the universe. In view of the great astronomical importance of getting this number right, we have subjected the same data set to the simplest possible photometric analysis that completely avoids uncertain assumptions about the origin of the detected sources. This analysis clearly reveals both a thin main sequence, from which can be deduced the deepest statistically complete mass function yet determined for a globular cluster, and a white dwarf (WD) sequence extending all the way down to the 5σ detection limit at I ≃ 27. The WD sequence is abruptly terminated at exactly this limit as expected by detection statistics. Using our most recent theoretical WD models (Prada Moroni & Straniero 2002) to obtain the expected WD sequence for different ages in the observed bandpasses, we find that the data so far obtained do not reach the peak of the WD luminosity function, thus only allowing one to set a lower limit to the age of M 4 of ~9 Gyr. Thus, the problem of determining the absolute age of a globular cluster and, therefore, the onset of GC formation with cosmologically significant accuracy remains completely open. Only observations several magnitudes deeper than the limit obtained so far would allow one to approach this objective. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA for NASA under contract NAS5-26555.

  6. Is There Still Room for Novel Viral Pathogens in Pediatric Respiratory Tract Infections?

    PubMed Central

    Taboada, Blanca; Espinoza, Marco A.; Isa, Pavel; Aponte, Fernando E.; Arias-Ortiz, María A.; Monge-Martínez, Jesús; Rodríguez-Vázquez, Rubén; Díaz-Hernández, Fidel; Zárate-Vidal, Fernando; Wong-Chew, Rosa María; Firo-Reyes, Verónica; del Río-Almendárez, Carlos N.; Gaitán-Meza, Jesús; Villaseñor-Sierra, Alberto; Martínez-Aguilar, Gerardo; Salas-Mier, Ma. del Carmen; Noyola, Daniel E.; Pérez-Gónzalez, Luis F.; López, Susana; Santos-Preciado, José I.; Arias, Carlos F.

    2014-01-01

    Viruses are the most frequent cause of respiratory disease in children. However, despite the advanced diagnostic methods currently in use, in 20 to 50% of respiratory samples a specific pathogen cannot be detected. In this work, we used a metagenomic approach and deep sequencing to examine respiratory samples from children with lower and upper respiratory tract infections that had been previously found negative for 6 bacteria and 15 respiratory viruses by PCR. Nasal washings from 25 children (out of 250) hospitalized with a diagnosis of pneumonia and nasopharyngeal swabs from 46 outpatient children (out of 526) were studied. DNA reads for at least one virus commonly associated with respiratory infections were found in 20 of 25 hospitalized patients, while reads for pathogenic respiratory bacteria were detected in the remaining 5 children. For outpatients, all the samples were pooled into 25 DNA libraries for sequencing. In this case, in 22 of the 25 sequenced libraries at least one respiratory virus was identified, while in all others but one, pathogenic bacteria were detected. In both patient groups reads for respiratory syncytial virus, coronavirus-OC43, and rhinovirus were identified. In addition, viruses less frequently associated with respiratory infections were also found. Saffold virus was detected in outpatient but not in hospitalized children. Anellovirus, rotavirus, and astrovirus, as well as several animal and plant viruses, were detected in both groups. No novel viruses were identified. Adding the deep sequencing results to the PCR data, 79.2% of 250 hospitalized and 76.6% of 526 ambulatory patients were positive for viruses, and all other children but one had pathogenic respiratory bacteria identified. These results suggest that, at least in the type of populations studied and with the sampling methods used, the odds of finding novel, clinically relevant viruses in pediatric respiratory infections are low. PMID:25412469

  7. Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

    PubMed Central

    Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

    2012-01-01

    Background: Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings: We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. Seventy associated SNPs with a –log10 p value over 16 were selected for further analysis. The distribution of these SNPs showed a tight cluster consisting of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance: 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. PMID:22529909
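
    The association step in this record (Fisher's exact test on each SNP, then a –log10 p threshold of 16) can be illustrated with a minimal sketch. The function name and the allele counts below are hypothetical and are not taken from the paper; the code only shows how a two-sided exact p-value is obtained for a 2x2 table of allele counts in the two bulks.

```python
from math import comb, log10

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Hypothetical read counts at one SNP: rows are bulks, columns are alleles.
# Resistant bulk: 40 reference reads, 0 alternate; susceptible bulk: 0 and 40.
p_value = fisher_exact_two_sided(40, 0, 0, 40)
score = -log10(p_value)
is_associated = score > 16  # threshold quoted in the abstract
```

    With fully segregating alleles at this hypothetical depth the score clears the threshold; lower coverage or mixed allele counts would reduce it.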

  8. Thalassospira indica sp. nov., isolated from deep seawater.

    PubMed

    Liu, Yang; Lai, Qiliang; Du, Juan; Sun, Fengqin; Shao, Zongze

    2016-12-01

    A taxonomic study using a polyphasic approach was carried out on strain PB8BT, which was isolated from the deep water of the Indian Ocean. Cells of the bacterium were Gram-stain-negative, oxidase- and catalase-positive, curved rods and motile. Growth was observed at salinities of 0-15 % and at temperatures of 10-41°C. The isolate could reduce nitrate to nitrite and degrade Tween 80, but not degrade gelatin. Phylogenetic analysis based on 16S rRNA gene sequences indicated that strain PB8BT belonged to the genus Thalassospira, with the highest sequence similarity to the closely related type strain Thalassospira tepidiphila 1-1BT (99.7 %), followed by Thalassospira profundimaris WP0211T (99.6 %). Multilocus sequence analysis demonstrated low similarities of 94.1 and 93.7 % between strain PB8BT and the two reference type strains. Digital DNA-DNA hybridization values between strain PB8BT and the two above-mentioned type strains were, respectively, 56.3 and 55.3 %. The principal fatty acids of strain PB8BT were C18 : 1ω6c/C18 : 1ω7c, C19 : 0 cyclo ω8c and C16 : 0. The G+C content of the chromosomal DNA was 54.9 mol%. The quinone was determined to be Q-10 (100 %). Phosphatidylglycerol, phosphatidylethanolamine, and several unidentified phospholipids and lipids were present. Based on phenotypic and genotypic characteristics, strain PB8BT represents a novel species within the genus Thalassospira, for which the name Thalassospira indica sp. nov. is proposed. The type strain of the novel species is PB8BT (=MCCC 1A01103T=LMG 29620T).

  9. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  10. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers.

    PubMed

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  11. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    PubMed Central

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  12. Molecular diversity and distribution pattern of ciliates in sediments from deep-sea hydrothermal vents in the Okinawa Trough and adjacent sea areas

    NASA Astrophysics Data System (ADS)

    Zhao, Feng; Xu, Kuidong

    2016-10-01

    In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. 
    Statistical analyses showed that (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitats and thus are potentially more sensitive to varying environmental conditions.

  13. KungFQ: a simple and powerful approach to compress fastq files.

    PubMed

    Grassi, Elena; Di Gregorio, Federico; Molineris, Ivan

    2012-01-01

    Storing data derived from deep sequencing experiments has become pivotal, and standard compression algorithms do not exploit the structure of these data satisfactorily. A number of reference-based compression algorithms have been developed, but they are less adequate when approaching new species without fully sequenced genomes or nongenomic data. We developed a tool that takes advantage of fastq characteristics and encodes them in a binary format optimized to be further compressed with standard tools (such as gzip or lzma). The algorithm is straightforward and does not need any external reference file; it scans the fastq only once and has a constant memory requirement. Moreover, we added the possibility to perform lossy compression, losing some of the original information (IDs and/or qualities) but resulting in smaller files; it is also possible to define a quality cutoff under which corresponding base calls are converted to N. We achieve 2.82 to 7.77 compression ratios on various fastq files without losing information and 5.37 to 8.77 losing IDs, which are often not used in common analysis pipelines. In this paper, we compare the algorithm performance with known tools, usually obtaining higher compression levels.
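
    The lossy mode described in this record (a quality cutoff below which base calls become N, with IDs and qualities optionally dropped before a standard compressor is applied) can be sketched as follows. This is only an illustration under stated assumptions: it does not reproduce KungFQ's binary encoding, the function names are hypothetical, and a Phred+33 quality offset is assumed.

```python
import gzip

def mask_low_quality(seq, qual, cutoff, offset=33):
    """Replace base calls whose Phred quality is below `cutoff` with 'N'."""
    return "".join(
        base if ord(q) - offset >= cutoff else "N"
        for base, q in zip(seq, qual)
    )

def compress_lossy(records, cutoff):
    """Mask low-quality bases, drop IDs and qualities, then gzip the sequences.

    `records` is an iterable of (read_id, sequence, quality_string) tuples.
    Plain text plus gzip stands in for the tool's own binary format here.
    """
    lines = [mask_low_quality(seq, qual, cutoff) for _id, seq, qual in records]
    return gzip.compress("\n".join(lines).encode("ascii"))

# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [("read1", "ACGT", "II#I"), ("read2", "GGCA", "IIII")]
blob = compress_lossy(reads, cutoff=20)
restored = gzip.decompress(blob).decode("ascii").split("\n")
```

    Masked positions compress well because runs of N carry little information, which is one plausible reason a cutoff of this kind reduces file size.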

  14. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced. Mapping of DNA variants in our sequenced genomes to mitochondrial

  15. Learning approach among health sciences students in a medical college in Nepal: a cross-sectional study.

    PubMed

    Shah, Dev Kumar; Yadav, Ram Lochan; Sharma, Deepak; Yadav, Prakash Kumar; Sapkota, Niraj Khatri; Jha, Rajesh Kumar; Islam, Md Nazrul

    2016-01-01

    Many factors shape the quality of learning. The intrinsically motivated students adopt a deep approach to learning, while students who fear failure in assessments adopt a surface approach to learning. In the area of health science education in Nepal, there is still a lack of studies on learning approach that can be used to transform the students to become better learners and improve the effectiveness of teaching. Therefore, we aimed to explore the learning approaches among medical, dental, and nursing students of Chitwan Medical College, Nepal using Biggs's Revised Two-Factor Study Process Questionnaire (R-SPQ-2F) after testing its reliability. R-SPQ-2F containing 20 items represented two main scales of learning approaches, deep and surface, with four subscales: deep motive, deep strategy, surface motive, and surface strategy. Each subscale had five items and each item was rated on a 5-point Likert scale. The data were analyzed using Student's t-test and analysis of variance. Reliability of the administered questionnaire was checked using Cronbach's alpha. The Cronbach's alpha value (0.6) for 20 items of R-SPQ-2F was found to be acceptable for its use. The participants predominantly had a deep approach to learning regardless of their age and sex (deep: 32.62±6.33 versus surface: 25.14±6.81, P<0.001). The level of deep approach among medical students (33.26±6.40) was significantly higher than among dental (31.71±6.51) and nursing (31.36±4.72) students. In comparison to first-year students, deep approach among second-year medical (34.63±6.51 to 31.73±5.93; P<0.001) and dental (33.47±6.73 to 29.09±5.62; P=0.002) students was found to be significantly decreased. On the other hand, surface approach significantly increased (25.55±8.19 to 29.34±6.25; P=0.023) among second-year dental students compared to first-year dental students. Medical students were found to adopt a deeper approach to learning than dental and nursing students. 
However, irrespective of disciplines and personal characteristics of participants, the primarily deep learning approach was found to be shifting progressively toward a surface approach after completion of an academic year, which should be avoided.
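
    The reliability check mentioned in this record, Cronbach's alpha, follows the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. A minimal sketch, with hypothetical ratings rather than the study's data:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert ratings: four respondents, three items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
```

    Values closer to 1 indicate that the items vary together; the abstract reports 0.6 for the 20 items of R-SPQ-2F.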

  16. Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

    PubMed Central

    Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

    2017-01-01

    ABSTRACT Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908

  17. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS's Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  18. Carbonate sedimentation in an extensional active margin: Cretaceous history of the Haymana region, Pontides

    NASA Astrophysics Data System (ADS)

    Okay, Aral I.; Altiner, Demir

    2016-10-01

    The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized. Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian, followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. Carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which result in submarine erosion, carbonate breccias and lateral discontinuity of the sequences, and differs from blanket-like carbonate deposition in passive margins.

  19. Subsurface microbial diversity in deep-granitic-fracture water in Colorado

    USGS Publications Warehouse

    Sahl, J.W.; Schmidt, R.; Swanner, E.D.; Mandernack, K.W.; Templeton, A.S.; Kieft, Thomas L.; Smith, R.L.; Sanford, W.E.; Callaghan, R.L.; Mitton, J.B.; Spear, J.R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This "Henderson candidate division" dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. Copyright © 2008, American Society for Microbiology. All Rights Reserved.

  20. Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado

    PubMed Central

    Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950

  1. Deep sequencing of hepatitis C virus hypervariable region 1 reveals no correlation between genetic heterogeneity and antiviral treatment outcome

    PubMed Central

    2014-01-01

    Background: Hypervariable region 1 (HVR1) contained within the envelope protein 2 (E2) gene is the most variable part of the HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity, employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods: HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results: Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When a clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at a frequency of 0.5%. Conclusions: While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390
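
    One of the quasispecies parameters named in this record, Shannon entropy computed at a minimum variant frequency cut-off, can be sketched as below. The variant counts are hypothetical, and renormalizing the frequencies after discarding sub-threshold variants is an assumption made for illustration; the abstract does not state how the study handled that step.

```python
from math import log

def shannon_entropy(counts, min_freq=0.0):
    """Shannon entropy (natural log) of variant frequencies.

    Variants whose frequency is below `min_freq` are discarded and the
    remaining frequencies are renormalized (an assumption of this sketch).
    """
    total = sum(counts.values())
    kept = [c for c in counts.values() if c / total >= min_freq]
    kept_total = sum(kept)
    return -sum((c / kept_total) * log(c / kept_total) for c in kept)

# Hypothetical read counts per HVR1 variant in one sample.
variant_counts = {"v1": 900, "v2": 60, "v3": 30, "v4": 10}
entropies = {cutoff: shannon_entropy(variant_counts, cutoff)
             for cutoff in (0.01, 0.02, 0.05)}
```

    Raising the cut-off removes rare variants and therefore lowers the entropy estimate, which is why the study reports results at each of the three thresholds.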

  2. The Influence of Parents and Teachers on the Deep Learning Approach of Pupils in Norwegian Upper-Secondary Schools

    ERIC Educational Resources Information Center

    Elstad, Eyvind; Christophersen, Knut-Andreas; Turmo, Are

    2012-01-01

    Introduction: The purpose of this article was to explore the influence of parents and teachers on the deep learning approach of pupils by estimating the strength of the relationships between these factors and the motivation, volition and deep learning approach of Norwegian 16-year-olds. Method: Structural equation modeling for cross-sectional…

  3. Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases.

    PubMed

    Pena, Loren D M; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C; Walley, Nicole; Stong, Nicholas; Rapisardo Horn, Sarah; Sullivan, Jennifer A; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C; El-Dairi, Mays; Bellet, Jane; Keels, Martha Ann; Jasien, Joan; Kranz, Peter G; Noel, Richard; Nagaraj, Shashi K; Lark, Robert K; Wechsler, Daniel S G; Del Gaudio, Daniela; Leung, Marco L; Hendon, Laura G; Parker, Collette C; Jones, Kelly L; Goldstein, David B; Shashi, Vandana

    2018-04-01

    Purpose: To describe examples of missed pathogenic variants on whole-exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods: Guided by phenotypic information, three children with negative WES underwent targeted single-gene testing. Results: Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and a next-generation sequencing (NGS)-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the noncoding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity, and magnetic resonance image changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, probably missed owing to failure of alignment. Conclusion: These cases illustrate potential pitfalls of WES/NGS testing and the importance of phenotype-guided molecular testing in yielding diagnoses.

  4. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.

    PubMed

    Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie

    2017-07-01

    Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen. Contact: shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  5. Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.

    PubMed

    Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan

    2017-05-01

    Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from less than 50-m deep, being most abundant near the sea floor where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database, which indicates the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analyses of rhizarians were performed and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.

  6. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    PubMed Central

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595

  7. Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.

    PubMed

    Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie

    2017-01-01

    In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method targets gliomas (both low and high grade) captured in multi-sequence magnetic resonance images (MRIs). First, a novel deep architecture is proposed that combines multi-sequence intensity feature extraction with classification to obtain the classification probabilities of each voxel. Then, graph-cut-based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. Finally, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated on the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The segmentation results demonstrate that our proposal provides a solution competitive with state-of-the-art methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  8. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing.

    PubMed

    Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama

    2016-07-03

    Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
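
    The abstract above says that the fitness effect of each mutation is determined from deep sequencing of the library before and after selection, but it does not give the formula. A minimal sketch, assuming the common log-enrichment approach (each variant's post/pre count ratio normalized to the wild-type ratio), might look like this. The function name, the variant labels and the pseudocount are illustrative assumptions, not part of the published protocol.

```python
import math


def log_enrichment(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Return log2 enrichment of each variant relative to wild type.

    pre_counts / post_counts map a variant label to its read count before
    and after selection. The pseudocount (an assumption of this sketch)
    keeps variants that drop to zero reads from producing log(0).
    """
    def ratio(variant):
        pre = pre_counts.get(variant, 0) + pseudocount
        post = post_counts.get(variant, 0) + pseudocount
        return post / pre

    wt_ratio = ratio(wild_type)
    return {
        variant: math.log2(ratio(variant) / wt_ratio)
        for variant in pre_counts
        if variant != wild_type
    }


# Hypothetical counts, for illustration only.
pre = {"WT": 1000, "A42G": 100, "S70P": 100}
post = {"WT": 2000, "A42G": 200, "S70P": 25}
scores = log_enrichment(pre, post, "WT", pseudocount=0)
```

    In this toy input, the variant that keeps pace with wild type scores 0 and the depleted variant scores below 0. Real analyses usually also filter variants with too few pre-selection reads before interpreting the scores.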

  9. Targeted exome sequencing reveals novel USH2A mutations in Chinese patients with simplex Usher syndrome.

    PubMed

    Shu, Hai-Rong; Bi, Huai; Pan, Yang-Chun; Xu, Hang-Yu; Song, Jian-Xin; Hu, Jie

    2015-09-16

    Usher syndrome (USH) is an autosomal recessive disorder characterized by hearing impairment and vision dysfunction due to retinitis pigmentosa. Phenotypic and genetic heterogeneities of this disease make it impractical to obtain a genetic diagnosis by conventional Sanger sequencing. In this study, we applied a next-generation sequencing approach to detect genetic abnormalities in patients with USH. Two unrelated Chinese families were recruited, consisting of two USH-affected patients and four unaffected relatives. We selected 199 genes related to inherited retinal diseases as targets for deep exome sequencing. Through systematic data analysis using an established bioinformatics pipeline, all variants that passed filter criteria were validated by Sanger sequencing and co-segregation analysis. A homozygous frameshift mutation (c.4382delA, p.T1462Lfs*2) was revealed in exon 20 of the USH2A gene in the F1 family. Two compound heterozygous mutations, IVS47 + 1G > A and c.13156A > T (p.I4386F), located in intron 48 and exon 63 respectively, of USH2A, were identified as causative mutations for the F2 family. Of note, the missense mutation c.13156A > T has not been reported so far. In conclusion, targeted exome sequencing precisely and rapidly identified the genetic defects in two Chinese USH families, and this technique can be applied as a routine examination for disorders with significant clinical and genetic heterogeneity.

  10. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background: Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results: Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% were detected out of the 37 input Operational Taxonomic Units (OTUs), whether the reference barcode library was used or not, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions: The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  11. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms

    PubMed Central

    Gasc, Cyrielle; Peyretaillade, Eric

    2016-01-01

    The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. PMID:27105841

  12. Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms.

    PubMed

    Gasc, Cyrielle; Peyretaillade, Eric; Peyret, Pierre

    2016-06-02

    The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Hourly air pollution concentrations and their important predictors over Houston, Texas using deep neural networks: case study of DISCOVER-AQ time period

    NASA Astrophysics Data System (ADS)

    Eslami, E.; Choi, Y.; Roy, A.

    2017-12-01

    Air quality forecasting carried out by chemical transport models often shows significant error. This study uses a deep-learning approach over the Houston-Galveston-Brazoria (HGB) area to overcome this forecasting challenge for the DISCOVER-AQ period (September 2013). Two approaches were utilized: a deep neural network (DNN) using a Multi-Layer Perceptron (MLP), and a Restricted Boltzmann Machine (RBM). The proposed approaches analyzed input data by identifying features abstracted from the previous layer using a stepwise method. The approaches predicted hourly ozone and PM in September 2013 using several predictors from the prior three days, including wind fields, temperature, relative humidity, cloud fraction, and precipitation, along with PM, ozone, and NOx concentrations. Model-measurement comparisons for available monitoring sites reported Indexes of Agreement (IOA) of around 0.95 for both DNN and RBM. A standard artificial neural network (ANN) (IOA=0.90) with similar architecture showed poorer performance than the deep networks, clearly demonstrating the superiority of the deep approaches. Additionally, each network (both deep and standard) performed significantly better than a previous CMAQ study, which showed an IOA of less than 0.80. The most influential input variables were identified using their associated weights, which represented the sensitivity of ozone to input parameters. The results indicate that deep learning approaches can achieve more accurate ozone forecasting and identify the important input variables for ozone predictions in metropolitan areas.
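
    The Index of Agreement (IOA) reported above is not defined in the abstract. A minimal sketch, assuming the study uses Willmott's index of agreement (the usual definition in air-quality model evaluation), could be computed as follows. The function name and sample values are illustrative only.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1 means perfect agreement.

    IOA = 1 - sum((P_i - O_i)^2) / sum((|P_i - Obar| + |O_i - Obar|)^2),
    where Obar is the mean of the observations.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    if potential_error == 0:
        # Every value equals the observed mean, so the prediction is exact.
        return 1.0
    return 1.0 - squared_error / potential_error


# Illustrative hourly ozone values (ppb); not data from the study.
obs = [30.0, 42.0, 55.0, 61.0]
perfect = index_of_agreement(obs, obs)
flat = index_of_agreement([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

    A forecast that always predicts the observed mean scores 0 under this definition, which is why values near 0.95 indicate a close match to the monitoring data.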

  14. Deep Reconditioning Testing for near Earth Orbits

    NASA Technical Reports Server (NTRS)

    Betz, F. E.; Barnes, W. L.

    1984-01-01

    The problems and benefits of deep reconditioning for near Earth orbit missions with high cycle life and shallow discharge depth requirements are discussed. A simple battery-level approach to deep reconditioning of nickel cadmium batteries in near Earth orbit is considered. A test plan was developed to perform deep reconditioning in direct comparison with an alternative trickle charge approach. The results demonstrate that the deep reconditioning procedure described for near Earth orbit application is inferior to the alternative of trickle charging.

  15. Exploring the relationships between epistemic beliefs about medicine and approaches to learning medicine: a structural equation modeling analysis.

    PubMed

    Chiu, Yen-Lin; Liang, Jyh-Chong; Hou, Cheng-Yen; Tsai, Chin-Chung

    2016-07-18

    Students' epistemic beliefs may vary in different domains; therefore, it may be beneficial for medical educators to better understand medical students' epistemic beliefs regarding medicine. Understanding how medical students are aware of medical knowledge and how they learn medicine is a critical issue of medical education. The main purposes of this study were to investigate medical students' epistemic beliefs relating to medical knowledge, and to examine their relationships with students' approaches to learning medicine. A total of 340 undergraduate medical students from 9 medical colleges in Taiwan were surveyed with the Medical-Specific Epistemic Beliefs (MSEB) questionnaire (i.e., multi-source, uncertainty, development, justification) and the Approach to Learning Medicine (ALM) questionnaire (i.e., surface motive, surface strategy, deep motive, and deep strategy). By employing the structural equation modeling technique, the confirmatory factor analysis and path analysis were conducted to validate the questionnaires and explore the structural relations between these two constructs. It was indicated that medical students with multi-source beliefs who were suspicious of medical knowledge transmitted from authorities were less likely to possess a surface motive and deep strategies. Students with beliefs regarding uncertain medical knowledge tended to utilize flexible approaches, that is, they were inclined to possess a surface motive but adopt deep strategies. Students with beliefs relating to justifying medical knowledge were more likely to have mixed motives (both surface and deep motives) and mixed strategies (both surface and deep strategies). However, epistemic beliefs regarding development did not have significant relations with approaches to learning. Unexpectedly, it was found that medical students with sophisticated epistemic beliefs (e.g., suspecting knowledge from medical experts) did not necessarily engage in deep approaches to learning medicine. Instead of a deep approach, medical students with sophisticated epistemic beliefs in uncertain and justifying medical knowledge intended to employ a flexible approach and a mixed approach, respectively.

  16. Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

    PubMed

    Williams, Philip H; Eyles, Rod; Weiller, Georg

    2012-01-01

    MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
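
    The sensitivity and specificity quoted above follow the standard confusion-matrix definitions. A minimal sketch of how they are computed from true and predicted labels is shown here. The labels are made up for illustration, and the leave-one-out loop that would generate the predictions is not shown.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = miRNA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels, for illustration only.
truth = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```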

  17. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scenario, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
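
    The input described above is a list of unique reads with their copy numbers. A minimal sketch of collapsing raw reads into that form is given below. The tab-separated layout and the function names are assumptions of this sketch; the abstract does not specify the exact file format that miRanalyzer expects.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into (sequence, copy_number) pairs.

    Sequences are upper-cased and blank lines skipped; results are sorted
    by descending copy number, then alphabetically for a stable order.
    """
    counts = Counter(read.strip().upper() for read in reads if read.strip())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_tab_separated(collapsed):
    """Format the pairs as 'sequence<TAB>count' lines (layout is assumed)."""
    return "\n".join(f"{sequence}\t{count}" for sequence, count in collapsed)


# Toy reads, for illustration only.
raw_reads = ["AACCGGTT", "aaccggtt", "TTGGCCAA", ""]
collapsed = collapse_reads(raw_reads)
table = to_tab_separated(collapsed)
```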

  18. Cultivating the Deep Subsurface Microbiome

    NASA Astrophysics Data System (ADS)

    Casar, C. P.; Osburn, M. R.; Flynn, T. M.; Masterson, A.; Kruger, B.

    2017-12-01

    Subterranean ecosystems are poorly understood because many microbes detected in metagenomic surveys are only distantly related to characterized isolates. Cultivating microorganisms from the deep subsurface is challenging due to its inaccessibility and potential for contamination. The Deep Mine Microbial Observatory (DeMMO) in Lead, SD, however, offers access to deep microbial life via pristine fracture fluids in bedrock to a depth of 1478 m. The metabolic landscape of DeMMO was previously characterized via thermodynamic modeling coupled with genomic data, illustrating the potential for microbial inhabitants of DeMMO to utilize mineral substrates as energy sources. Here, we employ field- and lab-based cultivation approaches with pure minerals to link phylogeny to metabolism at DeMMO. Fracture fluids were directed through reactors filled with Fe3O4, Fe2O3, FeS2, MnO2, and FeCO3 at two sites (610 m and 1478 m) for 2 months prior to harvesting for subsequent analyses. We examined mineralogical, geochemical, and microbiological composition of the reactors via DNA sequencing, microscopy, lipid biomarker characterization, and bulk C and N isotope ratios to determine the influence of mineralogy on biofilm community development. Pre-characterized mineral chips were imaged via SEM to assay microbial growth; preliminary results suggest MnO2, Fe3O4, and Fe2O3 were most conducive to colonization. Solid materials from reactors were used as inoculum for batch cultivation experiments. Media designed to mimic fracture fluid chemistry were supplemented with mineral substrates targeting metal reducers. DNA sequences and microscopy of iron oxide-rich biofilms and fracture fluids suggest iron oxidation is a major energy source at redox transition zones where anaerobic fluids meet more oxidizing conditions. We utilized these biofilms and fluids as inoculum in gradient cultivation experiments targeting microaerophilic iron oxidizers. Cultivation of microbes endemic to DeMMO, a system locally dominated by unclassified and candidate phyla, has the potential to yield novel subsurface organisms with unique physiologies. We intend to further utilize subsurface isolates to probe the effects of geochemical perturbations on biosignatures in future studies, thus broadening our understanding of subterranean ecosystems.

  19. An introduction to deep learning on biological sequence data: examples and solutions.

    PubMed

    Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae

    2017-11-15

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready-to-apply-and-adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. Contact: skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  20. Spotting L3 slice in CT scans using deep convolutional network and transfer learning.

    PubMed

    Belharbi, Soufiane; Chatelain, Clément; Hérault, Romain; Adam, Sébastien; Thureau, Sébastien; Chastan, Mathieu; Modzelewski, Romain

    2017-08-01

    In this article, we present a complete automated system for spotting a particular slice in a complete 3D Computed Tomography exam (CT scan). Our approach does not require any assumptions about which part of the patient's body is covered by the scan. It relies on an original machine learning regression approach. Our models are learned using transfer learning, exploiting deep architectures that have been pre-trained on the ImageNet database, and therefore require very little annotation for training. The whole pipeline consists of three steps: i) conversion of the CT scans into Maximum Intensity Projection (MIP) images, ii) prediction from a Convolutional Neural Network (CNN) applied in a sliding window fashion over the MIP image, and iii) robust analysis of the prediction sequence to predict the height of the desired slice within the whole CT scan. Our approach is applied to the detection of the third lumbar vertebra (L3) slice, which has been found to be representative of the whole body composition. Our system is evaluated on a database collected in our clinical center, containing 642 CT scans from different patients. We obtained an average localization error of 1.91±2.69 slices (less than 5 mm) in an average time of less than 2.5 s/CT scan, allowing integration of the proposed system into daily clinical routines. Copyright © 2017 Elsevier Ltd. All rights reserved.
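
    Steps i) and ii) of the pipeline above can be illustrated with a small pure-Python sketch: a Maximum Intensity Projection collapses one axis of the volume by taking the maximum, and a sliding window then walks down the projected image. The axis convention (volume[z][y][x], projecting along y), the window height and the stride are assumptions of this sketch, and the CNN that would score each window is not shown.

```python
def max_intensity_projection(volume):
    """Project volume[z][y][x] to image[z][x] by taking the max over y."""
    return [
        [max(row[x] for row in axial_slice) for x in range(len(axial_slice[0]))]
        for axial_slice in volume
    ]


def sliding_windows(image, height, stride):
    """Yield (top_row, window) pairs moving down the image row by row."""
    for top in range(0, len(image) - height + 1, stride):
        yield top, image[top:top + height]


# Tiny toy volume with 3 slices of 2x2 voxels; values are illustrative.
volume = [
    [[1, 5], [3, 2]],
    [[0, 0], [7, 1]],
    [[4, 4], [4, 9]],
]
mip = max_intensity_projection(volume)
windows = list(sliding_windows(mip, height=2, stride=1))
```

    Each row of the projection corresponds to one axial slice, so the window position that a model scores highest maps directly back to a slice index in the scan.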

  1. Draft Genome Sequence of Thermus scotoductus Strain K1, Isolated from a Geothermal Spring in Karvachar, Nagorno Karabakh

    PubMed Central

    Saghatelyan, Ani; Poghosyan, Lianna

    2015-01-01

    The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055

  2. Analysis of recurrent neural networks for short-term energy load forecasting

    NASA Astrophysics Data System (ADS)

    Di Persio, Luca; Honchar, Oleksandr

    2017-11-01

    Short-term forecasts have recently gained increasing attention because of the rise of competitive electricity markets. In fact, short-term forecasts of possible future loads turn out to be fundamental to building efficient energy management strategies as well as to avoiding energy wastage. Such challenges are difficult to tackle from both a theoretical and an applied point of view. The latter tasks require sophisticated methods to manage multidimensional time series related to stochastic phenomena which are often highly interconnected. In the present work we first review novel approaches to energy load forecasting based on recurrent neural networks, focusing our attention on long short-term memory architectures (LSTMs). Such artificial neural networks have been widely applied to problems dealing with sequential data, as happens, e.g., in socio-economic settings, in text recognition, and with video signals, always showing their effectiveness in modeling complex temporal data. Moreover, we consider different novel variations of basic LSTMs, such as the sequence-to-sequence approach and bidirectional LSTMs, aiming at providing effective models for energy load data. Last but not least, we test all the described algorithms on real energy load data, showing not only that deep recurrent networks can be successfully applied to energy load forecasting, but also that this approach can be extended to other problems based on time series prediction.

  3. Contribution of crenarchaeal autotrophic ammonia oxidizers to the dark primary production in Tyrrhenian deep waters (Central Mediterranean Sea)

    PubMed Central

    Yakimov, Michail M; Cono, Violetta La; Smedile, Francesco; DeLuca, Thomas H; Juárez, Silvia; Ciordia, Sergio; Fernández, Marisol; Albar, Juan Pablo; Ferrer, Manuel; Golyshin, Peter N; Giuliano, Laura

    2011-01-01

    Mesophilic Crenarchaeota have recently been thought to be significant contributors to nitrogen (N) and carbon (C) cycling. In this study, we examined the vertical distribution of ammonia-oxidizing Crenarchaeota at an offshore site in the Southern Tyrrhenian Sea. The median value of the crenarchaeal cell to amoA gene ratio was close to one, suggesting that virtually all deep-sea Crenarchaeota possess the capacity to oxidize ammonia. Crenarchaea-specific genes, nirK and ureC, for nitrite reductase and urease were identified and their affiliation demonstrated the presence of ‘deep-sea' clades distinct from ‘shallow' representatives. Measured deep-sea dark CO2 fixation estimates were comparable to the median value of photosynthetic biomass production calculated for this area of the Tyrrhenian Sea, pointing to the significance of this process in the C cycle of aphotic marine ecosystems. To elucidate the pivotal organisms in this process, we targeted known marine crenarchaeal autotrophy-related genes, coding for acetyl-CoA carboxylase (accA) and 4-hydroxybutyryl-CoA dehydratase (4-hbd). As in the case of nirK and ureC, these genes are grouped with deep-sea sequences, being distantly related to those retrieved from the epipelagic zone. To pair the molecular data with specific functional attributes, we performed [14C]HCO3 incorporation experiments followed by analyses of radiolabeled proteins using a shotgun proteomics approach. More than 100 oligopeptides were attributed to 40 marine crenarchaeal-specific proteins that are involved in 10 different metabolic processes, including autotrophy. The results obtained provided clear proof of the chemolithoautotrophic physiology of bathypelagic Crenarchaeota and indicated that this numerically predominant group of microorganisms facilitates a hitherto unrecognized sink for inorganic C of global importance. PMID:21209665

  4. Aligning and synchronization of MIS5 proxy records from Lake Ohrid (FYROM) with independently dated Mediterranean archives: implications for DEEP core chronology

    NASA Astrophysics Data System (ADS)

    Zanchetta, Giovanni; Regattieri, Eleonora; Giaccio, Biagio; Wagner, Bernd; Sulpizio, Roberto; Francke, Alex; Vogel, Hendrik; Sadori, Laura; Masi, Alessia; Sinopoli, Gaia; Lacey, Jack H.; Leng, Melanie J.; Leicher, Niklas

    2016-05-01

    The DEEP site sediment sequence obtained during the ICDP SCOPSCO project at Lake Ohrid was dated using tephrostratigraphic information, cyclostratigraphy, and orbital tuning through the marine isotope stages (MIS) 15-1. Although this approach is suitable for the generation of a general chronological framework of the long succession, it is insufficient to resolve more detailed palaeoclimatological questions, such as leads and lags of climate events between marine and terrestrial records or between different regions. Here, we demonstrate how the use of different tie points can affect cyclostratigraphy and orbital tuning for the period between ca. 140 and 70 ka and how the results can be correlated with directly/indirectly radiometrically dated Mediterranean marine and continental proxy records. The alternative age model presented here shows consistent differences with that initially proposed by Francke et al. (2015) for the same interval, in particular at the level of the MIS6-5e transition. According to this new age model, different proxies from the DEEP site sediment record support an increase of temperatures between glacial to interglacial conditions, which is almost synchronous with a rapid increase in sea surface temperature observed in the western Mediterranean. The results show how a detailed study of independent chronological tie points is important to align different records and to highlight asynchronisms of climate events. Moreover, Francke et al. (2016) have incorporated the new chronology proposed for tephra OH-DP-0499 in the final DEEP age model. This has reduced substantially the chronological discrepancies between the DEEP site age model and the model proposed here for the last glacial-interglacial transition.

  5. Cultivation and diversity of fungi buried in the Baltic Sea sediments

    NASA Astrophysics Data System (ADS)

    Xiao, N.

    2015-12-01

    Molecular biological and cultivation studies have been carried out on the prokaryotic microbial community in the deep biosphere. Compared to the prokaryotic community, few attempts have been made to study the eukaryotic microbial community. Here we report a study of fungi buried in deep-subsurface sediments using both cultivation and a molecular diversity survey. Cultivation targeting fungi was performed on a sequence of sediment samples obtained from the Landsort Deep site in the Baltic Sea during IODP Expedition 347. Six culture media with different nutrient and salt concentrations were tried for fungal cultivation, and 50 fungal isolates were obtained from the sediment samples. The surface sediments were rich in fungal strains, but the deep sediments were not. Internal Transcribed Spacer (ITS) regions of RNA genes were amplified for the identification of the isolates. The isolates were classified into 11 different genera. Pseudeurotium bakeri was the dominant strain throughout the glacial and interglacial sediments. We also found different representative fungal strains in glacial and interglacial sediments, suggesting that the cultivated strains were buried from different sources. The survey of fungal diversity was done by sequencing the 18S RNA genes in the total DNA extracted from selected sediment samples. The fungal community formed different clusters in the glacial and interglacial sediments. Our results revealed the presence and activity of fungi in the deep biosphere of the Baltic Sea and provided evidence of a fungal community response to climate change.

  6. Toward a real-time system for temporal enhanced ultrasound-guided prostate biopsy.

    PubMed

    Azizi, Shekoofeh; Van Woudenberg, Nathan; Sojoudi, Samira; Li, Ming; Xu, Sheng; Abu Anas, Emran M; Yan, Pingkun; Tahmasebi, Amir; Kwak, Jin Tae; Turkbey, Baris; Choyke, Peter; Pinto, Peter; Wood, Bradford; Mousavi, Parvin; Abolmaesumi, Purang

    2018-03-27

    We have previously proposed temporal enhanced ultrasound (TeUS) as a new paradigm for tissue characterization. TeUS is based on analyzing a sequence of ultrasound data with deep learning and has been demonstrated to be successful for detection of cancer in ultrasound-guided prostate biopsy. Our aim is to enable the dissemination of this technology to the community for large-scale clinical validation. In this paper, we present a unified software framework demonstrating near-real-time analysis of ultrasound data stream using a deep learning solution. The system integrates ultrasound imaging hardware, visualization and a deep learning back-end to build an accessible, flexible and robust platform. A client-server approach is used in order to run computationally expensive algorithms in parallel. We demonstrate the efficacy of the framework using two applications as case studies. First, we show that prostate cancer detection using near-real-time analysis of RF and B-mode TeUS data and deep learning is feasible. Second, we present real-time segmentation of ultrasound prostate data using an integrated deep learning solution. The system is evaluated for cancer detection accuracy on ultrasound data obtained from a large clinical study with 255 biopsy cores from 157 subjects. It is further assessed with an independent dataset with 21 biopsy targets from six subjects. In the first study, we achieve area under the curve, sensitivity, specificity and accuracy of 0.94, 0.77, 0.94 and 0.92, respectively, for the detection of prostate cancer. In the second study, we achieve an AUC of 0.85. Our results suggest that TeUS-guided biopsy can be potentially effective for the detection of prostate cancer.

  7. Contribution of crenarchaeal autotrophic ammonia oxidizers to the dark primary production in Tyrrhenian deep waters (Central Mediterranean Sea).

    PubMed

    Yakimov, Michail M; Cono, Violetta La; Smedile, Francesco; DeLuca, Thomas H; Juárez, Silvia; Ciordia, Sergio; Fernández, Marisol; Albar, Juan Pablo; Ferrer, Manuel; Golyshin, Peter N; Giuliano, Laura

    2011-06-01

    Mesophilic Crenarchaeota have recently been thought to be significant contributors to nitrogen (N) and carbon (C) cycling. In this study, we examined the vertical distribution of ammonia-oxidizing Crenarchaeota at an offshore site in the Southern Tyrrhenian Sea. The median value of the crenarchaeal cell to amoA gene ratio was close to one, suggesting that virtually all deep-sea Crenarchaeota possess the capacity to oxidize ammonia. Crenarchaea-specific genes, nirK and ureC, for nitrite reductase and urease were identified, and their affiliation demonstrated the presence of 'deep-sea' clades distinct from 'shallow' representatives. Measured deep-sea dark CO2 fixation estimates were comparable to the median value of photosynthetic biomass production calculated for this area of the Tyrrhenian Sea, pointing to the significance of this process in the C cycle of aphotic marine ecosystems. To elucidate the pivotal organisms in this process, we targeted known marine crenarchaeal autotrophy-related genes, coding for acetyl-CoA carboxylase (accA) and 4-hydroxybutyryl-CoA dehydratase (4-hbd). As in the case of nirK and ureC, these genes grouped with deep-sea sequences and were distantly related to those retrieved from the epipelagic zone. To pair the molecular data with specific functional attributes, we performed [14C]HCO3 incorporation experiments followed by analyses of radiolabeled proteins using a shotgun proteomics approach. More than 100 oligopeptides were attributed to 40 marine crenarchaeal-specific proteins that are involved in 10 different metabolic processes, including autotrophy. The results provided clear proof of the chemolithoautotrophic physiology of bathypelagic Crenarchaeota and indicated that this numerically predominant group of microorganisms facilitates a hitherto unrecognized sink for inorganic C of global importance.

  8. Arthropod phylogenetics in light of three novel millipede (myriapoda: diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    PubMed

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

  9. Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

    PubMed Central

    Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

    2014-01-01

    Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

  10. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.
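
    Note on quality scores: the abstract above does not spell out how ViVaMBC turns quality scores into error probabilities, so the following is only a minimal sketch of the standard Phred relationship, Q = -10 log10(p), and not the authors' model. The function names are hypothetical, and the uniform quality score of Q30 is an assumption chosen for illustration.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q into a base-call error probability.

    Uses the standard Phred definition Q = -10 * log10(p), i.e. p = 10^(-Q/10).
    """
    return 10 ** (-q / 10)


def expected_error_reads(coverage: int, q: float) -> float:
    """Expected number of reads with a wrong base call at one position,
    assuming (hypothetically) that every read has the same quality score q."""
    return coverage * phred_to_error_prob(q)


def expected_variant_reads(coverage: int, frequency: float) -> float:
    """Expected number of reads carrying a true variant present at `frequency`."""
    return coverage * frequency


if __name__ == "__main__":
    coverage = 25_000  # average coverage quoted in the abstract
    print(expected_variant_reads(coverage, 0.005))  # reads supporting a 0.5% variant
    print(expected_error_reads(coverage, 30))       # miscalled reads at an assumed Q30
```

    Under these assumptions, a 0.5% variant at 25,000x coverage is carried by about 125 reads, while Q30 base-calling errors alone account for roughly 25 mismatching reads at the same position, which illustrates why low-frequency calls need an explicit error model.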

  11. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.
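
    Note on the N50 statistic: the abstract reports an N50 of 506 bp without stating which tool or exact convention was used. The following is a minimal sketch assuming the common definition, in which N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of all assembled bases. The function name is hypothetical and the toy lengths are illustrative, not data from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length at which the running total first
    reaches at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


if __name__ == "__main__":
    toy_unigene_lengths = [2, 3, 4, 5, 6]  # illustrative values only
    print(n50(toy_unigene_lengths))
```

    Because a few long sequences can dominate the total, N50 is usually read alongside the mean length, as the abstract does (355 bp mean, 506 bp N50).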

  12. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly(A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximately 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximately 20-fold increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090

  13. Deep Learning for Brain MRI Segmentation: State of the Art and Future Directions.

    PubMed

    Akkus, Zeynettin; Galimzianova, Alfiia; Hoogi, Assaf; Rubin, Daniel L; Erickson, Bradley J

    2017-08-01

    Quantitative analysis of brain MRI is routine for many neurological diseases and conditions and relies on accurate segmentation of structures of interest. Deep learning-based segmentation approaches for brain MRI are gaining interest due to their self-learning and generalization ability over large amounts of data. As the deep learning architectures are becoming more mature, they gradually outperform previous state-of-the-art classical machine learning algorithms. This review aims to provide an overview of current deep learning-based segmentation approaches for quantitative brain MRI. First we review the current deep learning architectures used for segmentation of anatomical brain structures and brain lesions. Next, the performance, speed, and properties of deep learning approaches are summarized and discussed. Finally, we provide a critical assessment of the current state and identify likely future developments and trends.

  14. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE PAGES

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  15. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  16. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage

    PubMed Central

    Smith, Eric E.; Nandigam, Kaveer R.N.; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M.

    2014-01-01

    Background MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds (MB). Methods Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semi-automated method. WMH and MB were compared according to site of symptomatic hematoma origin (lobar vs. deep) or by pattern of hemorrhages, including both hematomas and MB, on MRI GRE sequence (grouped as lobar only--probable CAA, lobar only--possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Results Lobar and deep hemispheric hematoma patients had similar median nWMH volumes (19.5 cm3 vs. 19.9 cm3, p=0.74) and prevalence of ≥1 MB (54% vs. 52%, p=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brainstem T2 hyperintensity was lower in lobar hematoma vs. deep hematoma (54% vs. 70%, p=0.004). Mixed ICH was common (23%). Mixed ICH patients had large nWMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. Conclusions WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI GRE in up to one quarter of patients; in these patients both hypertension and CAA may be contributing to the burden of WMH. PMID:20689084

  17. Plastid Phylogenomics Resolve Deep Relationships among Eupolypod II Ferns with Rapid Radiation and Rate Heterogeneity

    PubMed Central

    Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun

    2017-01-01

    Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625

  18. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
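
    Note on the accuracy metric: "top L/10 long-range accuracy" is commonly computed by ranking predicted residue pairs by score, keeping only pairs whose sequence separation meets a long-range threshold, and measuring the fraction of the top L/k pairs that are true contacts, where L is the protein length. The following is a minimal sketch of that calculation and not the authors' evaluation code. The 24-residue separation threshold is an assumption based on common CASP practice, and the function name and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_over_k_accuracy(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    seq_len: int,
    k: int = 1,
    min_separation: int = 24,  # assumed long-range cutoff, not from the abstract
) -> float:
    """Fraction of the top L/k long-range predictions that are true contacts."""
    # Keep only long-range residue pairs.
    long_range = [
        (score, i, j)
        for (i, j), score in scores.items()
        if abs(i - j) >= min_separation
    ]
    # Highest-scoring predictions first.
    long_range.sort(reverse=True)
    top = long_range[: max(1, seq_len // k)]
    if not top:
        return 0.0
    # A contact is symmetric, so accept either orientation of the pair.
    hits = sum(
        1 for _, i, j in top if (i, j) in true_contacts or (j, i) in true_contacts
    )
    return hits / len(top)


if __name__ == "__main__":
    toy_scores = {
        (0, 30): 0.9, (1, 40): 0.8, (2, 45): 0.7, (3, 49): 0.6,
        (0, 25): 0.5, (5, 10): 0.99, (4, 44): 0.1,
    }
    toy_truth = {(0, 30), (2, 45), (25, 0)}
    print(top_l_over_k_accuracy(toy_scores, toy_truth, seq_len=50, k=10))
```

    This sketch divides by the number of pairs actually selected; some evaluations divide by L/k even when fewer long-range predictions are available, which gives a lower value in that case.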

  19. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
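    The "top L" and "top L/10" long-range accuracies quoted in the record above can be illustrated with a minimal sketch. It assumes the common convention that a long-range pair has a sequence separation of at least 24 residues; the abstract does not state its threshold, and the function name, parameters, and toy matrices are hypothetical rather than taken from the authors' software.

```python
def top_k_long_range_precision(prob, contact, k_divisor=1, min_sep=24):
    """Fraction of the top L/k_divisor predicted long-range pairs that are true contacts.

    prob    -- L x L nested list of predicted contact probabilities
    contact -- L x L nested list of booleans (true contact map)
    min_sep -- minimum sequence separation |i - j| (24 is an assumed convention)
    """
    n = len(prob)
    # Collect each residue pair once (upper triangle), keeping only long-range pairs.
    pairs = [(prob[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    # Rank pairs by predicted probability, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:max(1, n // k_divisor)]
    if not top:
        return 0.0  # protein too short to contain any long-range pair
    hits = sum(1 for _, i, j in top if contact[i][j])
    return hits / len(top)


# Toy 4-residue example; min_sep is lowered to 2 so that some pairs qualify.
toy_prob = [[0.0] * 4 for _ in range(4)]
toy_true = [[False] * 4 for _ in range(4)]
toy_prob[0][2], toy_true[0][2] = 0.9, True
toy_prob[1][3], toy_true[1][3] = 0.8, False
toy_prob[0][3], toy_true[0][3] = 0.2, True

precision_top_l = top_k_long_range_precision(toy_prob, toy_true, k_divisor=1, min_sep=2)
precision_top_l2 = top_k_long_range_precision(toy_prob, toy_true, k_divisor=2, min_sep=2)
```

    On a real protein the same function would be called with the default separation and with k_divisor set to 1 or 10 to obtain the two figures reported per method.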

  20. Integrating functional genomics to accelerate mechanistic personalized medicine.

    PubMed

    Tyner, Jeffrey W

    2017-03-01

    The advent of deep sequencing technologies has resulted in the deciphering of tremendous amounts of genetic information. These data have led to major discoveries, and many anecdotes now exist of individual patients whose clinical outcomes have benefited from novel, genetically guided therapeutic strategies. However, the majority of genetic events in cancer are currently undrugged, leading to a biological gap between understanding of tumor genetic etiology and translation to improved clinical approaches. Functional screening has made tremendous strides in recent years with the development of new experimental approaches to studying ex vivo and in vivo drug sensitivity. Numerous discoveries and anecdotes also exist for translation of functional screening into novel clinical strategies; however, the current clinical application of functional screening remains largely confined to small clinical trials at specific academic centers. The intersection between genomic and functional approaches represents an ideal modality to accelerate our understanding of drug sensitivities as they relate to specific genetic events and further understand the full mechanisms underlying drug sensitivity patterns.

  1. Deep Learning MR Imaging-based Attenuation Correction for PET/MR Imaging.

    PubMed

    Liu, Fang; Jang, Hyungseok; Kijowski, Richard; Bradshaw, Tyler; McMillan, Alan B

    2018-02-01

    Purpose To develop and evaluate the feasibility of deep learning approaches for magnetic resonance (MR) imaging-based attenuation correction (AC) (termed deep MRAC) in brain positron emission tomography (PET)/MR imaging. Materials and Methods A PET/MR imaging AC pipeline was built by using a deep learning approach to generate pseudo computed tomographic (CT) scans from MR images. A deep convolutional auto-encoder network was trained to identify air, bone, and soft tissue in volumetric head MR images coregistered to CT data for training. A set of 30 retrospective three-dimensional T1-weighted head images was used to train the model, which was then evaluated in 10 patients by comparing the generated pseudo CT scan to an acquired CT scan. A prospective study was carried out for utilizing simultaneous PET/MR imaging for five subjects by using the proposed approach. Analysis of covariance and paired-sample t tests were used for statistical analysis to compare PET reconstruction error with deep MRAC and two existing MR imaging-based AC approaches with CT-based AC. Results Deep MRAC provides an accurate pseudo CT scan with a mean Dice coefficient of 0.971 ± 0.005 for air, 0.936 ± 0.011 for soft tissue, and 0.803 ± 0.021 for bone. Furthermore, deep MRAC provides good PET results, with average errors of less than 1% in most brain regions. Significantly lower PET reconstruction errors were realized with deep MRAC (-0.7% ± 1.1) compared with Dixon-based soft-tissue and air segmentation (-5.8% ± 3.1) and anatomic CT-based template registration (-4.8% ± 2.2). Conclusion The authors developed an automated approach that allows generation of discrete-valued pseudo CT scans (soft tissue, bone, and air) from a single high-spatial-resolution diagnostic-quality three-dimensional MR image and evaluated it in brain PET/MR imaging. 
This deep learning approach for MR imaging-based AC provided reduced PET reconstruction error relative to a CT-based standard within the brain compared with current MR imaging-based AC approaches. © RSNA, 2017 Online supplemental material is available for this article.
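    The per-tissue Dice coefficients reported in the record above measure the overlap between a predicted and a reference label map. The sketch below is a minimal illustration that assumes voxels flattened into plain lists of labels; the function name, the label strings, and the handling of a label absent from both maps are assumptions, not details from the article.

```python
def dice_coefficient(pred, truth, label):
    """Dice = 2|A ∩ B| / (|A| + |B|) for one tissue label over flattened voxel lists."""
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of voxels")
    n_pred = sum(1 for p in pred if p == label)
    n_truth = sum(1 for t in truth if t == label)
    n_both = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if n_pred + n_truth == 0:
        return 1.0  # assumed convention: a label absent from both maps counts as perfect agreement
    return 2.0 * n_both / (n_pred + n_truth)


# Toy six-voxel example: a pseudo CT label map against a reference CT label map.
pseudo_ct = ["air", "bone", "soft", "soft", "bone", "air"]
real_ct = ["air", "bone", "soft", "bone", "bone", "air"]

dice_bone = dice_coefficient(pseudo_ct, real_ct, "bone")
dice_air = dice_coefficient(pseudo_ct, real_ct, "air")
```

    A value of 1 means the two maps agree exactly for that tissue class, and 0 means the two regions do not overlap at all.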

  2. Draft Genome Sequence of Aldehyde-Degrading Strain Halomonas axialensis ACH-L-8

    PubMed Central

    Ye, Jun; Ren, Chong; Shan, Xiexie

    2016-01-01

    Halomonas axialensis ACH-L-8, a deep-sea strain isolated from the South China Sea, has the ability to degrade aldehydes. Here, we present an annotated draft genome sequence of this species, which could provide fundamental molecular information on the aldehyde-degrading mechanism. PMID:27081145

  3. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    USDA-ARS's Scientific Manuscript database

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  4. Metagenomic gene annotation by a homology-independent approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  5. When practice precedes theory - A mixed methods evaluation of students' learning experiences in an undergraduate study program in nursing.

    PubMed

    Falk, Kristin; Falk, Hanna; Jakobsson Ung, Eva

    2016-01-01

    A key area for consideration is determining how optimal conditions for learning can be created. Higher education in nursing aims to prepare students to develop their capabilities to become independent professionals. The aim of this study was to evaluate the effects of sequencing clinical practice prior to theoretical studies on students' experiences of self-directed learning readiness and students' approach to learning in the second year of a three-year undergraduate study program in nursing. A total of 123 nursing students were included in the study and divided into two groups. In group A (n = 60) clinical practice preceded theoretical studies. In group B (n = 63) theoretical studies preceded clinical practice. Learning readiness was measured using the Self-Directed Learning Readiness Scale for Nursing Education (SDLRSNE), and learning process was measured using the revised two-factor version of the Study Process Questionnaire (R-SPQ-2F). Students were also asked to write down their personal reflections throughout the course. By using a mixed method design, the qualitative component focused on the students' personal experiences in relation to the sequencing of theoretical studies and clinical practice. The quantitative component provided information about learning readiness before and after the intervention. Our findings confirm that students are sensitive and adaptable to their learning contexts, and that the sequencing of courses is subordinate to a pedagogical style enhancing students' deep learning approaches, which needs to be incorporated in the development of undergraduate nursing programs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Surveying N2O-producing pathways in bacteria.

    PubMed

    Stein, Lisa Y

    2011-01-01

    Nitrous oxide (N2O) is produced by bacteria as an intermediate of both dissimilatory and detoxification pathways under a range of oxygen levels, although the majority of N2O is released in suboxic to anoxic environments. N2O production under physiologically relevant conditions appears to require the reduction of nitric oxide (NO) produced from the oxidation of hydroxylamine (nitrification), reduction of nitrite (denitrification), or by host cells of pathogenic bacteria. In a single bacterial isolate, N2O-producing pathways can be complex, overlapping, involve multiple enzymes with the same function, and require multiple layers of regulatory machinery. This overview discusses how to identify known N2O-producing inventory and regulatory sequences within bacterial genome sequences and basic physiological approaches for investigating the function of that inventory. A multitude of review articles have been published on individual enzymes, pathways, regulation, and environmental significance of N2O production encompassing a large diversity of bacterial isolates. The combination of next-generation deep sequencing platforms, emerging proteomics technologies, and basic microbial physiology can be used to expand what is known about N2O-producing pathways in individual bacterial species to discover novel inventory and unifying features of pathways. A combination of approaches is required to understand and generalize the function and control of N2O production across a range of temporal and spatial scales within natural and host environments. Copyright © 2011 Elsevier Inc. All rights reserved.

  7. Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data.

    PubMed

    Lasko, Thomas A; Denny, Joshua C; Levy, Mia A

    2013-01-01

    Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don't think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data - Electronic Medical Records - typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies.

  8. Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data

    PubMed Central

    Lasko, Thomas A.; Denny, Joshua C.; Levy, Mia A.

    2013-01-01

    Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don’t think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data – Electronic Medical Records – typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies. PMID:23826094

  9. Colonization of plant substrates at hydrothermal vents and cold seeps in the northeast Atlantic and Mediterranean and occurrence of symbiont-related bacteria

    PubMed Central

    Szafranski, Kamil M.; Deschamps, Philippe; Cunha, Marina R.; Gaudron, Sylvie M.; Duperron, Sébastien

    2015-01-01

    Reducing conditions with elevated sulfide and methane concentrations in ecosystems such as hydrothermal vents, cold seeps or organic falls, are suitable for chemosynthetic primary production. Understanding processes driving bacterial diversity, colonization and dispersal is of prime importance for deep-sea microbial ecology. This study provides a detailed characterization of bacterial assemblages colonizing plant-derived substrates using a standardized approach over a geographic area spanning the North-East Atlantic and Mediterranean. Wood and alfalfa substrates in colonization devices were deployed for different periods at 8 deep-sea chemosynthesis-based sites in four distinct geographic areas. Pyrosequencing of a fragment of the 16S rRNA-encoding gene was used to describe bacterial communities. Colonization occurred within the first 14 days. The diversity was higher in samples deployed for more than 289 days. After 289 days, no relation was observed between community richness and deployment duration, suggesting that diversity may have reached saturation sometime in between. Communities in long-term deployments were different, and their composition was mainly influenced by the geographical location where devices were deployed. Numerous sequences related to horizontally-transmitted chemosynthetic symbionts of metazoans were identified. Their potential status as free-living forms of these symbionts was evaluated based on sequence similarity with demonstrated symbionts. Results suggest that some free-living forms of metazoan symbionts or their close relatives, such as Epsilonproteobacteria associated with the shrimp Rimicaris exoculata, are efficient colonizers of plant substrates at vents and seeps. PMID:25774156

  10. The Cotoncello Shear Zone (Elba Island, Italy): The deep root of a fossil oceanic detachment fault in the Ligurian ophiolites

    NASA Astrophysics Data System (ADS)

    Frassi, Chiara; Musumeci, Giovanni; Zucali, Michele; Mazzarini, Francesco; Rebay, Gisella; Langone, Antonio

    2017-05-01

    The ophiolite sequences in the western Elba Island are classically interpreted as a well-exposed ocean-floor section emplaced during the Apennines orogeny at the top of the tectonic nappe-stack. Stratigraphic, petrological and geochemical features indicate that these ophiolite sequences are remnants of slow-ultraslow spreading oceanic lithosphere analogous to the present-day Mid-Atlantic Ridge and Southwest Indian Ridge. Within the oceanward section of Tethyan lithosphere exposed in the Elba Island, we investigated for the first time a structure tens of meters thick, the Cotoncello Shear Zone (CSZ), that records high-temperature ductile deformation. We used a multidisciplinary approach to document the tectono-metamorphic evolution of the shear zone and its role during spreading of the western Tethys. In addition, we used zircon U-Pb ages to date formation of the gabbroic lower crust in this sector of the Apennines. Our results indicate that the CSZ rooted below the brittle-ductile transition at temperatures above 800 °C. A high-temperature ductile fabric was overprinted by fabrics recorded during progressive exhumation up to shallower levels at temperatures < 500 °C. We suggest that the CSZ may represent the deep root of a detachment fault that accomplished exhumation of an ancient oceanic core complex (OCC) in between two stages of magmatic accretion. We suggest that the CSZ represents an excellent on-land example for assessing relationships between magmatism and deformation when extensional oceanic detachments are at work.

  11. Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction.

    PubMed

    Ravì, Daniele; Szczotka, Agnieszka Barbara; Shakir, Dzhoshkun Ismail; Pereira, Stephen P; Vercauteren, Tom

    2018-06-01

    Probe-based confocal laser endomicroscopy (pCLE) is a recent imaging modality that enables in vivo optical biopsies. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality with a few tens of thousands of fibres, each acting as the equivalent of a single-pixel detector, assembled into a single fibre bundle. Video registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by the models trained on pairs of estimated HR images (generated by the video registration algorithm) and realistic synthetic LR images. The performance of three different state-of-the-art DNN techniques was analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive image quality assessment that takes into account different quality scores, including a Mean Opinion Score (MOS). Results indicate that the proposed solution produces an effective improvement in the quality of the obtained reconstructed image. The proposed training strategy and associated DNNs allow us to perform convincing super-resolution of pCLE images.

  12. Suboxic deep seawater in the late Paleoproterozoic: Evidence from hematitic chert and iron formation related to seafloor-hydrothermal sulfide deposits, central Arizona, USA

    USGS Publications Warehouse

    Slack, J.F.; Grenne, Tor; Bekker, A.; Rouxel, O.J.; Lindberg, P.A.

    2007-01-01

    A current model for the evolution of Proterozoic deep seawater composition involves a change from anoxic sulfide-free to sulfidic conditions 1.8 Ga. In an earlier model the deep ocean became oxic at that time. Both models are based on the secular distribution of banded iron formation (BIF) in shallow marine sequences. We here present a new model based on rare earth elements, especially redox-sensitive Ce, in hydrothermal silica-iron oxide sediments from deeper-water, open-marine settings related to volcanogenic massive sulfide (VMS) deposits. In contrast to Archean, Paleozoic, and modern hydrothermal iron oxide sediments, 1.74 to 1.71 Ga hematitic chert (jasper) and iron formation in central Arizona, USA, show moderate positive to small negative Ce anomalies, suggesting that the redox state of the deep ocean then was at a transitional, suboxic state with low concentrations of dissolved O2 but no H2S. The presence of jasper and/or iron formation related to VMS deposits in other volcanosedimentary sequences ca. 1.79-1.69 Ga, 1.40 Ga, and 1.24 Ga also reflects oxygenated and not sulfidic deep ocean waters during these time periods. Suboxic conditions in the deep ocean are consistent with the lack of shallow-marine BIF ~1.8 to 0.8 Ga, and likely limited nutrient concentrations in seawater and, consequently, may have constrained biological evolution. © 2006 Elsevier B.V. All rights reserved.

  13. Changing Students' Approaches to Learning: A Two-Year Study within a University Teacher Training Course

    ERIC Educational Resources Information Center

    Gijbels, David; Coertjens, Liesje; Vanthournout, Gert; Struyf, Elke; Van Petegem, Peter

    2009-01-01

    Inciting a deep approach to learning in students is difficult. The present research poses two questions: can a constructivist learning-assessment environment change students' approaches towards a more deep approach? What effect does additional feedback have on the changes in learning approaches? Two cohorts of students completed questionnaires…

  14. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism

    PubMed Central

    Willsey, A. Jeremy; Sanders, Stephan J.; Li, Mingfeng; Dong, Shan; Tebbenkamp, Andrew T.; Muhle, Rebecca A.; Reilly, Steven K.; Lin, Leon; Fertuzinhos, Sofia; Miller, Jeremy A.; Murtha, Michael T.; Bichsel, Candace; Niu, Wei; Cotney, Justin; Ercan-Sencicek, A. Gulhan; Gockley, Jake; Gupta, Abha; Han, Wenqi; He, Xin; Hoffman, Ellen; Klei, Lambertus; Lei, Jing; Liu, Wenzhong; Liu, Li; Lu, Cong; Xu, Xuming; Zhu, Ying; Mane, Shrikant M.; Lein, Edward S.; Wei, Liping; Noonan, James P.; Roeder, Kathryn; Devlin, Bernie; Šestan, Nenad; State, Matthew W.

    2013-01-01

    SUMMARY Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD “seed” genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology. PMID:24267886

  15. Jellyfish Bioactive Compounds: Methods for Wet-Lab Work

    PubMed Central

    Frazão, Bárbara; Antunes, Agostinho

    2016-01-01

    The study of bioactive compounds from marine animals has provided, over time, an endless source of interesting molecules. Jellyfish are commonly targets of study due to their toxic proteins. However, there is a gap in reviewing successful wet-lab methods employed in these animals, which compromises the fast progress in the detection of related biomolecules. Here, we provide a compilation of the most effective wet-lab methodologies for jellyfish venom extraction prior to proteomic analysis—separation, identification and toxicity assays. This includes SDS-PAGE, 2DE, gel chromatography, HPLC, DEAE, LC-MS, MALDI, Western blot, hemolytic assay, antimicrobial assay and protease activity assay. For a more comprehensive approach, jellyfish toxicity studies should further consider transcriptome sequencing. We reviewed such methodologies and other genomic techniques used prior to the deep sequencing of transcripts, including RNA extraction, construction of cDNA libraries and RACE. Overall, we provide an overview of the most promising methods and their successful implementation for optimizing time and effort when studying jellyfish. PMID:27077869

  16. Jellyfish Bioactive Compounds: Methods for Wet-Lab Work.

    PubMed

    Frazão, Bárbara; Antunes, Agostinho

    2016-04-12

    The study of bioactive compounds from marine animals has provided, over time, an endless source of interesting molecules. Jellyfish are commonly targets of study due to their toxic proteins. However, there is a gap in reviewing successful wet-lab methods employed in these animals, which compromises the fast progress in the detection of related biomolecules. Here, we provide a compilation of the most effective wet-lab methodologies for jellyfish venom extraction prior to proteomic analysis: separation, identification and toxicity assays. This includes SDS-PAGE, 2DE, gel chromatography, HPLC, DEAE, LC-MS, MALDI, Western blot, hemolytic assay, antimicrobial assay and protease activity assay. For a more comprehensive approach, jellyfish toxicity studies should further consider transcriptome sequencing. We reviewed such methodologies and other genomic techniques used prior to the deep sequencing of transcripts, including RNA extraction, construction of cDNA libraries and RACE. Overall, we provide an overview of the most promising methods and their successful implementation for optimizing time and effort when studying jellyfish.

  17. Mapping Phylogenetic Trees to Reveal Distinct Patterns of Evolution

    PubMed Central

    Kendall, Michelle; Colijn, Caroline

    2016-01-01

    Evolutionary relationships are frequently described by phylogenetic trees, but a central barrier in many fields is the difficulty of interpreting data containing conflicting phylogenetic signals. We present a metric-based method for comparing trees which extracts distinct alternative evolutionary relationships embedded in data. We demonstrate detection and resolution of phylogenetic uncertainty in a recent study of anole lizards, leading to alternate hypotheses about their evolutionary relationships. We use our approach to compare trees derived from different genes of Ebolavirus and find that the VP30 gene has a distinct phylogenetic signature composed of three alternatives that differ in the deep branching structure. Key words: phylogenetics, evolution, tree metrics, genetics, sequencing. PMID:27343287

  18. RNA deep sequencing as a tool for selection of cell lines for systematic subcellular localization of all human proteins.

    PubMed

    Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma

    2013-01-04

    One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.

  19. Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis.

    PubMed

    Morfopoulou, Sofia; Mee, Edward T; Connaughton, Sarah M; Brown, Julianne R; Gilmour, Kimberly; Chong, W K 'Kling'; Duprex, W Paul; Ferguson, Deborah; Hubank, Mike; Hutchinson, Ciaran; Kaliakatsos, Marios; McQuaid, Stephen; Paine, Simon; Plagnol, Vincent; Ruis, Christopher; Virasami, Alex; Zhan, Hong; Jacques, Thomas S; Schepelmann, Silke; Qasim, Waseem; Breuer, Judith

    2017-01-01

    Routine childhood vaccination against measles, mumps and rubella has virtually abolished virus-related morbidity and mortality. Notwithstanding this, we describe here devastating neurological complications associated with the detection of live-attenuated mumps virus Jeryl Lynn (MuV JL5) in the brain of a child who had undergone successful allogeneic transplantation for severe combined immunodeficiency (SCID). This is the first confirmed report of MuV JL5 associated with chronic encephalitis and highlights the need to exclude immunodeficient individuals from immunisation with live-attenuated vaccines. The diagnosis was only possible by deep sequencing of the brain biopsy. Sequence comparison of the vaccine batch to the MuV JL5 isolated from brain identified biased hypermutation, particularly in the matrix gene, similar to those found in measles from cases of SSPE. The findings provide unique insights into the pathogenesis of paramyxovirus brain infections.

  20. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expression, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.

  1. Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana)

    PubMed Central

    Gonzalez-Ibeas, Daniel; Martinez-Garcia, Pedro J.; Famula, Randi A.; Delfino-Mix, Annette; Stevens, Kristian A.; Loopstra, Carol A.; Langley, Charles H.; Neale, David B.; Wegrzyn, Jill L.

    2016-01-01

    Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. PMID:27799338

  2. Biophysics of protein evolution and evolutionary protein biophysics

    PubMed Central

    Sikosek, Tobias; Chan, Hue Sun

    2014-01-01

    The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599

  3. Draft Genome Sequence of Thermus scotoductus Strain K1, Isolated from a Geothermal Spring in Karvachar, Nagorno Karabakh.

    PubMed

    Saghatelyan, Ani; Poghosyan, Lianna; Panosyan, Hovik; Birkeland, Nils-Kåre

    2015-11-12

    The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. Copyright © 2015 Saghatelyan et al.

  4. Not All Particles Are Equal: The Selective Enrichment of Particle-Associated Bacteria from the Mediterranean Sea.

    PubMed

    López-Pérez, Mario; Kimes, Nikole E; Haro-Moreno, Jose M; Rodriguez-Valera, Francisco

    2016-01-01

    We have used two metagenomic approaches, direct sequencing of natural samples and sequencing after enrichment, to characterize communities of prokaryotes associated with particles. In the first approach, different size filters (0.22 and 5 μm) were used to identify prokaryotic microbes of free-living and particle-attached bacterial communities in the Mediterranean water column. A subtractive metagenomic approach was used to characterize the dominant microbial groups in the large size fraction that were not present in the free-living one. They belonged mainly to Actinobacteria, Planctomycetes, Flavobacteria and Proteobacteria. In addition, marine microbial communities enriched by incubation with different kinds of particulate material have been studied by metagenomic assembly. Different particle kinds (diatomaceous earth, sand, chitin and cellulose) were colonized by very different communities of bacteria belonging to Roseobacter, Vibrio, Bacteriovorax, and Lacinutrix that were distant relatives of genomes already described from marine habitats. Moreover, using assembly from deep metagenomic sequencing of the particle-specific enrichments, we were able to determine a total of 20 groups of contigs (eight of them with >50% completeness) and reconstruct de novo five new genomes of novel species within marine clades (>79% completeness and <1.8% contamination). We also describe for the first time the genome of a marine Rhizobiales phage that seems to infect a broad range of Alphaproteobacteria and live in habitats as diverse as soil, marine sediment and water column. The metagenomic recruitment of the communities found by direct sequencing of the large size filter and by enrichment had nearly no overlap. These results indicate that these reconstructed genomes are part of the rare biosphere which exists at nominal levels under natural conditions.

  5. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low-abundance clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias or errors is Seq = Sa × I_N, where I_N is the N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
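
    The operator picture in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the choice to draw reads with replacement in proportion to copy number, and the toy library are all assumptions made for illustration. Diagonal matrices are stored as plain lists of their diagonal entries.

```python
import random


def sampling_operator(n, depth, rng):
    """Return the diagonal of a random sampling operator Sa for library n.

    Assumption: `depth` reads are drawn with replacement, each clone being
    picked in proportion to its copy number. Entry i of the diagonal is the
    ratio of sampled copies to original copies of sequence i.
    """
    present = [i for i, copies in enumerate(n) if copies > 0]
    draws = rng.choices(present, weights=[n[i] for i in present], k=depth)
    sampled = [0] * len(n)
    for i in draws:
        sampled[i] += 1
    return [sampled[i] / n[i] if n[i] > 0 else 0.0 for i in range(len(n))]


def apply_diagonal(diag, n):
    """Apply a diagonal operator (given by its diagonal) to the vector n."""
    return [d * copies for d, copies in zip(diag, n)]


def sequence(n, depth, rng, bias=None):
    """Sketch of Seq = Sa x B, where B is I_N (no bias) or a diagonal
    censorship matrix such as CEN (entries between 0 and 1)."""
    if bias is None:
        bias = [1.0] * len(n)          # I_N: unbiased sequencing
    biased = apply_diagonal(bias, n)   # B acting on n
    sa = sampling_operator(biased, depth, rng)
    return apply_diagonal(sa, biased)  # Sa acting on B n


# Hypothetical library of N = 4 sequences; clone 2 is fully censored.
library = [50, 30, 15, 5]
cen = [1.0, 1.0, 0.0, 1.0]
reads = sequence(library, depth=40, rng=random.Random(0), bias=cen)
```

    In this sketch a censored clone never appears in the output, however abundant it was in the library, while the total number of reads still equals the requested depth.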

  6. Teaching of real numbers by using the Archimedes-Cantor approach and computer algebra systems

    NASA Astrophysics Data System (ADS)

    Vorob'ev, Evgenii M.

    2015-11-01

    Computer technologies and especially computer algebra systems (CAS) allow students to overcome some of the difficulties they encounter in the study of real numbers. The teaching of calculus can be considerably more effective with the use of CAS provided the didactics of the discipline makes it possible to reveal the full computational potential of CAS. In the case of real numbers, the Archimedes-Cantor approach satisfies this requirement. The name of Archimedes brings back the exhaustion method. Cantor's name reminds us of the use of Cauchy rational sequences to represent real numbers. The usage of CAS with the Archimedes-Cantor approach enables the discussion of various representations of real numbers such as graphical, decimal, approximate decimal with precision estimates, and representation as points on a straight line. Exercises with numbers such as e, π, the golden ratio ϕ, and algebraic irrational numbers can help students better understand the real numbers. The Archimedes-Cantor approach also reveals a deep and close relationship between real numbers and continuity, in particular the continuity of functions.

  7. Poly(A)-tag deep sequencing data processing to extract poly(A) sites.

    PubMed

    Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn

    2015-01-01

    Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
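
    The grouping and clustering steps described above can be sketched in Python as follows. This is only an illustration, not the published Perl pipeline: the function names and the tuple layout are hypothetical, internal-priming removal and annotation parsing are omitted, and "within a close range of 24 nucleotides" is interpreted here as chaining neighbouring sites whose gap is at most 24 nt.

```python
from collections import Counter


def group_pats(pats):
    """Collapse poly(A) tags (PATs) that map to the same genome coordinate.

    Each PAT is a (chromosome, strand, position) tuple; the result maps each
    distinct coordinate (a cleavage site) to the number of supporting tags.
    """
    return Counter(pats)


def cluster_sites(sites, max_gap=24):
    """Chain cleavage sites on the same chromosome and strand into clusters.

    A site joins the current cluster when it lies at most `max_gap`
    nucleotides downstream of the cluster's last site (an assumed rule).
    """
    clusters = []
    current = None
    for (chrom, strand, pos), tags in sorted(sites.items()):
        same_locus = (
            current is not None
            and current["chrom"] == chrom
            and current["strand"] == strand
            and pos - current["end"] <= max_gap
        )
        if same_locus:
            current["end"] = pos
            current["tags"] += tags
        else:
            current = {"chrom": chrom, "strand": strand,
                       "start": pos, "end": pos, "tags": tags}
            clusters.append(current)
    return clusters


# Hypothetical tags pooled from several samples.
pats = [("chr1", "+", 100), ("chr1", "+", 100), ("chr1", "+", 110),
        ("chr1", "+", 200), ("chr1", "-", 105)]
clusters = cluster_sites(group_pats(pats))
```

    With these toy tags, the two sites at 100 and 110 on the plus strand merge into one cluster, while the site at 200 and the minus-strand site remain separate.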

  8. Integrated sequence stratigraphy of the postimpact sediments from the Eyreville core holes, Chesapeake Bay impact structure inner basin

    USGS Publications Warehouse

    Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.

    2009-01-01

    The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ±0.5 Ma for early Miocene sequences and approximately ±1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene-Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fine-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fine-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the filling of a deep impact-generated basin where the timing of sequence boundaries is heavily influenced by eustasy. © 2009 The Geological Society of America.

  9. Deep and surface learning in problem-based learning: a review of the literature.

    PubMed

    Dolmans, Diana H J M; Loyens, Sofie M M; Marcq, Hélène; Gijbels, David

    2016-12-01

    In problem-based learning (PBL), implemented worldwide, students learn by discussing professionally relevant problems enhancing application and integration of knowledge, which is assumed to encourage students towards a deep learning approach in which students are intrinsically interested and try to understand what is being studied. This review investigates: (1) the effects of PBL on students' deep and surface approaches to learning, (2) whether and why these effects do differ across (a) the context of the learning environment (single vs. curriculum wide implementation), and (b) study quality. Studies were searched dealing with PBL and students' approaches to learning. Twenty-one studies were included. The results indicate that PBL does enhance deep learning with a small positive average effect size of .11 and a positive effect in eleven of the 21 studies. Four studies show a decrease in deep learning and six studies show no effect. PBL does not seem to have an effect on surface learning as indicated by a very small average effect size (.08) and eleven studies showing no increase in the surface approach. Six studies demonstrate a decrease and four an increase in surface learning. It is concluded that PBL does seem to enhance deep learning and has little effect on surface learning, although more longitudinal research using high quality measurement instruments is needed to support this conclusion with stronger evidence. Differences cannot be explained by the study quality but a curriculum wide implementation of PBL has a more positive impact on the deep approach (effect size .18) compared to an implementation within a single course (effect size of -.05). PBL is assumed to enhance active learning and students' intrinsic motivation, which enhances deep learning. A high perceived workload and assessment that is perceived as not rewarding deep learning are assumed to enhance surface learning.

  10. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance, with a CC of 0.57 and an RMSE of 0.79. In addition, combining the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
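
    The two accuracy measures quoted in this abstract, the Pearson correlation coefficient (CC) and the root mean square error (RMSE) between predicted and observed values, follow standard textbook definitions. The Python sketch below shows those definitions only; the function names and the toy values are illustrative, and whether the authors normalized RWCO values before scoring is not stated here.

```python
import math


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Toy values only; these are not RWCO data from the study.
predicted = [1.0, 2.0, 3.0]
observed = [2.0, 4.0, 6.0]
cc = pearson_cc(predicted, observed)
error = rmse(predicted, observed)
```

    The toy example also shows why both measures are reported together: the predictions are perfectly correlated with the observations (CC of 1) yet still carry a nonzero RMSE, because correlation ignores differences in scale.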

  11. Ultra-Deep Sequencing Analysis of the Hepatitis A Virus 5'-Untranslated Region among Cases of the Same Outbreak from a Single Source

    PubMed Central

    Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu

    2014-01-01

    Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved HAV IRES sequence even if we performed analysis of UDPSs. UDPS analysis of HAV 5'UTR gave us no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287

  12. Low endemism, continued deep-shallow interchanges, and evidence for cosmopolitan distributions in free-living marine nematodes (order Enoplida)

    PubMed Central

    2010-01-01

    Background: Nematodes represent the most abundant benthic metazoa in one of the largest habitats on earth, the deep sea. Characterizing major patterns of biodiversity within this dominant group is a critical step towards understanding evolutionary patterns across this vast ecosystem. The present study has aimed to place deep-sea nematode species into a phylogenetic framework, investigate relationships between shallow water and deep-sea taxa, and elucidate phylogeographic patterns amongst the deep-sea fauna. Results: Molecular data (18S and 28S rRNA) confirm a high diversity amongst deep-sea Enoplids. There is no evidence for endemic deep-sea lineages in Maximum Likelihood or Bayesian phylogenies, and Enoplids do not cluster according to depth or geographic location. Tree topologies suggest frequent interchanges between deep-sea and shallow water habitats, as well as a mixture of early radiations and more recently derived lineages amongst deep-sea taxa. This study also provides convincing evidence of cosmopolitan marine species, recovering a subset of Oncholaimid nematodes with identical gene sequences (18S, 28S and cox1) at trans-Atlantic sample sites. Conclusions: The complex clade structures recovered within the Enoplida support a high global species richness for marine nematodes, with phylogeographic patterns suggesting the existence of closely related, globally distributed species complexes in the deep sea. True cosmopolitan species may additionally exist within this group, potentially driven by specific life history traits of Enoplids. Although this investigation aimed to intensively sample nematodes from the order Enoplida, specimens were only identified down to genus (at best) and our sampling regime focused on an infinitesimally small fraction of the deep-sea floor. Future nematode studies should incorporate an extended sample set covering a wide depth range (shelf, bathyal, and abyssal sites), utilize additional genetic loci (e.g. mtDNA) that are informative at the species level, and apply high-throughput sequencing methods to fully assay community diversity. Finally, further molecular studies are needed to determine whether phylogeographic patterns observed in Enoplids are common across other ubiquitous marine groups (e.g. Chromadorida, Monhysterida). PMID:21167065

  13. Cellular and molecular characterization of gametogenic progression in ex vivo cultured prepuberal mouse testes.

    PubMed

    Isoler-Alcaraz, J; Fernández-Pérez, D; Larriba, E; Del Mazo, J

    2017-10-18

    Recently, an effective testis culture method using a gas-liquid interphase, capable of differentiating male germ cells from neonatal spermatogonia to spermatozoa, has been developed. Nevertheless, this methodology needs in-depth analysis to enable future experimental approaches in basic, pathologic and/or reprotoxicologic studies. We therefore characterized, at cellular and molecular levels, the entire in vitro spermatogenic progression, in order to understand and evaluate the characteristics that define the spermatogenic process in ex vivo cultured testes compared to in vivo development. Testicular explants of CD1 mice aged 6 and 10 days post-partum were cultured for 55 and 89 days, respectively. Cytological and molecular approaches were used to analyze germ cell proportions at different culture time points, meiotic markers (immunodetection of synaptonemal complex protein SYCP3 by immunocytochemistry), and the relative expression of different marker genes along the differentiation process by reverse transcription-quantitative polymerase chain reaction. In addition, microRNA and piwi-interacting RNA profiles were evaluated by next-generation sequencing and bioinformatic approaches. The method promoted and maintained the spermatogenic process for 89 days. At the cytological level, we detected delays in spermatogenic development in cultured explants compared to the natural in vivo process. The expression of gene markers for different spermatogenic stages correlated with the proportion of different cell types detected in the cytological preparations. In vitro progression analysis of the different spermatogenic cell types, from both 6.5 dpp and 10.5 dpp testis explants, revealed a relative delay in relation to the in vivo process. The expression of the genes studied as biomarkers correlates with the cytologically and functionally detected progression and with the differential expression identified in vivo. A first analysis of the deep sequencing data showed that, as cultures progressed, the proportion of microRNAs declined relative to that of piwi-interacting RNAs, which increased, a trend similar to that seen in in vivo spermatogenesis. Our study makes it possible to improve, and potentially to control, ex vivo spermatogenic development, opening new perspectives in reproductive biology, including male fertility.

  14. Association Between Mutation Clearance After Induction Therapy and Outcomes in Acute Myeloid Leukemia

    PubMed Central

    Klco, Jeffery M.; Miller, Christopher A.; Griffith, Malachi; Petti, Allegra; Spencer, David H.; Ketkar-Kulkarni, Shamika; Wartman, Lukas D; Christopher, Matthew; Lamprecht, Tamara L.; Helton, Nicole M.; Duncavage, Eric J.; Payton, Jacqueline E.; Baty, Jack; Heath, Sharon E.; Griffith, Obi L.; Shen, Dong; Hundal, Jasreet; Chang, Gue Su; Fulton, Robert; O'Laughlin, Michelle; Fronick, Catrina; Magrini, Vincent; Demeter, Ryan T.; Larson, David E.; Kulkarni, Shashikant; Ozenberger, Bradley A.; Welch, John S; Walter, Matthew J; Graubert, Timothy A.; Westervelt, Peter; Radich, Jerald P.; Link, Daniel C.; Mardis, Elaine R.; DiPersio, John F.; Wilson, Richard K.; Ley, Timothy J.

    2015-01-01

    IMPORTANCE Tests that predict outcomes for patients with acute myeloid leukemia (AML) are imprecise, especially for those with intermediate risk AML. OBJECTIVES To determine whether genomic approaches can provide novel prognostic information for adult patients with de novo AML. DESIGN, SETTING, AND PARTICIPANTS Whole-genome or exome sequencing was performed on samples obtained at disease presentation from 71 patients with AML (mean age, 50.8 years) treated with standard induction chemotherapy at a single site starting in March 2002, with follow-up through January 2015. In addition, deep digital sequencing was performed on paired diagnosis and remission samples from 50 patients (including 32 with intermediate-risk AML), approximately 30 days after successful induction therapy. Twenty-five of the 50 were from the cohort of 71 patients, and 25 were new, additional cases. EXPOSURES Whole-genome or exome sequencing and targeted deep sequencing. Risk of identification based on genetic data. MAIN OUTCOMES AND MEASURES Mutation patterns (including clearance of leukemia-associated variants after chemotherapy) and their association with event-free survival and overall survival. RESULTS Analysis of comprehensive genomic data from the 71 patients did not improve outcome assessment over current standard-of-care metrics. In an analysis of 50 patients with both presentation and documented remission samples, 24 (48%) had persistent leukemia-associated mutations in at least 5% of bone marrow cells at remission. The 24 with persistent mutations had significantly reduced event-free and overall survival vs the 26 who cleared all mutations. Patients with intermediate cytogenetic risk profiles had similar findings.

    Digital Sequencing (n=50)
                                                Persistent Mutations (n=24)   Cleared Mutations (n=26)     HR (95% CI)
    Event-free survival, median (95% CI), mo    6.0 (3.7–9.6)                 17.9 (11.3–40.4)             3.67 (1.93–7.11)
    Overall survival, median (95% CI), mo       10.5 (7.5–22.2)               42.2 (20.6–not estimable)    2.86 (1.39–5.88)

    Intermediate Cytogenetic Risk Profile (n=32)
                                                Persistent Mutations (n=14)   Cleared Mutations (n=18)     HR (95% CI)
    Event-free survival, median (95% CI), mo    8.8 (3.7–14.6)                25.6 (11.4–not estimable)    3.32 (1.44–7.67)
    Overall survival, median (95% CI), mo       19.3 (7.5–42.3)               46.8 (22.6–not estimable)    2.88 (1.11–7.45)

    CONCLUSIONS AND RELEVANCE The detection of persistent leukemia-associated mutations in at least 5% of bone marrow cells in day 30 remission samples was associated with a significantly increased risk of relapse, and reduced overall survival. These data suggest that this genomic approach may improve risk stratification for patients with AML. PMID:26305651

  15. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    PubMed Central

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.

    2015-01-01

    The genome of strain GS3372 is the first publicly available genome of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  16. Bacterial community diversity of the deep-sea octocoral Paramuricea placomus.

    PubMed

    Kellogg, Christina A; Ross, Steve W; Brooke, Sandra D

    2016-01-01

    Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379-382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68-90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.

  17. Bacterial community diversity of the deep-sea octocoral Paramuricea placomus

    USGS Publications Warehouse

    Kellogg, Christina A.; Ross, Steve W.; Brooke, Sandra D.

    2016-01-01

    Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379–382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68–90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.

  18. Biogeographic patterns of bacterial microdiversity in Arctic deep-sea sediments (HAUSGARTEN, Fram Strait)

    PubMed Central

    Buttigieg, Pier Luigi; Ramette, Alban

    2015-01-01

    Marine bacteria colonizing deep-sea sediments beneath the Arctic ocean, a rapidly changing ecosystem, have been shown to exhibit significant biogeographic patterns along transects spanning tens of kilometers and across water depths of several thousand meters (Jacob et al., 2013). Jacob et al. (2013) adopted what has become a classical view of microbial diversity – based on operational taxonomic units clustered at the 97% sequence identity level of the 16S rRNA gene – and observed a very large microbial community replacement at the HAUSGARTEN Long Term Ecological Research station (Eastern Fram Strait). Here, we revisited these data using the oligotyping approach and aimed to obtain new insight into ecological and biogeographic patterns associated with bacterial microdiversity in marine sediments. We also assessed the level of concordance of these insights with previously obtained results. Variation in oligotype dispersal range, relative abundance, co-occurrence, and taxonomic identity were related to environmental parameters such as water depth, biomass, and sedimentary pigment concentration. This study assesses ecological implications of the new microdiversity-based technique using a well-characterized dataset of high relevance for global change biology. PMID:25601856

  19. Subglacial Lake Vostok (Antarctica) Accretion Ice Contains a Diverse Set of Sequences from Aquatic, Marine and Sediment-Inhabiting Bacteria and Eukarya

    PubMed Central

    Edgar, Robyn; Veerapaneni, Ram S.; D’Elia, Tom; Morris, Paul F.; Rogers, Scott O.

    2013-01-01

    Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes. PMID:23843994

  20. Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya.

    PubMed

    Shtarkman, Yury M; Koçer, Zeynep A; Edgar, Robyn; Veerapaneni, Ram S; D'Elia, Tom; Morris, Paul F; Rogers, Scott O

    2013-01-01

    Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes.

  1. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    PubMed

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNAs (miRNAs) are a class of functional non-coding small RNAs 19-25 nucleotides in length. Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species; it is used as an excellent breeding parent for grapevine and has elicited growing interest in wine production. To date, a relatively large number of grapevine miRNAs (vv-miRNAs) have been reported from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr., a wild grapevine species. A small RNA library from Amur grape was constructed, and Solexa technology was used to perform deep sequencing of the library, followed by bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines was confirmed by real-time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be broadly consistent with those from the deep-sequenced sRNA libraries of the combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNPs in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted; they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism.
    Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress.

  2. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries

    PubMed Central

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-01-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. 
    On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs appear to be present and contribute to biomass production. PMID:20927138

  3. Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants

    USDA-ARS's Scientific Manuscript database

    Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...

  4. Fatal Metacestode Infection in Bornean Orangutan Caused by Unknown Versteria Species

    PubMed Central

    Gendron-Fitzpatrick, Annette; Deering, Kathleen M.; Wallace, Roberta S.; Clyde, Victoria L.; Lauck, Michael; Rosen, Gail E.; Bennett, Andrew J.; Greiner, Ellis C.; O’Connor, David H.

    2014-01-01

    A captive juvenile Bornean orangutan (Pongo pygmaeus) died from an unknown disseminated parasitic infection. Deep sequencing of DNA from infected tissues, followed by gene-specific PCR and sequencing, revealed a divergent species within the newly proposed genus Versteria (Cestoda: Taeniidae). Versteria may represent a previously unrecognized risk to primate health. PMID:24377497

  5. Testing deep reticulate evolution in Amaryllidaceae Tribe Hippeastreae (Asparagales) with ITS and chloroplast sequence data

    USDA-ARS's Scientific Manuscript database

    The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...

  6. BIOCHEMICAL AND PHYLOGENETIC CHARACTERIZATION OF TWO NOVEL DEEP-SEA THERMOCOCCUS ISOLATES WITH POTENTIALLY BIOTECHNOLOGICAL APPLICATIONS

    EPA Science Inventory

    The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...

  7. Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing

    PubMed Central

    Hoque, Mainul; Ji, Zhe; Zheng, Dinghai; Luo, Wenting; Li, Wencheng; You, Bei; Park, Ji Yeon; Yehia, Ghassan; Tian, Bin

    2012-01-01

    Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation. PMID:23241633

  8. Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics

    NASA Astrophysics Data System (ADS)

    Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.

    2016-12-01

    Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margin, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall, samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome-assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.

  9. Characterization of skin ulceration syndrome associated microRNAs in sea cucumber Apostichopus japonicus by deep sequencing.

    PubMed

    Li, Chenghua; Feng, Weida; Qiu, Lihua; Xia, Changge; Su, Xiurong; Jin, Chunhua; Zhou, Tingting; Zeng, Yuan; Li, Taiwu

    2012-08-01

    MicroRNAs (miRNAs) constitute a family of small RNA species that have been demonstrated to be among the key effectors mediating host-pathogen interaction. In this study, two haemocyte miRNA libraries, from healthy (L1) and skin ulceration syndrome-affected (L2) Apostichopus japonicus, were constructed and deep sequenced on the Illumina HiSeq 2000. The high-throughput Solexa sequencing yielded 9,579,038 and 7,742,558 clean reads from L1 and L2, respectively. Sequence analysis revealed 40 conserved miRNAs in both libraries, among which let-7 and mir-125 were speculated to be clustered together and expressed accordingly. Eighty-six miRNA candidates were also identified by reference genome search and stem-loop structure prediction. Importantly, mir-31 and mir-2008 displayed significant differential expression between the two libraries according to the FPKM model, and might be considered promising targets for elucidating the intrinsic mechanism of skin ulceration syndrome outbreaks in the species. Copyright © 2012 Elsevier Ltd. All rights reserved.
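
    The abstract above does not spell out the FPKM calculation. As a rough illustration only, the standard definition (fragments per kilobase of feature per million mapped reads) can be sketched as follows. The two library sizes are the totals reported in the abstract; the per-miRNA read counts, the 22-nt feature length, and the pseudocount are hypothetical values chosen for illustration, not data from the study.

```python
import math


def fpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Standard FPKM: reads * 1e9 / (feature length in bp * library size)."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and library size must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def log2_fold_change(fpkm_a: float, fpkm_b: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of library B over library A; the pseudocount avoids log(0)."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Library sizes reported in the abstract (L1 healthy, L2 diseased).
L1_TOTAL = 9_579_038
L2_TOTAL = 7_742_558

# Hypothetical read counts for one miRNA, assumed 22 nt long.
HYPOTHETICAL_LENGTH = 22
l1_value = fpkm(1_200, HYPOTHETICAL_LENGTH, L1_TOTAL)
l2_value = fpkm(4_800, HYPOTHETICAL_LENGTH, L2_TOTAL)
change = log2_fold_change(l1_value, l2_value)
```

    Normalizing by library size matters here because the two libraries differ by almost two million reads, so raw counts are not directly comparable.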

  10. Diversity and Biogeography of Bathyal and Abyssal Seafloor Bacteria

    PubMed Central

    Bienhold, Christina; Zinger, Lucie; Boetius, Antje; Ramette, Alban

    2016-01-01

    The deep ocean floor covers more than 60% of the Earth’s surface, and hosts diverse bacterial communities with important functions in carbon and nutrient cycles. The identification of key bacterial members remains a challenge and their patterns of distribution in seafloor sediment remain poorly described. Previous studies were either regionally restricted or included few deep-sea sediments, and did not specifically test biogeographic patterns across the vast oligotrophic bathyal and abyssal seafloor. Here we define the composition of this deep seafloor microbiome by describing those bacterial operational taxonomic units (OTU) that are specifically associated with deep-sea surface sediments at water depths ranging from 1000–5300 m. We show that the microbiome of the surface seafloor is distinct from the subsurface seafloor. The cosmopolitan bacterial OTU were affiliated with the clades JTB255 (class Gammaproteobacteria, order Xanthomonadales) and OM1 (Actinobacteria, order Acidimicrobiales), comprising 21% and 7% of their respective clades, and about 1% of all sequences in the study. Overall, few sequence-abundant bacterial types were globally dispersed and displayed positive range-abundance relationships. Most bacterial populations were rare and exhibited a high degree of endemism, explaining the substantial differences in community composition observed over large spatial scales. Despite the relative physicochemical uniformity of deep-sea sediments, we identified indicators of productivity regimes, especially sediment organic matter content, as factors significantly associated with changes in bacterial community structure across the globe. PMID:26814838

  11. Diverse deep-sea fungi from the South China Sea and their antimicrobial activity.

    PubMed

    Zhang, Xiao-Yong; Zhang, Yun; Xu, Xin-Ya; Qi, Shu-Hua

    2013-11-01

    We investigated the diversity of fungal communities in nine different deep-sea sediment samples of the South China Sea by culture-dependent methods followed by analysis of fungal internal transcribed spacer (ITS) sequences. Although 14 out of 27 identified species were reported in a previous study, 13 species were isolated from sediments of deep-sea environments for the first time. Moreover, the ITS sequences of six isolates shared 84-92% similarity with their closest matches in GenBank, which suggested that they might be novel phylotypes of the genera Ajellomyces, Podosordaria, Torula, and Xylaria. The antimicrobial activities of these fungal isolates were explored using a double-layer technique. A relatively high proportion (56%) of fungal isolates exhibited antimicrobial activity against at least one pathogenic bacterium or fungus among four marine pathogenic microbes (Micrococcus luteus, Pseudoalteromonas piscicida, Aspergillus versicolor, and A. sydowii). Among these antimicrobial fungi, the genera Arthrinium, Aspergillus, and Penicillium exhibited antibacterial and antifungal activities, while the genus Aureobasidium displayed only antibacterial activity, and the genera Acremonium, Cladosporium, Geomyces, and Phaeosphaeriopsis displayed only antifungal activity. To our knowledge, this is the first report to investigate the diversity and antimicrobial activity of culturable deep-sea-derived fungi in the South China Sea. These results suggest that diverse deep-sea fungi from the South China Sea are a potential source for antibiotic discovery and further increase the pool of fungi available for natural bioactive product screening.

  12. Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae.

    PubMed

    Jenkins, Paul A; Song, Yun S; Brem, Rachel B

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance.

  13. Genealogy-Based Methods for Inference of Historical Recombination and Gene Flow and Their Application in Saccharomyces cerevisiae

    PubMed Central

    Jenkins, Paul A.; Song, Yun S.; Brem, Rachel B.

    2012-01-01

    Genetic exchange between isolated populations, or introgression between species, serves as a key source of novel genetic material on which natural selection can act. While detecting historical gene flow from DNA sequence data is of much interest, many existing methods can be limited by requirements for deep population genomic sampling. In this paper, we develop a scalable genealogy-based method to detect candidate signatures of gene flow into a given population when the source of the alleles is unknown. Our method does not require sequenced samples from the source population, provided that the alleles have not reached fixation in the sampled recipient population. The method utilizes recent advances in algorithms for the efficient reconstruction of ancestral recombination graphs, which encode genealogical histories of DNA sequence data at each site, and is capable of detecting the signatures of gene flow whose footprints are of length up to single genes. Further, we employ a theoretical framework based on coalescent theory to test for statistical significance of certain recombination patterns consistent with gene flow from divergent sources. Implementing these methods for application to whole-genome sequences of environmental yeast isolates, we illustrate the power of our approach to highlight loci with unusual recombination histories. By developing innovative theory and methods to analyze signatures of gene flow from population sequence data, our work establishes a foundation for the continued study of introgression and its evolutionary relevance. PMID:23226196

  14. Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

    PubMed Central

    Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

    2013-01-01

    MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which underlie numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, which is a traditionally known medicinal plant in oriental medicine, also known as Korean ginseng. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful, if the species lacks whole genome sequence information. In this study by using the EST based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We have found that many of the identified miRNAs showed tissue specific expressions. Using the insilico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176

  15. Molecular characterization of oral squamous cell carcinoma using targeted next-generation sequencing.

    PubMed

    Er, Tze-Kiong; Wang, Yen-Yun; Chen, Chih-Chieh; Herreros-Villanueva, Marta; Liu, Ta-Chih; Yuan, Shyng-Shiou F

    2015-10-01

    Many genetic factors play an important role in the development of oral squamous cell carcinoma. The aim of this study was to assess the mutational profile in oral squamous cell carcinoma using formalin-fixed, paraffin-embedded tumors from a Taiwanese population by performing targeted sequencing of 26 cancer-associated genes that are frequently mutated in solid tumors. Next-generation sequencing was performed in 50 formalin-fixed, paraffin-embedded tumor specimens obtained from patients with oral squamous cell carcinoma. Genetic alterations in the 26 cancer-associated genes were detected using a deep sequencing (>1000X) approach. TP53, PIK3CA, MET, APC, CDH1, and FBXW7 were the most frequently mutated genes. Most remarkably, TP53 mutations and PIK3CA mutations, which accounted for 68% and 18% of tumors, respectively, were more prevalent in a Taiwanese population. Mutations in other genes, including MET (4%), APC (4%), CDH1 (2%), and FBXW7 (2%), were also identified in our population. In summary, our study shows the feasibility of performing targeted sequencing using formalin-fixed, paraffin-embedded samples. Additionally, this study also reports the mutational landscape of oral squamous cell carcinoma in the Taiwanese population. We believe that this study will shed new light on fundamental aspects in understanding the molecular pathogenesis of oral squamous cell carcinoma and may aid in the development of new targeted therapies. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  16. DeepSAGE Based Differential Gene Expression Analysis under Cold and Freeze Stress in Seabuckthorn (Hippophae rhamnoides L.)

    PubMed Central

    Chaudhary, Saurabh; Sharma, Prakash C.

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag-based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total, 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries: control (CON), cold stress (CS) and freeze stress (FS). After discarding low-quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11,922 differentially expressed genes (DEGs), including 6,539 upregulated and 5,383 downregulated genes, were identified in three comparative setups, i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analyses were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at selecting candidate genes for development of abiotic stress tolerant transgenic plants. PMID:25803684

  17. DeepSAGE based differential gene expression analysis under cold and freeze stress in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Chaudhary, Saurabh; Sharma, Prakash C

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag-based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total, 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries: control (CON), cold stress (CS) and freeze stress (FS). After discarding low-quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11,922 differentially expressed genes (DEGs), including 6,539 upregulated and 5,383 downregulated genes, were identified in three comparative setups, i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analyses were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at selecting candidate genes for development of abiotic stress tolerant transgenic plants.

  18. Deep Sequencing-Identified Kanamycin-Resistant Paenibacillus sp. Strain KS1 Isolated from Epiphyte Tillandsia usneoides (Spanish Moss) in Central Florida, USA

    PubMed Central

    Govindarajan, Subramaniam S.; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K.

    2017-01-01

    ABSTRACT Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. PMID:28153888
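
    As a rough illustration of the assembly statistics quoted above, the sketch below computes G+C content over a set of contigs and derives the mean contig length from the reported totals (398 contigs, 6,508,195 bp). The toy contigs and the function name are hypothetical; the draft genome itself is not reproduced here, and ambiguous bases such as N are simply counted in the denominator.

```python
def gc_content(contigs):
    """Return G+C content as a percentage of all bases in the contigs."""
    gc = 0
    total = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    if total == 0:
        raise ValueError("no bases to count")
    return 100.0 * gc / total


# Hypothetical toy contigs, only to show the calculation.
toy_contigs = ["ATGC", "GGCC"]
print(gc_content(toy_contigs))

# Figures taken from the abstract; the mean is derived from them here.
REPORTED_CONTIGS = 398
REPORTED_TOTAL_BP = 6_508_195
mean_contig_bp = REPORTED_TOTAL_BP / REPORTED_CONTIGS
print(round(mean_contig_bp))
```

    The mean contig length is not stated in the abstract; it follows only from dividing the reported total length by the reported contig count.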

  19. Predictive value of the composition of the vaginal microbiota in bacterial vaginosis, a dynamic study to identify recurrence-related flora.

    PubMed

    Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping

    2016-06-02

    Bacterial vaginosis (BV) is a highly prevalent disease in women and increases the risk of pelvic inflammatory disease. It has received wide attention because of its high recurrence rate. Traditional microscopy-based diagnostic methods provide limited information on the vaginal microbiota, which makes it difficult to trace the development of the disease under conditions of bacterial resistance. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment: D0 (before treatment), D7 (when antibiotic use stopped) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not), and the bacterial composition of their vaginal microbiota was compared. Interestingly, we identified 9 patients who might experience recurrence. We also introduced a new measurement point, D7, although at that point the microbiota were significantly inhibited by antibiotics and hard to observe by traditional methods. The vaginal microbiota as seen by deep sequencing show a strong correlation with the final outcome. Thus, by coupling detailed individual bioinformatics analysis with deep-sequencing technology, we may draw a more accurate map of the vaginal microbiota of BV patients, which provides a new opportunity to reduce the rate of BV recurrence.

  20. A simple and novel method for RNA-seq library preparation of single cell cDNA analysis by hyperactive Tn5 transposase.

    PubMed

    Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta

    2012-10-01

    Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps, with sample loss and stochastic variation at each step significantly affecting the output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which yields high-quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which makes it more widely accessible. A library was constructed from only 100 ng of cDNA, which helps conserve precious specimens. Because the protocol consists of only a few robust enzymatic steps, it saved time, allowed more specimens to be prepared at once, and gave a more reproducible size distribution among the different specimens. The RNA-seq results obtained were comparable to those of the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing of single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.
