Sample records for prediction deep sequencing

  1. Deep learning methods for protein torsion angle prediction.

    PubMed

    Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin

    2017-09-18

    Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures including deep neural network (DNN) and deep restricted Boltzmann machine (DRBN), deep recurrent neural network (DRNN) and deep recurrent restricted Boltzmann machine (DReRBM) since the protein torsion angle prediction is a sequence related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved the performance better than or comparable to a state-of-the art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.

  2. Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2018-05-02

    RNA-binding proteins (RBPs) take over 5∼10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and high-costly. Instead, computational prediction of the RBP binding sites using pattern learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to our best knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method iDeepE to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences into the same length. For the local CNN, we split a RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates a better performance over state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN run 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.

  3. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.

  4. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

  5. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  6. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  7. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980

  8. RaptorX-Property: a web server for protein structure property prediction.

    PubMed

    Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo

    2016-07-08

    RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.

    PubMed

    Fang, Chao; Shang, Yi; Xu, Dong

    2018-05-01

    Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.

  10. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction.

    PubMed

    Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong

    2017-12-15

    Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  11. DeepLoc: prediction of protein subcellular localization using deep learning.

    PubMed

    Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole

    2017-11-01

    The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  12. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progresses in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfaction. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  13. DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

    PubMed

    You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng

    2018-06-06

    As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.

    PubMed

    Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing

    2018-02-01

    Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .

  15. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

    PubMed

    Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

    2018-02-15

    A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  16. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complex and interact at residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  17. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.

  18. A deep learning method for lincRNA detection using auto-encoder algorithm.

    PubMed

    Yu, Ning; Yu, Zeng; Pan, Yi

    2017-12-06

    RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meantime, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conversed patterns/motifs tethered together by non-conserved regions. The two evidences give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than message RNAs are generated. That is, the transcribed RNAs participate the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that deep learning method has the extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. As our knowledge, this is the first application to adopt the deep learning techniques for identifying lincRNA transcription sequences.

  19. Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.

    PubMed

    Zhang, Buzhong; Li, Linqing; Lü, Qiang

    2018-05-25

    Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.

  20. Accurate identification of RNA editing sites from primitive sequence with deep neural networks.

    PubMed

    Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie

    2018-04-16

    RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.

  1. De novo peptide sequencing by deep learning

    PubMed Central

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-01-01

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701

  2. Analysis of deep learning methods for blind protein contact prediction in CASP12.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2018-03-01

    Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.

  3. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

    PubMed Central

    Avsec, Žiga; Cheng, Jun; Gagneur, Julien

    2018-01-01

    Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928

  4. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.

  5. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090

  6. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.

    PubMed

    Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry

    2018-03-01

    We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.

  7. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.

    PubMed

    Wang, Sheng; Sun, Siqi; Xu, Jinbo

    2016-09-01

    Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.

  8. AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling

    PubMed Central

    Wang, Sheng; Sun, Siqi

    2017-01-01

    Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168

  9. Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

    PubMed

    Williams, Philip H; Eyles, Rod; Weiller, Georg

    2012-01-01

    MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.

  10. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.

    PubMed

    Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie

    2017-07-01

    Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen . shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  11. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds. PMID:26585833

  12. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds.

  13. Human Splice-Site Prediction with Deep Neural Networks.

    PubMed

    Naito, Tatsuhiko

    2018-04-18

    Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create a system to predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence data and returns an answer as to whether it is splice site. The length of input is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence model is applied to the pretrained DNN model that determines the probability that an input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the pattern learned by the DNNs was visualized as position frequency matrices (PFMs). Some of PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system are uploaded. Further improvement will be achieved following the further development of DNNs.

  14. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  15. High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.

    PubMed

    Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H

    2017-01-01

    The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNAs prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we will show that our approach has a lower training time and allows for a better graphical navegability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.

  16. Using Deep Learning Model for Meteorological Satellite Cloud Image Prediction

    NASA Astrophysics Data System (ADS)

    Su, X.

    2017-12-01

    A satellite cloud image contains much weather information such as precipitation information. Short-time cloud movement forecast is important for precipitation forecast and is the primary means for typhoon monitoring. The traditional methods are mostly using the cloud feature matching and linear extrapolation to predict the cloud movement, which makes that the nonstationary process such as inversion and deformation during the movement of the cloud is basically not considered. It is still a hard task to predict cloud movement timely and correctly. As deep learning model could perform well in learning spatiotemporal features, to meet this challenge, we could regard cloud image prediction as a spatiotemporal sequence forecasting problem and introduce deep learning model to solve this problem. In this research, we use a variant of Gated-Recurrent-Unit(GRU) that has convolutional structures to deal with spatiotemporal features and build an end-to-end model to solve this forecast problem. In this model, both the input and output are spatiotemporal sequences. Compared to Convolutional LSTM(ConvLSTM) model, this model has lower amount of parameters. We imply this model on GOES satellite data and the model perform well.

  17. New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

    PubMed

    Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

    2009-05-01

    The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.

  18. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.

  19. Deep Learning and Its Applications in Biomedicine.

    PubMed

    Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi

    2018-02-01

    Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.

  20. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.

  1. Genome-wide analyses of long noncoding RNA expression profiles correlated with radioresistance in nasopharyngeal carcinoma via next-generation deep sequencing.

    PubMed

    Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng

    2016-09-06

    Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.

  2. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restrains in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with stacked sparse autoencoder and a dropout approach.

  3. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  4. A new method for enhancer prediction based on deep belief network.

    PubMed

    Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong

    2017-10-16

    Studies have shown that enhancers are significant regulatory elements to play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging mission for scholars and researchers to accurately predicting distal enhancers. In the past years, with the high-throughout ChiP-seq technologies development, several computational techniques emerge to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell-lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.

  5. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network.

    PubMed

    Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong

    2014-10-30

    Because a nearly constant distance between two neighbouring Cα atoms, local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted ϕ and ψ angles and secondary structures for using in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org. Copyright © 2014 Wiley Periodicals, Inc.

  6. Predicting human protein function with multi-task deep neural networks.

    PubMed

    Fa, Rui; Cozzetto, Domenico; Wan, Cen; Jones, David T

    2018-01-01

    Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN, but medium size models provide more improvement in our case. One of advantages of MTDNN is that given a set of features, there is no requirement for MTDNN to have a bootstrap feature selection procedure as what traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still large room for deep learning techniques to further enhance prediction ability.

  7. Deep sequencing and in silico analysis of small RNA library reveals novel miRNA from leaf Persicaria minor transcriptome.

    PubMed

    Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan

    2018-03-01

    In current era, majority of microRNA (miRNA) are being discovered through computational approaches which are more confined towards model plants. Here, for the first time, we have described the identification and characterization of novel miRNA in a non-model plant, Persicaria minor ( P . minor ) using computational approach. Unannotated sequences from deep sequencing were analyzed based on previous well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence which represented 11 unique putative miRNA sequences. PsRobot target prediction tool was deployed to identify the target transcripts of putative novel miRNAs. Most of the predicted target transcripts (mRNAs) were known to be involved in plant development and stress responses. Gene ontology showed that majority of the putative novel miRNA targets involved in cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Out of 11 unique putative miRNAs, 7 miRNAs were validated through semi-quantitative PCR. These novel miRNAs discoveries in P . minor may develop and update the current public miRNA database.

  8. DEEP: a general computational framework for predicting enhancers

    PubMed Central

    Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B.

    2015-01-01

    Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located far away from the gene they affect. Identification of distal regulatory elements is a challenge for the bioinformatics research. Although existing methodologies increased the number of computationally predicted enhancers, performance inconsistency of computational models across different cell-lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning, are some key questions that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer's properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo-derived enhancer data from VISTA database. DEEP-VISTA, when tested on an independent test set, achieved GM of 80.1% and accuracy of 89.64%. DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/. PMID:25378307

  9. Determination of binding affinity upon mutation for type I dockerin-cohesin complexes from Clostridium thermocellum and Clostridium cellulolyticum using deep sequencing.

    PubMed

    Kowalsky, Caitlin A; Whitehead, Timothy A

    2016-12-01

    The comprehensive sequence determinants of binding affinity for type I cohesin toward dockerin from Clostridium thermocellum and Clostridium cellulolyticum was evaluated using deep mutational scanning coupled to yeast surface display. We measured the relative binding affinity to dockerin for 2970 and 2778 single point mutants of C. thermocellum and C. cellulolyticum, respectively, representing over 96% of all possible single point mutants. The interface ΔΔG for each variant was reconstructed from sequencing counts and compared with the three independent experimental methods. This reconstruction results in a narrow dynamic range of -0.8-0.5 kcal/mol. The computational software packages FoldX and Rosetta were used to predict mutations that disrupt binding by more than 0.4 kcal/mol. The area under the curve of receiver operator curves was 0.82 for FoldX and 0.77 for Rosetta, showing reasonable agreements between predictions and experimental results. Destabilizing mutations to core and rim positions were predicted with higher accuracy than support positions. This benchmark dataset may be useful for developing new computational prediction tools for the prediction of the mutational effect on binding affinities for protein-protein interactions. Experimental considerations to improve precision and range of the reconstruction method are discussed. Proteins 2016; 84:1914-1928. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  10. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.

  11. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackled the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which was used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features, which can be utilized by deep neural networks and other classifiers to enhance learning, out of the Fisher score features. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.

  12. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency

    PubMed Central

    Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K

    2010-01-01

    Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383

  13. Deep Sequencing-Identified Kanamycin-Resistant Paenibacillus sp. Strain KS1 Isolated from Epiphyte Tillandsia usneoides (Spanish Moss) in Central Florida, USA

    PubMed Central

    Govindarajan, Subramaniam S.; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K.

    2017-01-01

    ABSTRACT Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. PMID:28153888

  14. Aftershock occurrence rate decay for individual sequences and catalogs

    NASA Astrophysics Data System (ADS)

    Nyffenegger, Paul A.

    One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precedes major earthquakes. Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.

  15. Modeling genome coverage in single-cell sequencing

    PubMed Central

    Daley, Timothy; Smith, Andrew D.

    2014-01-01

    Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873

  16. DeepSig: deep learning improves signal peptide detection in proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2018-05-15

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both standalone program and web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.

  17. Deep Sequencing-Identified Kanamycin-Resistant Paenibacillus sp. Strain KS1 Isolated from Epiphyte Tillandsia usneoides (Spanish Moss) in Central Florida, USA.

    PubMed

    Lata, Pushpa; Govindarajan, Subramaniam S; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K

    2017-02-02

    Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. Copyright © 2017 Lata et al.

  18. RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning.

    PubMed

    Gao, Yujuan; Wang, Sheng; Deng, Minghua; Xu, Jinbo

    2018-05-08

    Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the latest two Critical Assessment of protein Structure Prediction (CASP), our method outperforms the existing state-of-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our result also shows approximately linear relationship between the real prediction errors and our estimated bounds. That is, the real prediction error can be well approximated by our estimated bounds. Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional study.

  19. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.

    PubMed

    Pan, Xiaoyong; Shen, Hong-Bin

    2017-02-28

    RNAs play key roles in cells through the interactions with proteins known as the RNA-binding proteins (RBP) and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence, structure, their domain specific features and formats have posed significant computational challenges. One of current difficulties is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into the meaningful binding motifs is a topic worth of further investigation. In viewing of these challenges, we propose a deep learning-based framework (iDeep) by using a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol is featured by transforming the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture the interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach in the real-world applications. The iDeep framework not only can achieve promising performance than the state-of-the-art predictors, but also easily capture interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.

  20. Pre-drill predictions versus post-drill results: use of sequence stratigraphic methods in reduction of exploration risk, Sarawak Deep-water Blocks, Malaysia

    NASA Astrophysics Data System (ADS)

    Mansor, Md Yazid; Snedden, J. W.; Sarg, J. F.; Smith, B. S.; Kolich, T.; Carter, M.

    1999-04-01

    Limited well control, great distances from age-equivalent producing fields, and a largely unknown stratigraphy necessitated use of sequence stratigraphic methods to assess exploration risk associated with reservoir, source and seal distribution in the Mobil-operated Deep-water Blocks of Sarawak, Malaysia. These methods allowed predictions to be made and reservoir risks to be halved in each of the locations drilled in 1995. Predictions regarding reservoir and stratigraphy proved correct, as the Mulu-1 and Bako-1 wells penetrated numerous high-quality, thick sandstone reservoirs in the Middle to Lower Miocene section. Shallow marine sandstones dominate the vertical succession in both wells, with characteristic aggradational, upward-coarsening log motifs. Cores display classic wave-generated stratification and hummocky cross-bedding. Evidence, such as marginal-marine to neritic microfauna in cuttings of both wells, supports these interpretations. Lack of hydrocarbon charge in the two wells may be due to their position relative to coaly hydrocarbon source beds. These prospects have high trap and seal integrity, being well defined on seismics as high relief horst blocks covered by a very thick shale-prone section. The Mulu-1 well, for example, is located at least 20-30 km down stratigraphic dip from mapped coeval lower coastal-plain deposits. Amplitude anomalies on the flank of the Mulu horst are probably derived from transported organics buried in deep Plio-Pleistocene kitchens in the northwest portion of the Mobil blocks. Remaining potential of mapped prospects is high and efforts continue at characterizing the petroleum system of the Deep-water Blocks. Seismic attribute and interval velocity analyses provide new clues to the location of probable coaly source rocks, especially when viewed in their regional and sequence stratigraphic context. Future work is planned and will serve to reduce risk to acceptable levels and support further drilling in this prospective hydrocarbon province.

  1. DRREP: deep ridge regressed epitope predictor.

    PubMed

    Sher, Gene; Zhi, Degui; Zhang, Shaojie

    2017-10-03

    The ability to predict epitopes plays an enormous role in vaccine development in terms of our ability to zero in on where to do a more thorough in-vivo analysis of the protein in question. Though for the past decade there have been numerous advancements and improvements in epitope prediction, on average the best benchmark prediction accuracies are still only around 60%. New machine learning algorithms have arisen within the domain of deep learning, text mining, and convolutional networks. This paper presents a novel analytically trained and string kernel using deep neural network, which is tailored for continuous epitope prediction, called: Deep Ridge Regressed Epitope Predictor (DRREP). DRREP was tested on long protein sequences from the following datasets: SARS, Pellequer, HIV, AntiJen, and SEQ194. DRREP was compared to numerous state of the art epitope predictors, including the most recently published predictors called LBtope and DMNLBE. Using area under ROC curve (AUC), DRREP achieved a performance improvement over the best performing predictors on SARS (13.7%), HIV (8.9%), Pellequer (1.5%), and SEQ194 (3.1%), with its performance being matched only on the AntiJen dataset, by the LBtope predictor, where both DRREP and LBtope achieved an AUC of 0.702. DRREP is an analytically trained deep neural network, thus capable of learning in a single step through regression. By combining the features of deep learning, string kernels, and convolutional networks, the system is able to perform residue-by-residue prediction of continues epitopes with higher accuracy than the current state of the art predictors.

  2. Prediction of Bispectral Index during Target-controlled Infusion of Propofol and Remifentanil: A Deep Learning Approach.

    PubMed

    Lee, Hyung-Chul; Ryu, Ho-Geol; Chung, Eun-Jin; Jung, Chul-Woo

    2018-03-01

    The discrepancy between predicted effect-site concentration and measured bispectral index is problematic during intravenous anesthesia with target-controlled infusion of propofol and remifentanil. We hypothesized that bispectral index during total intravenous anesthesia would be more accurately predicted by a deep learning approach. Long short-term memory and the feed-forward neural network were sequenced to simulate the pharmacokinetic and pharmacodynamic parts of an empirical model, respectively, to predict intraoperative bispectral index during combined use of propofol and remifentanil. Inputs of long short-term memory were infusion histories of propofol and remifentanil, which were retrieved from target-controlled infusion pumps for 1,800 s at 10-s intervals. Inputs of the feed-forward network were the outputs of long short-term memory and demographic data such as age, sex, weight, and height. The final output of the feed-forward network was the bispectral index. The performance of bispectral index prediction was compared between the deep learning model and previously reported response surface model. The model hyperparameters comprised 8 memory cells in the long short-term memory layer and 16 nodes in the hidden layer of the feed-forward network. The model training and testing were performed with separate data sets of 131 and 100 cases. The concordance correlation coefficient (95% CI) were 0.561 (0.560 to 0.562) in the deep learning model, which was significantly larger than that in the response surface model (0.265 [0.263 to 0.266], P < 0.001). The deep learning model-predicted bispectral index during target-controlled infusion of propofol and remifentanil more accurately compared to the traditional model. The deep learning approach in anesthetic pharmacology seems promising because of its excellent performance and extensibility.

  3. Genetic risk prediction using a spatial autoregressive model with adaptive lasso.

    PubMed

    Wen, Yalu; Shen, Xiaoxi; Lu, Qing

    2018-05-31

    With rapidly evolving high-throughput technologies, studies are being initiated to accelerate the process toward precision medicine. The collection of the vast amounts of sequencing data provides us with great opportunities to systematically study the role of a deep catalog of sequencing variants in risk prediction. Nevertheless, the massive amount of noise signals and low frequencies of rare variants in sequencing data pose great analytical challenges on risk prediction modeling. Motivated by the development in spatial statistics, we propose a spatial autoregressive model with adaptive lasso (SARAL) for risk prediction modeling using high-dimensional sequencing data. The SARAL is a set-based approach, and thus, it reduces the data dimension and accumulates genetic effects within a single-nucleotide variant (SNV) set. Moreover, it allows different SNV sets having various magnitudes and directions of effect sizes, which reflects the nature of complex diseases. With the adaptive lasso implemented, SARAL can shrink the effects of noise SNV sets to be zero and, thus, further improve prediction accuracy. Through simulation studies, we demonstrate that, overall, SARAL is comparable to, if not better than, the genomic best linear unbiased prediction method. The method is further illustrated by an application to the sequencing data from the Alzheimer's Disease Neuroimaging Initiative. Copyright © 2018 John Wiley & Sons, Ltd.

  4. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Draft Genome Sequence of Pseudomonas pachastrellae Strain CCUG 46540T, a Deep-Sea Bacterium.

    PubMed

    Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge; García-Valdés, Elena

    2017-04-06

    Pseudomonas pachastrellae strain CCUG 46540 T (KMM 330 T ) was isolated from a deep-sea sponge specimen collected in the Philippine Sea at a depth of 750 m. The draft genome has an estimated size of 4.0 Mb, exhibits a G+C content of 61.2 mol%, and is predicted to encode 3,592 proteins, including pathways for the degradation of aromatic compounds. Copyright © 2017 Gomila et al.

  6. Draft Genome Sequence of Pseudomonas pachastrellae Strain CCUG 46540T, a Deep-Sea Bacterium

    PubMed Central

    2017-01-01

    ABSTRACT Pseudomonas pachastrellae strain CCUG 46540T (KMM 330T) was isolated from a deep-sea sponge specimen collected in the Philippine Sea at a depth of 750 m. The draft genome has an estimated size of 4.0 Mb, exhibits a G+C content of 61.2 mol%, and is predicted to encode 3,592 proteins, including pathways for the degradation of aromatic compounds. PMID:28385850

  7. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

  8. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  9. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences

    PubMed Central

    Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg

    2017-01-01

    Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404

  10. Prediction of novel pre-microRNAs with high accuracy through boosting and SVM.

    PubMed

    Zhang, Yuanwei; Yang, Yifan; Zhang, Huan; Jiang, Xiaohua; Xu, Bo; Xue, Yu; Cao, Yunxia; Zhai, Qian; Zhai, Yong; Xu, Mingqing; Cooke, Howard J; Shi, Qinghua

    2011-05-15

    High-throughput deep-sequencing technology has generated an unprecedented number of expressed short sequence reads, presenting not only an opportunity but also a challenge for prediction of novel microRNAs. To verify the existence of candidate microRNAs, we have to show that these short sequences can be processed from candidate pre-microRNAs. However, it is laborious and time consuming to verify these using existing experimental techniques. Therefore, here, we describe a new method, miRD, which is constructed using two feature selection strategies based on support vector machines (SVMs) and boosting method. It is a high-efficiency tool for novel pre-microRNA prediction with accuracy up to 94.0% among different species. miRD is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/rpg/mird/mird.php.

  11. Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches.

    PubMed

    Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren

    2011-05-01

    The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. sequence-based approach and structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression is employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison on the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give the QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, though the QSSR modeling using sequence-based approach is not needed for the preparation of the minimization structures of peptides before the modeling, it would be considerably efficient as compared to that using structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.

  12. Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets

    PubMed Central

    2012-01-01

    Background Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773

  13. Sensitive Deep-Sequencing-Based HIV-1 Genotyping Assay To Simultaneously Determine Susceptibility to Protease, Reverse Transcriptase, Integrase, and Maturation Inhibitors, as Well as HIV-1 Coreceptor Tropism

    PubMed Central

    Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.

    2014-01-01

    With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782

  14. Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel miRNAs, many with few target genes and low expression levels, suggests the rapid evolution of miRNA genes. The development of a miRNA database, BraMRs, enables us to integrate miRNA identification, target prediction, and functional annotation of target genes. BraMRs will represent a valuable public resource with which to study the epigenetic control of B. rapa and other closely related Brassica species. The database is available at the following link: http://bramrs.rna.kr [1]. PMID:23163954

  15. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

  16. Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

    PubMed Central

    Sun, Jiangming; Singh, Pratibha; Bagge, Annika; Valtat, Bérengère; Vikman, Petter; Spégel, Peter; Mulder, Hindrik

    2016-01-01

    RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A great impediment, however, to a deeper understanding of this process is the paramount sequencing effort that needs to be undertaken to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that ameliorates this problem. Using 41 nucleotide long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events intra- and interspecies. The validity of the proposed method was verified in an independent experimental dataset. Using our approach, 203 202 putative A-to-I RNA editing events were predicted in the whole human genome. Out of these, 9% were previously reported. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool to identify potential A-to-I RNA editing events without the requirement of extensive RNA sequencing. PMID:27764195

  17. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks.

    PubMed

    Umarov, Ramzan Kh; Solovyev, Victor V

    2017-01-01

    Accurate computational identification of promoters remains a challenge as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that CNN trained on sigma70 subclass of Escherichia coli promoter gives an excellent classification of promoters and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The Bacillus subtilis promoters identification CNN model achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). CNN models nicely recognize these complex functional regions. For human promoters Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0,90 on TATA and 0.90/0.98/0.89 for non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA) promoters. Thus, the developed CNN models, implemented in CNNProm program, demonstrated the ability of deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy compared to the previously developed promoter prediction programs. We also propose random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in sequences of many other and especially newly sequenced genomes. The CNNProm program is available to run at web server http://www.softberry.com.

  18. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  19. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  20. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  1. Making sense of deep sequencing

    PubMed Central

    Goldman, D.; Domschke, K.

    2016-01-01

    This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306

  2. An introduction to deep learning on biological sequence data: examples and solutions.

    PubMed

    Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae

    2017-11-15

    Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. Characterization of skin ulceration syndrome associated microRNAs in sea cucumber Apostichopus japonicus by deep sequencing.

    PubMed

    Li, Chenghua; Feng, Weida; Qiu, Lihua; Xia, Changge; Su, Xiurong; Jin, Chunhua; Zhou, Tingting; Zeng, Yuan; Li, Taiwu

    2012-08-01

    MicroRNAs (miRNAs) constitute a family of small RNA species which have been demonstrated to be one of key effectors in mediating host-pathogen interaction. In this study, two haemocytes miRNA libraries were constructed with deep sequenced by illumina Hiseq2000 from healthy (L1) and skin ulceration syndrome Apostichopus japonicus (L2). The high throughput solexa sequencing resulted in 9,579,038 and 7,742,558 clean data from L1 and L2, respectively. Sequences analysis revealed that 40 conserved miRNAs were found in both libraries, in which let-7 and mir-125 were speculated to be clustered together and expressed accordingly. Eighty-six miRNA candidates were also identified by reference genome search and stem-loop structure prediction. Importantly, mir-31 and mir-2008 displayed significant differential expression between the two libraries according to FPKM model, which might be considered as promising targets for elucidating the intrinsic mechanism of skin ulceration syndrome outbreak in the species. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Magnetic resonance imaging in local staging of endometrial carcinoma: diagnostic performance, pitfalls, and literature review.

    PubMed

    Zandrino, Franco; La Paglia, Ernesto; Musante, Francesco

    2010-01-01

    To assess the diagnostic accuracy of magnetic resonance imaging in local staging of endometrial carcinoma, and to review the results and pitfalls described in the literature. Thirty women with a histological diagnosis of endometrial carcinoma underwent magnetic resonance imaging. Unenhanced T2-weighted and dynamic contrast-enhanced Ti-weighted sequences were obtained. Hysterectomy and salpingo-oophorectomy was performed in all patients. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the detection of deep myometrial and cervical infiltration. For deep myometrial infiltration T2-weighted sequences reached a sensitivity of 85%, specificity of 76%, PPV of 73%, NVP of 87%, and accuracy of 80%, while contrast-enhanced scans reached a sensitivity of 90%, specificity of 80%, PPV of 82%, NPV of 89%, and accuracy of 85%. For cervical infiltration T2-weighted sequences reached a sensitivity of 75%, specificity of 88%, PPV of 50%, NPV of 96%, and accuracy of 87%, while contrast-enhanced scans reached a sensitivity of 100%, specificity of 94%, PPV of 75%, NPV of 100%, and accuracy of 95%. Unenhanced and dynamic gadolinium-enhanced magnetic resonance allows accurate assessment of myometrial and cervical infiltration. Information provided by magnetic resonance imaging can define prognosis and management.

  5. Probing the Rare Biosphere of the North-West Mediterranean Sea: An Experiment with High Sequencing Effort.

    PubMed

    Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos

    2016-01-01

    High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.

  6. Deep intronic GPR143 mutation in a Japanese family with ocular albinism

    PubMed Central

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-01-01

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease. PMID:26061757

  7. Deep intronic GPR143 mutation in a Japanese family with ocular albinism.

    PubMed

    Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei

    2015-06-10

    Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease.

  8. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    PubMed

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding.

    PubMed

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-07-15

    Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k -mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k -mer co-occurrence information with recent advances in deep learning. We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k -mer embedding. We first split DNA sequences into k -mers and pre-train k -mer embedding vectors based on the co-occurrence matrix of k -mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k -mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm . tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn. Supplementary materials are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  10. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding

    PubMed Central

    Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui

    2017-01-01

    Abstract Motivation: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. Results: We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. Availability and implementation: The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm. Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:28881969

  11. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  12. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

    PubMed

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-11-15

    An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  13. Predictive value of the composition of the vaginal microbiota in bacterial vaginosis, a dynamic study to identify recurrence-related flora.

    PubMed

    Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping

    2016-06-02

    Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has been given wide attention because of the high recurrence rate. Traditional diagnostic methods based on microscope providing limited information on the vaginal microbiota increase the difficulty in tracing the development of the disease in bacteria resistance condition. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment, at D0 (before treatment), D7 (stop using the antibiotics) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and their bacterial composition of the vaginal microbiota was compared. Interestingly, we identified 9 patients might be recurrence. We also introduced a new measurement point of D7, although its microbiota were significantly inhabited by antibiotic and hard to be observed by traditional method. The vaginal microbiota in deep-sequencing-view present a strong correlation to the final outcome. Thus, coupled with detailed individual bioinformatics analysis and deep-sequencing technology, we may illustrate a more accurate map of vaginal microbial to BV patients, which provide a new opportunity to reduce the rate of recurrence of BV.

  14. Oasis 2: improved online analysis of small RNA-seq data.

    PubMed

    Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan

    2018-02-14

    Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.

  15. Identification of microRNA-like RNAs from Curvularia lunata associated with maize leaf spot by bioinformation analysis and deep sequencing.

    PubMed

    Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei

    2016-04-01

    Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs were 1012 microRNA-like RNAs (milRNAs), which are similar to other known microRNAs, and 48 potential novel milRNAs without homologs in other organisms have been identified using the miRBase© database. We used quantitative PCR to analyze the expression of four of these milRNAs from C. lunata at different developmental stages. The analysis revealed several changes associated with germinating conidia and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. A total of 8334 target mRNAs for the 1012 milRNAs that were identified, and 256 target mRNAs for the 48 novel milRNAs were predicted by computational analysis. These target mRNAs of milRNAs were also performed by gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata's milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanism.

  16. A deep learning framework for causal shape transformation.

    PubMed

    Lore, Kin Gwn; Stoecklein, Daniel; Davies, Michael; Ganapathysubramanian, Baskar; Sarkar, Soumik

    2018-02-01

    Recurrent neural network (RNN) and Long Short-term Memory (LSTM) networks are the common go-to architecture for exploiting sequential information where the output is dependent on a sequence of inputs. However, in most considered problems, the dependencies typically lie in the latent domain which may not be suitable for applications involving the prediction of a step-wise transformation sequence that is dependent on the previous states only in the visible domain with a known terminal state. We propose a hybrid architecture of convolution neural networks (CNN) and stacked autoencoders (SAE) to learn a sequence of causal actions that nonlinearly transform an input visual pattern or distribution into a target visual pattern or distribution with the same support and demonstrated its practicality in a real-world engineering problem involving the physics of fluids. We solved a high-dimensional one-to-many inverse mapping problem concerning microfluidic flow sculpting, where the use of deep learning methods as an inverse map is very seldom explored. This work serves as a fruitful use-case to applied scientists and engineers in how deep learning can be beneficial as a solution for high-dimensional physical problems, and potentially opening doors to impactful advance in fields such as material sciences and medical biology where multistep topological transformations is a key element. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. Uniform, optimal signal processing of mapped deep-sequencing data.

    PubMed

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  18. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  19. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  20. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

    Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences. Principal Component Analysis PCA has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine DELM in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis LDA to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Microbial co-occurrence patterns in deep Precambrian bedrock fracture fluids

    NASA Astrophysics Data System (ADS)

    Purkamo, Lotta; Bomberg, Malin; Kietäväinen, Riikka; Salavirta, Heikki; Nyyssönen, Mari; Nuppunen-Puputti, Maija; Ahonen, Lasse; Kukkonen, Ilmo; Itävaara, Merja

    2016-05-01

    The bacterial and archaeal community composition and the possible carbon assimilation processes and energy sources of microbial communities in oligotrophic, deep, crystalline bedrock fractures is yet to be resolved. In this study, intrinsic microbial communities from groundwater of six fracture zones from 180 to 2300 m depths in Outokumpu bedrock were characterized using high-throughput amplicon sequencing and metagenomic prediction. Comamonadaceae-, Anaerobrancaceae- and Pseudomonadaceae-related operational taxonomic units (OTUs) form the core community in deep crystalline bedrock fractures in Outokumpu. Archaeal communities were mainly composed of Methanobacteriaceae-affiliating OTUs. The predicted bacterial metagenomes showed that pathways involved in fatty acid and amino sugar metabolism were common. In addition, relative abundance of genes coding the enzymes of autotrophic carbon fixation pathways in predicted metagenomes was low. This indicates that heterotrophic carbon assimilation is more important for microbial communities of the fracture zones. Network analysis based on co-occurrence of OTUs revealed possible "keystone" genera of the microbial communities belonging to Burkholderiales and Clostridiales. Bacterial communities in fractures resemble those found in oligotrophic, hydrogen-enriched environments. Serpentinization reactions of ophiolitic rocks in Outokumpu assemblage may provide a source of energy and organic carbon compounds for the microbial communities in the fractures. Sulfate reducers and methanogens form a minority of the total microbial communities, but OTUs forming these minor groups are similar to those found in other deep Precambrian terrestrial bedrock environments.

  2. The keystone species of Precambrian deep bedrock biosphere belong to Burkholderiales and Clostridiales

    NASA Astrophysics Data System (ADS)

    Purkamo, L.; Bomberg, M.; Kietäväinen, R.; Salavirta, H.; Nyyssönen, M.; Nuppunen-Puputti, M.; Ahonen, L.; Kukkonen, I.; Itävaara, M.

    2015-11-01

    The bacterial and archaeal community composition and the possible carbon assimilation processes and energy sources of microbial communities in oligotrophic, deep, crystalline bedrock fractures is yet to be resolved. In this study, intrinsic microbial communities from six fracture zones from 180-2300 m depths in Outokumpu bedrock were characterized using high-throughput amplicon sequencing and metagenomic prediction. Comamonadaceae-, Anaerobrancaceae- and Pseudomonadaceae-related OTUs form the core community in deep crystalline bedrock fractures in Outokumpu. Archaeal communities were mainly composed of Methanobacteraceae-affiliating OTUs. The predicted bacterial metagenomes showed that pathways involved in fatty acid and amino sugar metabolism were common. In addition, relative abundance of genes coding the enzymes of autotrophic carbon fixation pathways in predicted metagenomes was low. This indicates that heterotrophic carbon assimilation is more important for microbial communities of the fracture zones. Network analysis based on co-occurrence of OTUs revealed the keystone genera of the microbial communities belonging to Burkholderiales and Clostridiales. Bacterial communities in fractures resemble those found from oligotrophic, hydrogen-enriched environments. Serpentinization reactions of ophiolitic rocks in Outokumpu assemblage may provide a source of energy and organic carbon compounds for the microbial communities in the fractures. Sulfate reducers and methanogens form a minority of the total microbial communities, but OTUs forming these minor groups are similar to those found from other deep Precambrian terrestrial bedrock environments.

  3. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus.

    PubMed

    Zhang, Yan; An, Lin; Xu, Jie; Zhang, Bo; Zheng, W Jim; Hu, Ming; Tang, Jijun; Yue, Feng

    2018-02-21

    Although Hi-C technology is one of the most popular tools for studying 3D genome organization, due to sequencing cost, the resolution of most Hi-C datasets are coarse and cannot be used to link distal regulatory elements to their target genes. Here we develop HiCPlus, a computational approach based on deep convolutional neural network, to infer high-resolution Hi-C interaction matrices from low-resolution Hi-C data. We demonstrate that HiCPlus can impute interaction matrices highly similar to the original ones, while only using 1/16 of the original sequencing reads. We show that the models learned from one cell type can be applied to make predictions in other cell or tissue types. Our work not only provides a computational framework to enhance Hi-C data resolution but also reveals features underlying the formation of 3D chromatin interactions.

  4. Improving Protein Fold Recognition by Deep Learning Networks.

    PubMed

    Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin

    2015-12-04

    For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed the comparable results at the level of family and superfamily, compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to real-value regression task, which also show a promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.

  5. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2017-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

  6. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2018-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101

  7. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  8. Advances in Discrete-Event Simulation for MSL Command Validation

    NASA Technical Reports Server (NTRS)

    Patrikalakis, Alexander; O'Reilly, Taifun

    2013-01-01

    In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate change in more than fifty thousand flight software parameters and conditional command sequences to predict the result of executing a conditional branch in a command sequence, and enable the ability to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.

  9. Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy

    PubMed Central

    Matkovich, Scot J.; Dorn, Gerald W.

    2018-01-01

    Summary MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicates purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses. PMID:25836573

  10. Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy.

    PubMed

    Matkovich, Scot J; Dorn, Gerald W

    2015-01-01

    MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicate purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses.

  11. Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection.

    PubMed

    Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander

    2017-10-01

    While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.

  12. Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

    PubMed Central

    Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

    2013-01-01

    MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which underlie numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, which is a traditionally known medicinal plant in oriental medicine, also known as Korean ginseng. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful, if the species lacks whole genome sequence information. In this study by using the EST based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We have found that many of the identified miRNAs showed tissue specific expressions. Using the insilico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176

  13. Deep sequencing identifies circulating mouse miRNAs that are functionally implicated in manifestations of aging and responsive to calorie restriction.

    PubMed

    Dhahbi, Joseph M; Spindler, Stephen R; Atamna, Hani; Yamakawa, Amy; Guerrero, Noel; Boffelli, Dario; Mote, Patricia; Martin, David I K

    2013-02-01

    MicroRNAs (miRNAs) function to modulate gene expression, and through this property they regulate a broad spectrum of cellular processes. They can circulate in blood and thereby mediate cell-to-cell communication. Aging involves changes in many cellular processes that are potentially regulated by miRNAs, and some evidence has implicated circulating miRNAs in the aging process. In order to initiate a comprehensive assessment of the role of circulating miRNAs in aging, we have used deep sequencing to characterize circulating miRNAs in the serum of young mice, old mice, and old mice maintained on calorie restriction (CR). Deep sequencing identifies a set of novel miRNAs, and also accurately measures all known miRNAs present in serum. This analysis demonstrates that the levels of many miRNAs circulating in the mouse are increased with age, and that the increases can be antagonized by CR. The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging including metabolic changes, and the miRNAs themselves have been linked to diseases associated with old age. This finding implicates circulating miRNAs in the aging process, raising questions about their tissues of origin, their cellular targets, and their functional role in metabolic changes that occur with aging.

  14. Predictive value of the composition of the vaginal microbiota in bacterial vaginosis, a dynamic study to identify recurrence-related flora

    PubMed Central

    Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping

    2016-01-01

    Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has been given wide attention because of the high recurrence rate. Traditional diagnostic methods based on microscope providing limited information on the vaginal microbiota increase the difficulty in tracing the development of the disease in bacteria resistance condition. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment, at D0 (before treatment), D7 (stop using the antibiotics) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and their bacterial composition of the vaginal microbiota was compared. Interestingly, we identified 9 patients might be recurrence. We also introduced a new measurement point of D7, although its microbiota were significantly inhabited by antibiotic and hard to be observed by traditional method. The vaginal microbiota in deep-sequencing-view present a strong correlation to the final outcome. Thus, coupled with detailed individual bioinformatics analysis and deep-sequencing technology, we may illustrate a more accurate map of vaginal microbial to BV patients, which provide a new opportunity to reduce the rate of recurrence of BV. PMID:27253522

  15. Chromatin accessibility prediction via a hybrid deep convolutional neural network.

    PubMed

    Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

    2018-03-01

    A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparison with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  16. A Deep Learning Framework for Robust and Accurate Prediction of ncRNA-Protein Interactions Using Evolutionary Information.

    PubMed

    Yi, Hai-Cheng; You, Zhu-Hong; Huang, De-Shuang; Li, Xiao; Jiang, Tong-Hai; Li, Li-Ping

    2018-06-01

    The interactions between non-coding RNAs (ncRNAs) and proteins play an important role in many biological processes, and their biological functions are primarily achieved by binding with a variety of proteins. High-throughput biological techniques are used to identify protein molecules bound with specific ncRNA, but they are usually expensive and time consuming. Deep learning provides a powerful solution to computationally predict RNA-protein interactions. In this work, we propose the RPI-SAN model by using the deep-learning stacked auto-encoder network to mine the hidden high-level features from RNA and protein sequences and feed them into a random forest (RF) model to predict ncRNA binding proteins. Stacked assembling is further used to improve the accuracy of the proposed method. Four benchmark datasets, including RPI2241, RPI488, RPI1807, and NPInter v2.0, were employed for the unbiased evaluation of five established prediction tools: RPI-Pred, IPMiner, RPISeq-RF, lncPro, and RPI-SAN. The experimental results show that our RPI-SAN model achieves much better performance than other methods, with accuracies of 90.77%, 89.7%, 96.1%, and 99.33%, respectively. It is anticipated that RPI-SAN can be used as an effective computational tool for future biomedical researches and can accurately predict the potential ncRNA-protein interacted pairs, which provides reliable guidance for biological research. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  17. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks.

    PubMed

    Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-03-01

    Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  18. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East India Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in the GenBank. Moreover, 44.4% fungal OTUs and 30% culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East India Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal community could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  19. A novel pathogenic aviadenovirus from red-bellied parrots (Poicephalus rufiventris) unveils deep recombination events among avian host lineages

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Das, Shubhagata, E-mail: sdas@csu.edu.au

    Competing roles of coevolution, selective pressure and recombination are an emerging interest in virus evolution. We report a novel aviadenovirus from captive red-bellied parrots (Poicephalus rufiventris) that uncovers evidence of deep recombination among aviadenoviruses. The sequence identity of the virus was most closely related to Turkey adenovirus D (42% similarity) and other adenoviruses in chickens, turkeys and pigeons. Sequencing and comparative analysis showed that the genome comprised 40,930 nucleotides containing 42 predicted open reading frames (ORFs) 19 of which had strong similarity with genes from other adenovirus species. The new genome unveiled a lineage that likely participated in deep recombinationmore » events across the genus Aviadenovirus accounting for an ancient evolutionary relationship. We hypothesize frequent host switch events and recombination among adenovirus progenitors in Galloanserae hosts caused the radiation of extant aviadenoviruses and the newly assembled Poicephalus adenovirus genome points to a potentially broader host range of these viruses among birds. - Highlights: •Shows how a single new genome can change overall phylogeny. •Reveals host switch events among adenovirus progenitors in Galloanserae hosts. •Points to a potentially broader host range of adenoviruses among birds and wildlife .« less

  20. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoter in human genome play an important role in DNA sequence analysis. Entropy, in Shannon sense, of information theory is a multiple utility in bioinformatic details analysis. The relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context feature and use a set of methods of SD to select the most effective n-mers distinguishing promoter regions from other DNA regions in human genome. Extracted from the total possible combinations of n-mers, we can get four sparse distributions based on promoter and non-promoters training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specially, we combine the advantage of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep feature for promoter recognition. And then we apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. Framework is flexible that it can integrate new feature extraction or new classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Microbial communities and their predicted metabolic characteristics in deep fracture groundwaters of the crystalline bedrock at Olkiluoto, Finland

    NASA Astrophysics Data System (ADS)

    Bomberg, Malin; Lamminmäki, Tiina; Itävaara, Merja

    2016-11-01

    The microbial diversity in oligotrophic isolated crystalline Fennoscandian Shield bedrock fracture groundwaters is high, but the core community has not been identified. Here we characterized the bacterial and archaeal communities in 12 water conductive fractures situated at depths between 296 and 798 m by high throughput amplicon sequencing using the Illumina HiSeq platform. Between 1.7 × 104 and 1.2 × 106 bacterial or archaeal sequence reads per sample were obtained. These sequences revealed that up to 95 and 99 % of the bacterial and archaeal sequences obtained from the 12 samples, respectively, belonged to only a few common species, i.e. the core microbiome. However, the remaining rare microbiome contained over 3- and 6-fold more bacterial and archaeal taxa. The metabolic properties of the microbial communities were predicted using PICRUSt. The approximate estimation showed that the metabolic pathways commonly included fermentation, fatty acid oxidation, glycolysis/gluconeogenesis, oxidative phosphorylation, and methanogenesis/anaerobic methane oxidation, but carbon fixation through the Calvin cycle, reductive TCA cycle, and the Wood-Ljungdahl pathway was also predicted. The rare microbiome is an unlimited source of genomic functionality in all ecosystems. It may consist of remnants of microbial communities prevailing in earlier environmental conditions, but could also be induced again if changes in their living conditions occur.

  2. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

    PubMed

    Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

    2011-03-04

    Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  3. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.

  4. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

    PubMed Central

    Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi

    2018-01-01

    Abstract Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88–90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746

  5. Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments

    PubMed Central

    Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken

    2013-01-01

    In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 106 to 1011 viruses/cm3 of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24−30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10−3 in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95−99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952

  6. Metagenomic analysis of viral communities in (hado)pelagic sediments.

    PubMed

    Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken

    2013-01-01

    In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10(6) to 10(11) viruses/cm(3) of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24-30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10(-3) in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95-99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses.

  7. Prediction of enhancer-promoter interactions via natural language processing.

    PubMed

    Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

    2018-05-09

    Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841~ 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting attention mechanism. Last, we show that even superior performance with F1 scores 0.889~ 0.940 can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPIs identification.

  8. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

    PubMed Central

    Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2015-01-01

    Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593

  9. Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    PubMed Central

    Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512

  10. Identification and functional analysis of flowering related microRNAs in common wild rice (Oryza rufipogon Griff.).

    PubMed

    Chen, Zongxiang; Li, Fuli; Yang, Songnan; Dong, Yibo; Yuan, Qianhua; Wang, Feng; Li, Weimin; Jiang, Ying; Jia, Shirong; Pei, Xinwu

    2013-01-01

    MicroRNAs (miRNAs) is a class of non-coding RNAs involved in post- transcriptional control of gene expression, via degradation and/or translational inhibition. Six-hundred sixty-one rice miRNAs are known that are important in plant development. However, flowering-related miRNAs have not been characterized in Oryza rufipogon Griff. It was approved by supervision department of Guangdong wild rice protection. We analyzed flowering-related miRNAs in O. rufipogon using high-throughput sequencing (deep sequencing) to understand the changes that occurred during rice domestication, and to elucidate their functions in flowering. Three O. rufipogon sRNA libraries, two vegetative stage (CWR-V1 and CWR-V2) and one flowering stage (CWR-F2) were sequenced using Illumina deep sequencing. A total of 20,156,098, 21,531,511 and 20,995,942 high quality sRNA reads were obtained from CWR-V1, CWR-V2 and CWR-F2, respectively, of which 3,448,185, 4,265,048 and 2,833,527 reads matched known miRNAs. We identified 512 known rice miRNAs in 214 miRNA families and predicted 290 new miRNAs. Targeted functional annotation, GO and KEGG pathway analyses predicted that 187 miRNAs regulate expression of flowering-related genes. Differential expression analysis of flowering-related miRNAs showed that: expression of 95 miRNAs varied significantly between the libraries, 66 are flowering-related miRNAs, such as oru-miR97, oru-miR117, oru-miR135, oru-miR137, et al. 17 are early-flowering -related miRNAs, including osa-miR160f, osa-miR164d, osa-miR167d, osa-miR169a, osa-miR172b, oru-miR4, et al., induced during the floral transition. Real-time PCR revealed the same expression patterns as deep sequencing. miRNAs targets were confirmed for cleavage by 5'-RACE in vivo, and were negatively regulated by miRNAs. This is the first investigation of flowering miRNAs in wild rice. The result indicates that variation in miRNAs occurred during rice domestication and lays a foundation for further study of phase change and flowering in O. rufipogon. Complicated regulatory networks mediated by multiple miRNAs regulate the expression of flowering genes that control the induction of flowering.

  11. Improving Protein Fold Recognition by Deep Learning Networks

    NASA Astrophysics Data System (ADS)

    Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin

    2015-12-01

    For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl’s benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed the comparable results at the level of family and superfamily, compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to real-value regression task, which also show a promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.

  12. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. A korarchaeal genome reveals insights into the evolution of the Archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain J; Elkins, James G.; Podar, Mircea

    2008-06-05

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name,"Candidatus Korarchaeum cryptofilum," which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49percent. Ofmore » the 1,617 predicted protein-coding genes, 1,382 (85percent) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.« less

  14. A Korarchael Genome Reveals Insights into the Evolution of the Archaea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapidus, Alla; Elkins, James G.; Podar, Mircea

    2008-01-07

    The candidate division Korarchaeota comprises a group of uncultivated microorganisms that, by their small subunit rRNA phylogeny, may have diverged early from the major archaeal phyla Crenarchaeota and Euryarchaeota. Here, we report the initial characterization of a member of the Korarchaeota with the proposed name, ?Candidatus Korarchaeum cryptofilum,? which exhibits an ultrathin filamentous morphology. To investigate possible ancestral relationships between deep-branching Korarchaeota and other phyla, we used whole-genome shotgun sequencing to construct a complete composite korarchaeal genome from enriched cells. The genome was assembled into a single contig 1.59 Mb in length with a G + C content of 49percent.more » Of the 1,617 predicted protein-coding genes, 1,382 (85percent) could be assigned to a revised set of archaeal Clusters of Orthologous Groups (COGs). The predicted gene functions suggest that the organism relies on a simple mode of peptide fermentation for carbon and energy and lacks the ability to synthesize de novo purines, CoA, and several other cofactors. Phylogenetic analyses based on conserved single genes and concatenated protein sequences positioned the korarchaeote as a deep archaeal lineage with an apparent affinity to the Crenarchaeota. However, the predicted gene content revealed that several conserved cellular systems, such as cell division, DNA replication, and tRNA maturation, resemble the counterparts in the Euryarchaeota. In light of the known composition of archaeal genomes, the Korarchaeota might have retained a set of cellular features that represents the ancestral archaeal form.« less

  15. Low-abundant bacteria drive compositional changes in the gut microbiota after dietary alteration.

    PubMed

    Benjamino, Jacquelynn; Lincoln, Stephen; Srivastava, Ranjan; Graf, Joerg

    2018-05-10

    As the importance of beneficial bacteria is better recognized, understanding the dynamics of symbioses becomes increasingly crucial. In many gut symbioses, it is essential to understand whether changes in host diet play a role in the persistence of the bacterial gut community. In this study, termites were fed six dietary sources and the microbial community was monitored over a 49-day period using 16S rRNA gene sequencing. A deep backpropagation artificial neural network (ANN) was used to learn how the six different lignocellulose food sources affected the temporal composition of the hindgut microbiota of the termite as well as taxon-taxon and taxon-substrate interactions. Shifts in the termite gut microbiota after diet change in each colony were observed using 16S rRNA gene sequencing and beta diversity analyses. The artificial neural network accurately predicted the relative abundances of taxa at random points in the temporal study and showed that low-abundant taxa maintain community driving correlations in the hindgut. This combinatorial approach utilizing 16S rRNA gene sequencing and deep learning revealed that low-abundant bacteria that often do not belong to the core community are drivers of the termite hindgut bacterial community composition.

  16. Identification and characterization of novel serum microRNA candidates from deep sequencing in cervical cancer patients.

    PubMed

    Juan, Li; Tong, Hong-li; Zhang, Pengjun; Guo, Guanghong; Wang, Zi; Wen, Xinyu; Dong, Zhennan; Tian, Ya-ping

    2014-09-03

    Small non-coding microRNAs (miRNAs) are involved in cancer development and progression, and serum profiles of cervical cancer patients may be useful for identifying novel miRNAs. We performed deep sequencing on serum pools of cervical cancer patients and healthy controls with 3 replicates and constructed a small RNA library. We used MIREAP to predict novel miRNAs and identified 2 putative novel miRNAs between serum pools of cervical cancer patients and healthy controls after filtering out pseudo-pre-miRNAs using Triplet-SVM analysis. The 2 putative novel miRNAs were validated by real time PCR and were significantly decreased in cervical cancer patients compared with healthy controls. One novel miRNA had an area under curve (AUC) of 0.921 (95% CI: 0.883, 0.959) with a sensitivity of 85.7% and a specificity of 88.2% when discriminating between cervical cancer patients and healthy controls. Our results suggest that characterizing serum profiles of cervical cancers by Solexa sequencing may be a good method for identifying novel miRNAs and that the validated novel miRNAs described here may be cervical cancer-associated biomarkers.

  17. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    PubMed

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) =694 kb, including contigs with N(50)=20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s) ) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.

  18. Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identifies RNA processing phenotypes.

    PubMed

    Sittka, Alexandra; Sharma, Cynthia M; Rolle, Katarzyna; Vogel, Jörg

    2009-01-01

    The bacterial Sm-like protein, Hfq, is a key factor for the stability and function of small non-coding RNAs (sRNAs) in Escherichia coli. Homologues of this protein have been predicted in many distantly related organisms yet their functional conservation as sRNA-binding proteins has not entirely been clear. To address this, we expressed in Salmonella the Hfq proteins of two eubacteria (Neisseria meningitides, Aquifex aeolicus) and an archaeon (Methanocaldococcus jannaschii), and analyzed the associated RNA by deep sequencing. This in vivo approach identified endogenous Salmonella sRNAs as a major target of the foreign Hfq proteins. New Salmonella sRNA species were also identified, and some of these accumulated specifically in the presence of a foreign Hfq protein. In addition, we observed specific RNA processing defects, e.g., suppression of precursor processing of SraH sRNA by Methanocaldococcus Hfq, or aberrant accumulation of extracytoplasmic target mRNAs of the Salmonella GcvB, MicA or RybB sRNAs. Taken together, our study provides evidence of a conserved inherent sRNA-binding property of Hfq, which may facilitate the lateral transmission of regulatory sRNAs among distantly related species. It also suggests that the expression of heterologous RNA-binding proteins combined with deep sequencing analysis of RNA ligands can be used as a molecular tool to dissect individual steps of RNA metabolism in vivo.

  19. Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

    PubMed

    Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

    2014-07-01

    Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions-the population frequency of individual clones, their genetic composition, and their evolutionary relationships-which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

  20. The star forming universe after z=1

    NASA Astrophysics Data System (ADS)

    Harker, Justin J.

    This dissertation explores three projects in the field of galaxy formation and evolution: the formation of the red sequence via quenching, the detection, characterization, and frequency of starbursts in the DEEP2 sample, and the behavior of a main sequence of star forming galaxies whose behavior is determined by baryonic mass, referred to as staged star formation. The first section, in Chapter 2, presents a breakdown of several population synthesis models designed to probe the history of the red sequence. Known from measurements at low redshift to be composed of objects with a large range of ages, the red sequence is not well-modeled as being the result of a single monolithic event in the distant past. By combining information on restframe color, Balmer absorption line strengths, and the number density of L* galaxies as a function of redshift, we find evidence that the red sequence is built up over time. The second section, in Chapter 3 and 4, presents a novel method for determining simultaneously the absorption line and emission line contributions to the total measured equivalent width of Balmer lines. Relying on the predictable behavior of both absorption lines, which are to first order equivalent to one another, and emission lines, which follow a predictable decrement toward shorter wavelengths, a single measurement of total line strength for Hb and Hd yield uncoupled emission and absorption line components. Using the measurement of Hd in absorption against D n 4000 and Hb in emission, we isolate a population of potential starbursts in the DEEP2 sample. The final section, in Chapter 5, explores the regularity of star formation as a function of redshift, using the staged star formation prescription of Noeske et al. (2007a). We compute a set of t-models using the prescription, and compare them to the data in a number of parameters in addition to mass and star formation. While the staged star formation model is a good match in a number of parameters, we find several irregularities.

  1. Development of a deep convolutional neural network to predict grading of canine meningiomas from magnetic resonance images.

    PubMed

    Banzato, T; Cherubini, G B; Atzori, M; Zotti, A

    2018-05-01

    An established deep neural network (DNN) based on transfer learning and a newly designed DNN were tested to predict the grade of meningiomas from magnetic resonance (MR) images in dogs and to determine the accuracy of classification of using pre- and post-contrast T1-weighted (T1W), and T2-weighted (T2W) MR images. The images were randomly assigned to a training set, a validation set and a test set, comprising 60%, 10% and 30% of images, respectively. The combination of DNN and MR sequence displaying the highest discriminating accuracy was used to develop an image classifier to predict the grading of new cases. The algorithm based on transfer learning using the established DNN did not provide satisfactory results, whereas the newly designed DNN had high classification accuracy. On the basis of classification accuracy, an image classifier built on the newly designed DNN using post-contrast T1W images was developed. This image classifier correctly predicted the grading of 8 out of 10 images not included in the data set. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  2. Analysis of Ribosome Stalling and Translation Elongation Dynamics by Deep Learning.

    PubMed

    Zhang, Sai; Hu, Hailin; Zhou, Jingtian; He, Xuan; Jiang, Tao; Zeng, Jianyang

    2017-09-27

    Ribosome stalling is manifested by the local accumulation of ribosomes at specific codon positions of mRNAs. Here, we present ROSE, a deep learning framework to analyze high-throughput ribosome profiling data and estimate the probability of a ribosome stalling event occurring at each genomic location. Extensive validation tests on independent data demonstrated that ROSE possessed higher prediction accuracy than conventional prediction models, with an increase in the area under the receiver operating characteristic curve by up to 18.4%. In addition, genome-wide statistical analyses showed that ROSE predictions can be well correlated with diverse putative regulatory factors of ribosome stalling. Moreover, the genome-wide ribosome stalling landscapes of both human and yeast computed by ROSE recovered the functional interplays between ribosome stalling and cotranslational events in protein biogenesis, including protein targeting by the signal recognition particles and protein secondary structure formation. Overall, our study provides a novel method to complement the ribosome profiling techniques and further decipher the complex regulatory mechanisms underlying translation elongation dynamics encoded in the mRNA sequence. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    PubMed

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress.

  4. Draft genome sequence of the novel strain Pseudomonas sp. 10B238 with potential ability to produce antibiotics from deep-sea sediment.

    PubMed

    Pan, Hua-Qi; Hu, Jiang-Chun

    2015-10-01

    Pseudomonas sp. 10B238 was a putatively novel species of Pseudomonas, isolated from a deep-sea sediment of the South China Sea, which had the genetic potential to produce secondary metabolites related to nonribosomal peptides (NRPs), as well as showed moderate antimicrobial activities. Here we report a high quality draft genome of Pseudomonas sp. 10B238, which comprises 4,933,052bp with the G+C content of 60.23%. A total of 11 potential secondary metabolite biosynthetic gene clusters were predicted, including a NRP for new peptide siderophore. And many anaerobic respiratory terminal enzymes were found for life in deep-sea environments. Our results may provide insights into biosynthetic pathway for antimicrobial bioactive compounds and be helpful to understand the physiological characteristic of this species. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway

    NASA Astrophysics Data System (ADS)

    Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.

    2018-02-01

    Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.

  6. A deep learning framework for modeling structural features of RNA-binding protein targets

    PubMed Central

    Zhang, Sai; Zhou, Jingtian; Hu, Hailin; Gong, Haipeng; Chen, Ligong; Cheng, Chao; Zeng, Jianyang

    2016-01-01

    RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of the post-transcriptional gene regulation. Though numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of the RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on the real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and predict accurate RBP binding sites. In addition, we have conducted the first study to show that integrating the additional RNA tertiary structural features can improve the model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides a new evidence to support the view that RBPs may own specific tertiary structural binding preferences. In particular, the tests on the internal ribosome entry site (IRES) segments yield satisfiable results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model. The source code of our approach can be found in https://github.com/thucombio/deepnet-rbp. PMID:26467480

  7. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  8. Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.

    PubMed

    Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo

    2009-07-06

    In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.

  9. Deep nirS amplicon sequencing of San Francisco Bay sediments enables prediction of geography and environmental conditions from denitrifying community composition.

    PubMed

    Lee, Jessica A; Francis, Christopher A

    2017-12-01

    Denitrification is a dominant nitrogen loss process in the sediments of San Francisco Bay. In this study, we sought to understand the ecology of denitrifying bacteria by using next-generation sequencing (NGS) to survey the diversity of a denitrification functional gene, nirS (encoding cytchrome-cd 1 nitrite reductase), along the salinity gradient of San Francisco Bay over the course of a year. We compared our dataset to a library of nirS sequences obtained previously from the same samples by standard PCR cloning and Sanger sequencing, and showed that both methods similarly demonstrated geography, salinity and, to a lesser extent, nitrogen, to be strong determinants of community composition. Furthermore, the depth afforded by NGS enabled novel techniques for measuring the association between environment and community composition. We used Random Forests modelling to demonstrate that the site and salinity of a sample could be predicted from its nirS sequences, and to identify indicator taxa associated with those environmental characteristics. This work contributes significantly to our understanding of the distribution and dynamics of denitrifying communities in San Francisco Bay, and provides valuable tools for the further study of this key N-cycling guild in all estuarine systems. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.

  10. pseudoMap: an innovative and comprehensive resource for identification of siRNA-mediated mechanisms in human transcribed pseudogenes.

    PubMed

    Chan, Wen-Ling; Yang, Wen-Kuang; Huang, Hsien-Da; Chang, Jan-Gowth

    2013-01-01

    RNA interference (RNAi) is a gene silencing process within living cells, which is controlled by the RNA-induced silencing complex with a sequence-specific manner. In flies and mice, the pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we construct an innovative and comprehensive database to elucidate siRNA-mediated mechanism in human transcribed pseudogenes (TPGs). To investigate TPG producing siRNAs that regulate protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) that were supported by publicly deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we also presented that TPGs can act as a target for miRNAs that actually regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we have developed a database, pseudoMap, capturing various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. As our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/. Database URL: http://pseudomap.mbc.nctu.edu.tw/

  11. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line.

    PubMed

    Suzuki, Harukazu; Forrest, Alistair R R; van Nimwegen, Erik; Daub, Carsten O; Balwierz, Piotr J; Irvine, Katharine M; Lassmann, Timo; Ravasi, Timothy; Hasegawa, Yuki; de Hoon, Michiel J L; Katayama, Shintaro; Schroder, Kate; Carninci, Piero; Tomaru, Yasuhiro; Kanamori-Katayama, Mutsumi; Kubosaki, Atsutaka; Akalin, Altuna; Ando, Yoshinari; Arner, Erik; Asada, Maki; Asahara, Hiroshi; Bailey, Timothy; Bajic, Vladimir B; Bauer, Denis; Beckhouse, Anthony G; Bertin, Nicolas; Björkegren, Johan; Brombacher, Frank; Bulger, Erika; Chalk, Alistair M; Chiba, Joe; Cloonan, Nicole; Dawe, Adam; Dostie, Josee; Engström, Pär G; Essack, Magbubah; Faulkner, Geoffrey J; Fink, J Lynn; Fredman, David; Fujimori, Ko; Furuno, Masaaki; Gojobori, Takashi; Gough, Julian; Grimmond, Sean M; Gustafsson, Mika; Hashimoto, Megumi; Hashimoto, Takehiro; Hatakeyama, Mariko; Heinzel, Susanne; Hide, Winston; Hofmann, Oliver; Hörnquist, Michael; Huminiecki, Lukasz; Ikeo, Kazuho; Imamoto, Naoko; Inoue, Satoshi; Inoue, Yusuke; Ishihara, Ryoko; Iwayanagi, Takao; Jacobsen, Anders; Kaur, Mandeep; Kawaji, Hideya; Kerr, Markus C; Kimura, Ryuichiro; Kimura, Syuhei; Kimura, Yasumasa; Kitano, Hiroaki; Koga, Hisashi; Kojima, Toshio; Kondo, Shinji; Konno, Takeshi; Krogh, Anders; Kruger, Adele; Kumar, Ajit; Lenhard, Boris; Lennartsson, Andreas; Lindow, Morten; Lizio, Marina; Macpherson, Cameron; Maeda, Norihiro; Maher, Christopher A; Maqungo, Monique; Mar, Jessica; Matigian, Nicholas A; Matsuda, Hideo; Mattick, John S; Meier, Stuart; Miyamoto, Sei; Miyamoto-Sato, Etsuko; Nakabayashi, Kazuhiko; Nakachi, Yutaka; Nakano, Mika; Nygaard, Sanne; Okayama, Toshitsugu; Okazaki, Yasushi; Okuda-Yabukami, Haruka; Orlando, Valerio; Otomo, Jun; Pachkov, Mikhail; Petrovsky, Nikolai; Plessy, Charles; Quackenbush, John; Radovanovic, Aleksandar; Rehli, Michael; Saito, Rintaro; Sandelin, Albin; Schmeier, Sebastian; Schönbach, Christian; Schwartz, Ariel S; Semple, Colin A; Sera, Miho; Severin, Jessica; Shirahige, Katsuhiko; Simons, Cas; St Laurent, George; Suzuki, Masanori; Suzuki, Takahiro; Sweet, Matthew J; Taft, Ryan J; Takeda, Shizu; Takenaka, Yoichi; Tan, Kai; Taylor, Martin S; Teasdale, Rohan D; Tegnér, Jesper; Teichmann, Sarah; Valen, Eivind; Wahlestedt, Claes; Waki, Kazunori; Waterhouse, Andrew; Wells, Christine A; Winther, Ole; Wu, Linda; Yamaguchi, Kazumi; Yanagawa, Hiroshi; Yasuda, Jun; Zavolan, Mihaela; Hume, David A; Arakawa, Takahiro; Fukuda, Shiro; Imamura, Kengo; Kai, Chikatoshi; Kaiho, Ai; Kawashima, Tsugumi; Kawazu, Chika; Kitazume, Yayoi; Kojima, Miki; Miura, Hisashi; Murakami, Kayoko; Murata, Mitsuyoshi; Ninomiya, Noriko; Nishiyori, Hiromi; Noma, Shohei; Ogawa, Chihiro; Sano, Takuma; Simon, Christophe; Tagami, Michihira; Takahashi, Yukari; Kawai, Jun; Hayashizaki, Yoshihide

    2009-05-01

    Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process.

  12. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    PubMed

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and human. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. The experimental methods developed to carry out this work are novel, simple and transferable to the study of packaging signals in other RNA viruses. Improved understanding of RNA packaging may lead to novel vaccine approaches or targets for antiviral drugs with broad spectrum activity. Copyright © 2017 Logan et al.

  13. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS?s Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  14. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  15. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  16. Geoseq: a tool for dissecting deep-sequencing datasets.

    PubMed

    Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

    2010-10-12

    Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  17. Chemolithotrophy in the continental deep subsurface: Sanford Underground Research Facility (SURF), USA

    PubMed Central

    Osburn, Magdalena R.; LaRowe, Douglas E.; Momper, Lily M.; Amend, Jan P.

    2014-01-01

    The deep subsurface is an enormous repository of microbial life. However, the metabolic capabilities of these microorganisms and the degree to which they are dependent on surface processes are largely unknown. Due to the logistical difficulty of sampling and inherent heterogeneity, the microbial populations of the terrestrial subsurface are poorly characterized. In an effort to better understand the biogeochemistry of deep terrestrial habitats, we evaluate the energetic yield of chemolithotrophic metabolisms and microbial diversity in the Sanford Underground Research Facility (SURF) in the former Homestake Gold Mine, SD, USA. Geochemical data, energetic modeling, and DNA sequencing were combined with principle component analysis to describe this deep (down to 8100 ft below surface), terrestrial environment. SURF provides access into an iron-rich Paleoproterozoic metasedimentary deposit that contains deeply circulating groundwater. Geochemical analyses of subsurface fluids reveal enormous geochemical diversity ranging widely in salinity, oxidation state (ORP 330 to −328 mV), and concentrations of redox sensitive species (e.g., Fe2+ from near 0 to 6.2 mg/L and Σ S2- from 7 to 2778μg/L). As a direct result of this compositional buffet, Gibbs energy calculations reveal an abundance of energy for microorganisms from the oxidation of sulfur, iron, nitrogen, methane, and manganese. Pyrotag DNA sequencing reveals diverse communities of chemolithoautotrophs, thermophiles, aerobic and anaerobic heterotrophs, and numerous uncultivated clades. Extrapolated across the mine footprint, these data suggest a complex spatial mosaic of subsurface primary productivity that is in good agreement with predicted energy yields. Notably, we report Gibbs energy normalized both per mole of reaction and per kg fluid (energy density) and find the later to be more consistent with observed physiologies and environmental conditions. Further application of this approach will significantly expand our understanding of the deep terrestrial biosphere. PMID:25429287

  18. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  19. Rational Protein Engineering Guided by Deep Mutational Scanning

    PubMed Central

    Shin, HyeonSeok; Cho, Byung-Kwan

    2015-01-01

    Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267

  20. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes.

    PubMed

    Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.

  1. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes

    PubMed Central

    Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372

  2. Genetic Determinants of Drug Resistance in Mycobacterium tuberculosis and Their Diagnostic Value.

    PubMed

    Farhat, Maha R; Sultana, Razvan; Iartchouk, Oleg; Bozeman, Sam; Galagan, James; Sisk, Peter; Stolte, Christian; Nebenzahl-Guimaraes, Hanna; Jacobson, Karen; Sloutsky, Alexander; Kaur, Devinder; Posey, James; Kreiswirth, Barry N; Kurepina, Natalia; Rigouts, Leen; Streicher, Elizabeth M; Victor, Tommie C; Warren, Robin M; van Soolingen, Dick; Murray, Megan

    2016-09-01

    The development of molecular diagnostics that detect both the presence of Mycobacterium tuberculosis in clinical samples and drug resistance-conferring mutations promises to revolutionize patient care and interrupt transmission by ensuring early diagnosis. However, these tools require the identification of genetic determinants of resistance to the full range of antituberculosis drugs. To determine the optimal molecular approach needed, we sought to create a comprehensive catalog of resistance mutations and assess their sensitivity and specificity in diagnosing drug resistance. We developed and validated molecular inversion probes for DNA capture and deep sequencing of 28 drug-resistance loci in M. tuberculosis. We used the probes for targeted sequencing of a geographically diverse set of 1,397 clinical M. tuberculosis isolates with known drug resistance phenotypes. We identified a minimal set of mutations to predict resistance to first- and second-line antituberculosis drugs and validated our predictions in an independent dataset. We constructed and piloted a web-based database that provides public access to the sequence data and prediction tool. The predicted resistance to rifampicin and isoniazid exceeded 90% sensitivity and specificity but was lower for other drugs. The number of mutations needed to diagnose resistance is large, and for the 13 drugs studied it was 238 across 18 genetic loci. These data suggest that a comprehensive M. tuberculosis drug resistance diagnostic will need to allow for a high dimension of mutation detection. They also support the hypothesis that currently unknown genetic determinants, potentially discoverable by whole-genome sequencing, encode resistance to second-line tuberculosis drugs.

  3. Genetic Determinants of Drug Resistance in Mycobacterium tuberculosis and Their Diagnostic Value

    PubMed Central

    Sultana, Razvan; Iartchouk, Oleg; Bozeman, Sam; Galagan, James; Sisk, Peter; Stolte, Christian; Nebenzahl-Guimaraes, Hanna; Jacobson, Karen; Sloutsky, Alexander; Kaur, Devinder; Posey, James; Kreiswirth, Barry N.; Kurepina, Natalia; Rigouts, Leen; Streicher, Elizabeth M.; Victor, Tommie C.; Warren, Robin M.; van Soolingen, Dick; Murray, Megan

    2016-01-01

    Rationale: The development of molecular diagnostics that detect both the presence of Mycobacterium tuberculosis in clinical samples and drug resistance–conferring mutations promises to revolutionize patient care and interrupt transmission by ensuring early diagnosis. However, these tools require the identification of genetic determinants of resistance to the full range of antituberculosis drugs. Objectives: To determine the optimal molecular approach needed, we sought to create a comprehensive catalog of resistance mutations and assess their sensitivity and specificity in diagnosing drug resistance. Methods: We developed and validated molecular inversion probes for DNA capture and deep sequencing of 28 drug-resistance loci in M. tuberculosis. We used the probes for targeted sequencing of a geographically diverse set of 1,397 clinical M. tuberculosis isolates with known drug resistance phenotypes. We identified a minimal set of mutations to predict resistance to first- and second-line antituberculosis drugs and validated our predictions in an independent dataset. We constructed and piloted a web-based database that provides public access to the sequence data and prediction tool. Measurements and Main Results: The predicted resistance to rifampicin and isoniazid exceeded 90% sensitivity and specificity but was lower for other drugs. The number of mutations needed to diagnose resistance is large, and for the 13 drugs studied it was 238 across 18 genetic loci. Conclusions: These data suggest that a comprehensive M. tuberculosis drug resistance diagnostic will need to allow for a high dimension of mutation detection. They also support the hypothesis that currently unknown genetic determinants, potentially discoverable by whole-genome sequencing, encode resistance to second-line tuberculosis drugs. PMID:26910495

  4. Identifying MicroRNAs and Transcript Targets in Jatropha Seeds

    PubMed Central

    Galli, Vanessa; Guzman, Frank; de Oliveira, Luiz F. V.; Loss-Morais, Guilherme; Körbes, Ana P.; Silva, Sérgio D. A.; Margis-Pinheiro, Márcia M. A. N.; Margis, Rogério

    2014-01-01

    MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play a key role in diverse plant biological processes. Jatropha curcas L. has received significant attention as a potential oilseed crop for the production of renewable oil. Here, a sRNA library of mature seeds and three mRNA libraries from three different seed development stages were generated by deep sequencing to identify and characterize the miRNAs and pre-miRNAs of J. curcas. Computational analysis was used for the identification of 180 conserved miRNAs and 41 precursors (pre-miRNAs) as well as 16 novel pre-miRNAs. The predicted miRNA target genes are involved in a broad range of physiological functions, including cellular structure, nuclear function, translation, transport, hormone synthesis, defense, and lipid metabolism. Some pre-miRNA and miRNA targets vary in abundance between the three stages of seed development. A search for sequences that produce siRNA was performed, and the results indicated that J. curcas siRNAs play a role in nuclear functions, transport, catalytic processes and disease resistance. This study presents the first large scale identification of J. curcas miRNAs and their targets in mature seeds based on deep sequencing, and it contributes to a functional understanding of these miRNAs. PMID:24551031

  5. Genomic analysis suggests that mRNA destabilization by the microprocessor is specialized for the auto-regulation of Dgcr8.

    PubMed

    Shenoy, Archana; Blelloch, Robert

    2009-09-11

    The Microprocessor, containing the RNA binding protein Dgcr8 and RNase III enzyme Drosha, is responsible for processing primary microRNAs to precursor microRNAs. The Microprocessor regulates its own levels by cleaving hairpins in the 5'UTR and coding region of the Dgcr8 mRNA, thereby destabilizing the mature transcript. To determine whether the Microprocessor has a broader role in directly regulating other coding mRNA levels, we integrated results from expression profiling and ultra high-throughput deep sequencing of small RNAs. Expression analysis of mRNAs in wild-type, Dgcr8 knockout, and Dicer knockout mouse embryonic stem (ES) cells uncovered mRNAs that were specifically upregulated in the Dgcr8 null background. A number of these transcripts had evolutionarily conserved predicted hairpin targets for the Microprocessor. However, analysis of deep sequencing data of 18 to 200nt small RNAs in mouse ES, HeLa, and HepG2 indicates that exonic sequence reads that map in a pattern consistent with Microprocessor activity are unique to Dgcr8. We conclude that the Microprocessor's role in directly destabilizing coding mRNAs is likely specifically targeted to Dgcr8 itself, suggesting a specialized cellular mechanism for gene auto-regulation.

  6. Upside-down sequence stratigraphy, sandy highstands, and muddy prograding complexes in the Surma Basin, Bangladesh

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Radovich, B.J.; Hoffman, M.W.; Perlmutter, M.A.

    1995-12-31

    Several large, TCF-size gas fields have been discovered in the Surma Basin, Bangladesh. Detailed sequence stratigraphy was performed on log and seismic data to study these fields and future potential of the area. The prospective section is Upper Miocene sands caught up in a series of younger compressional fault-related folds caused by the Indian Plate colliding with S.E. Asia in the late Tertiary. World-class gas/water contacts are observed on the seismic data over the fields. Sequence stratigraphic techniques reveal an ordered, predictable stratigraphic architecture of sandy highstands and transgressions, and muddy aggraded prograding complexes with deep incisions at each sequencemore » boundary. This serves as a framework to understand the hydrocarbon accumulations in the area. Cyclostratigraphy is used to understand the unusual lithology distributions in the basin.« less

  7. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines. Conclusions Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer. PMID:21261984

  8. Identification and Functional Analysis of Flowering Related microRNAs in Common Wild Rice (Oryza rufipogon Griff.)

    PubMed Central

    Dong, Yibo; Yuan, Qianhua; Wang, Feng; Li, Weimin; Jiang, Ying; Jia, Shirong; Pei, XinWu

    2013-01-01

    Background MicroRNAs (miRNAs) is a class of non-coding RNAs involved in post- transcriptional control of gene expression, via degradation and/or translational inhibition. Six-hundred sixty-one rice miRNAs are known that are important in plant development. However, flowering-related miRNAs have not been characterized in Oryza rufipogon Griff. It was approved by supervision department of Guangdong wild rice protection. We analyzed flowering-related miRNAs in O. rufipogon using high-throughput sequencing (deep sequencing) to understand the changes that occurred during rice domestication, and to elucidate their functions in flowering. Results Three O. rufipogon sRNA libraries, two vegetative stage (CWR-V1 and CWR-V2) and one flowering stage (CWR-F2) were sequenced using Illumina deep sequencing. A total of 20,156,098, 21,531,511 and 20,995,942 high quality sRNA reads were obtained from CWR-V1, CWR-V2 and CWR-F2, respectively, of which 3,448,185, 4,265,048 and 2,833,527 reads matched known miRNAs. We identified 512 known rice miRNAs in 214 miRNA families and predicted 290 new miRNAs. Targeted functional annotation, GO and KEGG pathway analyses predicted that 187 miRNAs regulate expression of flowering-related genes. Differential expression analysis of flowering-related miRNAs showed that: expression of 95 miRNAs varied significantly between the libraries, 66 are flowering-related miRNAs, such as oru-miR97, oru-miR117, oru-miR135, oru-miR137, et al. 17 are early-flowering -related miRNAs, including osa-miR160f, osa-miR164d, osa-miR167d, osa-miR169a, osa-miR172b, oru-miR4, et al., induced during the floral transition. Real-time PCR revealed the same expression patterns as deep sequencing. miRNAs targets were confirmed for cleavage by 5′-RACE in vivo, and were negatively regulated by miRNAs. Conclusions This is the first investigation of flowering miRNAs in wild rice. The result indicates that variation in miRNAs occurred during rice domestication and lays a foundation for further study of phase change and flowering in O. rufipogon. Complicated regulatory networks mediated by multiple miRNAs regulate the expression of flowering genes that control the induction of flowering. PMID:24386120

  9. The Genome of Deep-Sea Vent Chemolithoautotroph Thiomicrospira crunogena XCL-2

    PubMed Central

    Scott, Kathleen M; Sievert, Stefan M; Abril, Fereniki N; Ball, Lois A; Barrett, Chantell J; Blake, Rodrigo A; Boller, Amanda J; Chain, Patrick S. G; Clark, Justine A; Davis, Carisa R; Detter, Chris; Do, Kimberly F; Dobrinski, Kimberly P; Faza, Brandon I; Fitzpatrick, Kelly A; Freyermuth, Sharyn K; Harmer, Tara L; Hauser, Loren J; Hügler, Michael; Kerfeld, Cheryl A; Klotz, Martin G; Kong, William W; Land, Miriam; Lapidus, Alla; Larimer, Frank W; Longo, Dana L; Lucas, Susan; Malfatti, Stephanie A; Massey, Steven E; Martin, Darlene D; McCuddin, Zoe; Meyer, Folker; Moore, Jessica L; Ocampo, Luis H; Paul, John H; Paulsen, Ian T; Reep, Douglas K; Ren, Qinghu; Ross, Rachel L; Sato, Priscila Y; Thomas, Phaedra; Tinkham, Lance E; Zeruth, Gary T

    2006-01-01

    Presented here is the complete genome sequence of Thiomicrospira crunogena XCL-2, representative of ubiquitous chemolithoautotrophic sulfur-oxidizing bacteria isolated from deep-sea hydrothermal vents. This gammaproteobacterium has a single chromosome (2,427,734 base pairs), and its genome illustrates many of the adaptations that have enabled it to thrive at vents globally. It has 14 methyl-accepting chemotaxis protein genes, including four that may assist in positioning it in the redoxcline. A relative abundance of coding sequences (CDSs) encoding regulatory proteins likely control the expression of genes encoding carboxysomes, multiple dissolved inorganic nitrogen and phosphate transporters, as well as a phosphonate operon, which provide this species with a variety of options for acquiring these substrates from the environment. Thiom. crunogena XCL-2 is unusual among obligate sulfur-oxidizing bacteria in relying on the Sox system for the oxidation of reduced sulfur compounds. The genome has characteristics consistent with an obligately chemolithoautotrophic lifestyle, including few transporters predicted to have organic allocrits, and Calvin-Benson-Bassham cycle CDSs scattered throughout the genome. PMID:17105352

  10. DNA Replication Profiling Using Deep Sequencing.

    PubMed

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  11. Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica.

    PubMed

    Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M

    2015-05-15

    The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar unicellular opisthokont genomes than to other animal genomes.

  12. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity.

    PubMed

    Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio

    2017-07-15

    With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provide informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publically available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. SpliceRover: Interpretable Convolutional Neural: Networks for Improved Splice Site Prediction.

    PubMed

    Zuallaert, Jasper; Godin, Fréderic; Kim, Mijung; Soete, Arne; Saeys, Yvan; De Neve, Wesley

    2018-06-21

    During the last decade, improvements in high-throughput sequencing have generated a wealth of genomic data. Functionally interpreting these sequences and finding the biological signals that are hallmarks of gene function and regulation is currently mostly done using automated genome annotation platforms, which mainly rely on integrated machine learning frameworks to identify different functional sites of interest, including splice sites. Splicing is an essential step in the gene regulation process, and the correct identification of splice sites is a major cornerstone in a genome annotation system. In this paper, we present SpliceRover, a predictive deep learning approach that outperforms the state-of-the-art in splice site prediction. SpliceRover uses convolutional neural networks (CNNs), which have been shown to obtain cutting edge performance on a wide variety of prediction tasks. We adapted this approach to deal with genomic sequence inputs, and show it consistently outperforms already existing approaches, with relative improvements in prediction effectiveness of up to 80.9% when measured in terms of false discovery rate. However, a major criticism of CNNs concerns their "black box" nature, as mechanisms to obtain insight into their reasoning processes are limited. To facilitate interpretability of the SpliceRover models, we introduce an approach to visualize the biologically relevant information learnt. We show that our visualization approach is able to recover features known to be important for splice site prediction (binding motifs around the splice site, presence of polypyrimidine tracts and branch points), as well as reveal new features (e.g., several types of exclusion patterns near splice sites). SpliceRover is available as a web service. The prediction tool and instructions can be found at http://bioit2.irc.ugent.be/splicerover/. Supplementary materials are available at Bioinformatics online.

  14. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    PubMed

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.

  15. Sedimentology and Sedimentary Dynamics of the Desmoinesian Cherokee Group, Deep Anadarko Basin, Texas Panhandle

    NASA Astrophysics Data System (ADS)

    Hu, N.; Loucks, R.; Frebourg, G.

    2015-12-01

    Understanding the spatial variability of deep-water facies is critical to deep-water research because of its revealing information about the relationship between desity flow processes and their resultant sedimentary sequences. The Cherokee Group in the Anadarko Basin, northeastern Texas Panhandle, provides an opportunity to investigate an icehouse-greenhouse Pennsylvanian hybrid system that well demonstrates the intricacies of vertical and lateral facies relationships in an unconfined fan-delta fed deep-water slope to basinal setting. The stratigraphic section ranges in thickness from 150 to 460 m. The cyclic sedimentation and foreland basin tectonics resulted in a complex stratal architecture that was sourced by multiple areas of sediment input. This investigation consists of wireline-log and core data. Five-thousand wireline logs were correlated in an area of over 9500 sq km to map out six depositional sequences that are separated by major flooding events. These events are correlative over the whole area of study. Six cores, that sample nearly the complete section, were described for lithofacies. Lithofacies are recognized based on depositional features and mineralogy:(1) Subarkose, (2) Lithicarkoses, (3) Sandy siliciclastic conglomerate, (4) Muddy calcareous conglomerate, (5) Crinoidal packstone, (6) Oodic grainstone, (7)Pelodic grainstone, (8) Ripple laminated mudrock, (9) faint laminated mudrock. The integration of isopachs of depositional sequences with the lithofacies has allowed the delineation of the spatial and temporal evolution of the slope to basin-floor system. Thin-to-thick bedded turbidites, hyperconcentrated density flow deposits (slurry beds), and debris and mud flow deposits were observed and can be used to better predicte lithofacies distributions in areas that have less data control. These mixed siliciclastic and carbonate deposits can be carrier beds for the hydrocarbons generated from the enclosing organic-rich (TOC ranges from 0.55 to 6.77wt%), dysareobic to anaerobic mudstones.

  16. Extremely hypomorphic and severe deep intronic variants in the ABCA4 locus result in varying Stargardt disease phenotypes.

    PubMed

    Zernant, Jana; Lee, Winston; Nagasaki, Takayuki; Collison, Frederick T; Fishman, Gerald A; Bertelsen, Mette; Rosenberg, Thomas; Gouras, Peter; Tsang, Stephen H; Allikmets, Rando

    2018-05-30

    Autosomal recessive Stargardt disease (STGD1, MIM 248200) is caused by mutations in the ABCA4 gene. Complete sequencing of the ABCA4 locus in STGD1 patients identifies two expected disease-causing alleles in ~75% of patients and only one mutation in ~15% of patients. Recently, many possibly pathogenic variants in deep intronic sequences of ABCA4 have been identified in the latter group. We extended our analyses of deep intronic ABCA4 variants and determined that one of these, c.4253+43G>A (rs61754045), is present in 29/1155 (2.6%) of STGD1 patients. The variant is found at statistically significantly higher frequency in patients with only one pathogenic ABCA4 allele, 23/160 (14.38%), MAF=0.072, compared to MAF=0.013 in all STGD1 cases and MAF=0.006 in the matching general population (P<1x10-7). The variant, which is not predicted to have any effect on splicing, is the first reported intronic "extremely hypomorphic allele" in the ABCA4 locus; i.e., it is pathogenic only when in trans with a loss-of-function ABCA4 allele. It results in a distinct clinical phenotype characterized by late-onset of symptoms and foveal sparing. In ~70% of cases the variant was allelic with the c.6006-609T>A (rs575968112) variant, which was deemed non-pathogenic. Another rare deep intronic variant, c.5196+1056A>G (rs886044749), found in 5/834 (0.6%) of STGD1 cases is, conversely, a severe allele. This study determines pathogenicity for three non-coding variants in STGD1 patients of European descent accounting for ~3% of the disease. Defining disease-associated alleles in the non-coding sequences of the ABCA4 locus can be accomplished by integrated clinical and genetic analyses. Cold Spring Harbor Laboratory Press.

  17. Identification of microRNAs from Amur grape (vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics

    PubMed Central

    2012-01-01

    Background MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. Results A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress. PMID:22455456

  18. Sequence stratigraphy of the Monterey Formation, Santa Barbara County: Integration of physical, chemical, and biofacies data from outcrop and subsurface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bohacs, K.M.

    1990-05-01

    Deep basinal rocks of the Monterey Formation can be allocated to different depositional environments based on an integration of bedding, facies stacking patterns, lithology, biofacies, and inorganic and organic chemistry. These rocks show evidence of systematic changes in depositional environments that can be related to eustatic sea level change and basin evolution. Even deep-basinal environments are affected by changing sea level through changes in circulation patterns and intensities nutrient budgets and dispersal patterns, and location and intensity of the oceanic oxygen minimum. The sequence-stratigraphic framework was constructed based on the physical expression of the outcrop strata and confirmed by typingmore » the outcrop sections to an integrated well-log/seismic grid through outcrop gamma-ray-spectral profiles. Interpretation of a sequence boundary was based on increased proportions of hemipelagic facies, evidence of increased bottom-energy levels above the boundary, and local erosion and relief on the surface. The proportion of shallower water and reworked dinoflagellates increased to a local maximum above the boundary, Downlap surfaces exhibited increased proportions of pelagic facies around the surface, evidence of decreased bottom-energy levels and terrigenous sedimentation rates, and little or no significant erosion on the surface. The proportion of deeper water dinoflagellates increased to a local maximum at or near the downlap surface; there was no evidence of reworked individuals. The detailed sequence-stratigraphic framework makes it possible to the rock properties to genetic processes for construction of predictive models.« less

  19. A comparison of microbial communities in deep-sea polymetallic nodules and the surrounding sediments in the Pacific Ocean

    NASA Astrophysics Data System (ADS)

    Wu, Yue-Hong; Liao, Li; Wang, Chun-Sheng; Ma, Wei-Lin; Meng, Fan-Xu; Wu, Min; Xu, Xue-Wei

    2013-09-01

    Deep-sea polymetallic nodules, rich in metals such as Fe, Mn, and Ni, are potential resources for future exploitation. Early culturing and microscopy studies suggest that polymetallic nodules are at least partially biogenic. To understand the microbial communities in this environment, we compared microbial community composition and diversity inside nodules and in the surrounding sediments. Three sampling sites in the Pacific Ocean containing polymetallic nodules were used for culture-independent investigations of microbial diversity. A total of 1013 near full-length bacterial 16S rRNA gene sequences and 640 archaeal 16S rRNA gene sequences with ~650 bp from nodules and the surrounding sediments were analyzed. Bacteria showed higher diversity than archaea. Interestingly, sediments contained more diverse bacterial communities than nodules, while the opposite was detected for archaea. Bacterial communities tend to be mostly unique to sediments or nodules, with only 13.3% of sequences shared. The most abundant bacterial groups detected only in nodules were Pseudoalteromonas and Alteromonas, which were predicted to play a role in building matrix outside cells to induce or control mineralization. However, archaeal communities were mostly shared between sediments and nodules, including the most abundant OTU containing 290 sequences from marine group I Thaumarchaeota. PcoA analysis indicated that microhabitat (i.e., nodule or sediment) seemed to be a major factor influencing microbial community composition, rather than sampling locations or distances between locations.

  20. Partial androgen insensitivity syndrome caused by a deep intronic mutation creating an alternative splice acceptor site of the AR gene.

    PubMed

    Ono, Hiroyuki; Saitsu, Hirotomo; Horikawa, Reiko; Nakashima, Shinichi; Ohkubo, Yumiko; Yanagi, Kumiko; Nakabayashi, Kazuhiko; Fukami, Maki; Fujisawa, Yasuko; Ogata, Tsutomu

    2018-02-02

    Although partial androgen insensitivity syndrome (PAIS) is caused by attenuated responsiveness to androgens, androgen receptor gene (AR) mutations on the coding regions and their splice sites have been identified only in <25% of patients with a diagnosis of PAIS. We performed extensive molecular studies including whole exome sequencing in a Japanese family with PAIS, identifying a deep intronic variant beyond the branch site at intron 6 of AR (NM_000044.4:c.2450-42 G > A). This variant created the splice acceptor motif that was accompanied by pyrimidine-rich sequence and two candidate branch sites. Consistent with this, reverse transcriptase (RT)-PCR experiments for cycloheximide-treated lymphoblastoid cell lines revealed a relatively large amount of aberrant mRNA produced by the newly created splice acceptor site and a relatively small amount of wildtype mRNA produced by the normal splice acceptor site. Furthermore, most of the aberrant mRNA was shown to undergo nonsense mediated decay (NMD) and, if a small amount of aberrant mRNA may have escaped NMD, such mRNA was predicted to generate a truncated AR protein missing some functional domains. These findings imply that the deep intronic mutation creating an alternative splice acceptor site resulted in the production of a relatively small amount of wildtype AR mRNA, leading to PAIS.

  1. Enhanced sensitivity for detection of low-level germline mosaic RB1 mutations in sporadic retinoblastoma cases using deep semiconductor sequencing.

    PubMed

    Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa

    2014-03-01

    Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.

  2. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed Central

    Barrett, Nolan H.; McCarthy, Peter J.

    2017-01-01

    ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886

  3. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2016-02-01

    circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. R e la ti v e c e ll g ro w th ( % ) 0 20

  4. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  5. Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.

    PubMed

    Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu

    2014-08-01

    Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions of aa80, aa155, aa156, and aa168 positions in HCV NS3 region) over time. In Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b, received triple therapy of simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). Sustained virological response rate was 67%, and that was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients, who did not achieve sustained virological response, were tested for resistant variants by ultra-deep sequencing, at the baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants, detected at re-elevation of viral load, were de novo resistant variants. Ten of 12 de novo resistant variants become undetectable over time, and that five of seven resistant variants, detected at baseline, persisted over time. In one patient, variants of Q80R at baseline (0.3%) increased at 96-week after the cessation of treatment (10.2%), and de novo resistant variants of D168E (0.3%) also increased at 96-week after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility in detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.

  6. An Efficient Strategy of Screening for Pathogens in Wild-Caught Ticks and Mosquitoes by Reusing Small RNA Deep Sequencing Data

    PubMed Central

    An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang

    2014-01-01

    This paper explored our hypothesis that sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. Specially, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated the reuse of sRNA deep sequencing data would have the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575

  7. Supercritical sedimentary structures and bedforms and criteria for recognition in the field: insights from the Middle Eocene deep-marine Morillo and Guaso systems, Ainsa Basin, Spanish Pyrenees

    NASA Astrophysics Data System (ADS)

    Torley, John; Pickering, Kevin

    2017-04-01

    It has long been acknowledged that for most submarine slopes with gradients > 0.5, common to many deep-water environments, they should contain abundant evidence of supercritical flows and their deposits. However, it is common for deep-marine sands/sandstones to be routinely modelled using the Bouma (1962) sequence for turbidites. Recently, the importance of supercritical flows has been highlighted from seafloor observations, with numerical and physical experiments. Such experiments have produced previously unrecognised bedforms which fail to be interpreted adequately by Bouma's model, including antidunes, chutes-and-pools, and cyclic steps. Fieldwork in the Middle Eocene Ainsa Basin, Spanish Pyrenees, has been undertaken in the Morillo and Guaso systems of the Upper Hecho Group. Approximately 5,000 beds were measured and documented in detail, e.g., grain size, sedimentary structures, bedforms and facies. Collectively, this data can be used to understand supercritical versus subcritical flow. The relative importance of supercritical flow can then be compared and contrasted within individual ancient deep-marine systems. The Morillo System is relatively coarse-grained, compared with the Guaso System. The results of this research contribute to an improved understanding of the processes in deep-marine systems, and directly benefit the hydrocarbon industry by providing better constraints to predict deep-water reservoir composition and architecture.

  8. Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt

    PubMed Central

    Jungbluth, Sean P; Bowers, Robert M; Lin, Huei-Ting; Cowen, James P; Rappé, Michael S

    2016-01-01

    Although little is known regarding microbial life within our planet's rock-hosted deep subseafloor biosphere, boreholes drilled through deep ocean sediment and into the underlying basaltic crust provide invaluable windows of access that have been used previously to document the presence of microorganisms within fluids percolating through the deep ocean crust. In this study, the analysis of 1.7 million small subunit ribosomal RNA genes amplified and sequenced from marine sediment, bottom seawater and basalt-hosted deep subseafloor fluids that span multiple years and locations on the Juan de Fuca Ridge flank was used to quantitatively delineate a subseafloor microbiome comprised of distinct bacteria and archaea. Hot, anoxic crustal fluids tapped by newly installed seafloor sampling observatories at boreholes U1362A and U1362B contained abundant bacterial lineages of phylogenetically unique Nitrospirae, Aminicenantes, Calescamantes and Chloroflexi. Although less abundant, the domain Archaea was dominated by unique, uncultivated lineages of marine benthic group E, the Terrestrial Hot Spring Crenarchaeotic Group, the Bathyarchaeota and relatives of cultivated, sulfate-reducing Archaeoglobi. Consistent with recent geochemical measurements and bioenergetic predictions, the potential importance of methane cycling and sulfate reduction were imprinted within the basalt-hosted deep subseafloor crustal fluid microbial community. This unique window of access to the deep ocean subsurface basement reveals a microbial landscape that exhibits previously undetected spatial heterogeneity. PMID:26872042

  9. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed

    Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J

    2017-02-02

    The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.

  10. Transcriptome sequencing analysis of novel sRNAs of Kineococcus radiotolerans in response to ionizing radiation.

    PubMed

    Chen, Zhouwei; Li, Lufeng; Shan, Zhan; Huang, Hannian; Chen, Huan; Ding, Xianfeng; Guo, Jiangfeng; Liu, Lili

    2016-11-01

    Kineococcus radiotolerans is a Gram-positive, radio-resistant bacterium isolated from a radioactive environment. The small noncoding RNAs (sRNAs) in bacteria are reported to play roles in the immediate response to stress and/or the recovery from stress. The analysis of K. radiotolerans transcriptome sequencing results can identify these sRNAs in a genome-wide detection, using RNA sequencing (RNA-seq) by the deep sequencing technique. In this study, the raw data of radiation-exposed samples (RS) and control samples (CS) were acquired separately from the sequencing platform. There were 217 common sRNA candidates in the two samples screened in the genome-wide scale by bioinformatics analysis. There were 43 differentially expressed sRNA candidates, including 28 up-regulated and 15 down-regulated ones. The down-regulated sRNAs were selected for the sRNA target prediction, of which 12 sRNAs that may modulate the genes related to the transcription regulation and DNA repair were considered as the candidates involved in the radio-resistance regulation system. Copyright © 2016 Elsevier GmbH. All rights reserved.

  11. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  12. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  13. Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences

    PubMed Central

    Zhang, Long; Jia, Lianyin; Ren, Yazhou

    2017-01-01

    Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study. PMID:29117139

  14. Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences.

    PubMed

    Wang, Jun; Zhang, Long; Jia, Lianyin; Ren, Yazhou; Yu, Guoxian

    2017-11-08

    Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae , DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study.

  15. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium

    PubMed Central

    2018-01-01

    ABSTRACT Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573

  16. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  17. High resolution tempo-spatial ozone prediction with SVM and LSTM

    NASA Astrophysics Data System (ADS)

    Gao, D.; Zhang, Y.; Qu, Z.; Sadighi, K.; Coffey, E.; LIU, Q.; Hannigan, M.; Henze, D. K.; Dick, R.; Shang, L.; Lv, Q.

    2017-12-01

    To investigate and predict the exposure of ozone and other pollutants in urban areas, we utilize data from various infrastructures including EPA, NOAA and RIITS from government of Los Angeles and construct statistical models to conduct ozone concentration prediction in Los Angeles areas at finer spatial and temporal granularity. Our work involves cyber data such as traffic, roads and population data as features for prediction. Two statistical models, Support Vector Machine (SVM) and Long Short-term Memory (LSTM, deep learning method) are used for prediction. . Our experiments show that kernelized SVM gains better prediction performance when taking traffic counts, road density and population density as features, with a prediction RMSE of 7.99 ppb for all-time ozone and 6.92 ppb for peak-value ozone. With simulated NOx from Chemical Transport Model(CTM) as features, SVM generates even better prediction performance, with a prediction RMSE of 6.69ppb. We also build LSTM, which has shown great advantages at dealing with temporal sequences, to predict ozone concentration by treating ozone concentration as spatial-temporal sequences. Trained by ozone concentration measurements from the 13 EPA stations in LA area, the model achieves 4.45 ppb RMSE. Besides, we build a variant of this model which adds spatial dynamics into the model in the form of transition matrix that reveals new knowledge on pollutant transition. The forgetting gate of the trained LSTM is consistent with the delay effect of ozone concentration and the trained transition matrix shows spatial consistency with the common direction of winds in LA area.

  18. Identification of MicroRNAs in Helicoverpa armigera and Spodoptera litura Based on Deep Sequencing and Homology Analysis

    PubMed Central

    Ge, Xie; Zhang, Yong; Jiang, Jianhao; Zhong, Yi; Yang, Xiaonan; Li, Zhiqian; Huang, Yongping; Tan, Anjiang

    2013-01-01

    The current identification of microRNAs (miRNAs) in insects is largely dependent on genome sequences. However, the lack of available genome sequences inhibits the identification of miRNAs in various insect species. In this study, we used a miRNA database of the silkworm Bombyx mori as a reference to identify miRNAs in Helicoverpa armigera and Spodoptera litura using deep sequencing and homology analysis. Because all three species belong to the Lepidoptera, the experiment produced reliable results. Our study identified 97 and 91 conserved miRNAs in H. armigera and S. litura, respectively. Using the genome of B. mori and BAC sequences of H. armigera as references, 1 novel miRNA and 8 novel miRNA candidates were identified in H. armigera, and 4 novel miRNA candidates were identified in S. litura. An evolutionary analysis revealed that most of the identified miRNAs were insect-specific, and more than 20 miRNAs were Lepidoptera-specific. The investigation of the expression patterns of miR-2a, miR-34, miR-2796-3p and miR-11 revealed their potential roles in insect development. miRNA target prediction revealed that conserved miRNA target sites exist in various genes in the 3 species. Conserved miRNA target sites for the Hsp90 gene among the 3 species were validated in the mammalian 293T cell line using a dual-luciferase reporter assay. Our study provides a new approach with which to identify miRNAs in insects lacking genome information and contributes to the functional analysis of insect miRNAs. PMID:23289012

  19. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction

    PubMed Central

    Spencer, Matt; Eickholt, Jesse; Cheng, Jianlin

    2014-01-01

    Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80% and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test data set of 198 proteins, achieving a Q3 accuracy of 80.7% and a Sov accuracy of 74.2%. PMID:25750595

  20. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction.

    PubMed

    Spencer, Matt; Eickholt, Jesse; Jianlin Cheng

    2015-01-01

    Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.

  1. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  2. Cratering motions and structural deformation in the rim of the Prairie Flat multiring explosion crater

    NASA Technical Reports Server (NTRS)

    Roddy, D. J.; Ullrich, G. W.; Sauer, F. M.; Jones, G. H. S.

    1977-01-01

    Cratering motions and structural deformation are described for the rim of the Prairie Flat multiring crater, 85.5 m across and 5.3 m deep, which was formed by the detonation of a 500-ton TNT surface-tangent sphere. The terminal displacement and motion data are derived from marker cans and velocity gages emplaced in drill holes in a three-dimensional matrix radial to the crater. The integration of this data with a detailed geologic cross section, mapped from deep trench excavations through the rim, provides a composite view of the general sequence of motions that formed a transiently uplifted rim, overturned flap, inverted stratigraphy, downfolded rim, and deformed strata in the crater walls. Preliminary comparisons with laboratory experimental cratering and with numerical simulations indicate that explosion craters of the Prairie Flat-type generated by surface and near-surface energy sources tend to follow predictable motion sequences and produce comparable structural deformation. More specifically, central uplift and multiring impact craters with morphologies and structures comparable to Prairie Flat are inferred to have experienced similar deformational histories of the rim, such as uplift, overturning, terracing, and downfolding.

  3. Sequence stratigraphy of the Monterey Formation, Santa Barbara County: Integration of physical, chemical, and biofacies data from outcrop and subsurface

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bohacs, K.M.

    1991-02-01

    Deep basinal rocks of the Monterey Formation can be allocated to different depositional environments based on an integration of bedding, stacking patterns of facies, lithology, biofacies, and inorganic and organic chemistry. These rocks show evidence of systematic changes in depositional environments that can be related to eustatic sea level changes and basin evolution. Even deep-basinal environments are affected by changing sea level through changes in circulation patterns and intensities, nutrient budgets and dispersal patterns, and location and intensity of the oceanic oxygen minimum. The sequence-stratigraphic framework was constructed based on the physical expression of the outcrop strata and confirmed bymore » typing the outcrop sections to an integrated will-log/seismic grid through outcrop gamma-ray spectral profiles. Interpretation of a sequence boundary was based on increased proportions of hemipelagic facies and evidence of increased bottom-energy levels above the boundary, and local erosion and relief on the surface. The proportion of shallower water and reworked dinoflagellates increased to a local maximum above the boundary. Downlap surfaces exhibited increased proportions of pelagic facies around the surface, a secular change in the dominant lithology across the surface, evidence of decreased bottom-energy levels and terrigenous sedimentation rates, and little or not significant erosion on the surface. The proportion of deeper water dinoflagellates increased to a local maximum at or near the downlap surface; there was no evidence of reworked individuals. The detailed sequence-stratigraphic framework makes it possible to tie rock properties to genetic processes for construction of predictive models.« less

  4. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  5. GOLabeler: Improving Sequence-based Large-scale Protein Function Prediction by Learning to Rank.

    PubMed

    You, Ronghui; Zhang, Zihan; Xiong, Yi; Sun, Fengzhu; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2018-03-07

    Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of more than 70 million proteins in UniProtKB have experimental GO annotations, implying the strong necessity of automated function prediction (AFP) of proteins, where AFP is a hard multilabel classification problem due to one protein with a diverse number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore homology-based SAFP tools are competitive in AFP competitions, while they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins with annotations already. Thus the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins. The key of this method is to extract not only homology information but also diverse, deep- rooted information/evidence from sequence inputs and integrate them into a predictor in a both effective and efficient manner. We propose GOLabeler, which integrates five component classifiers, trained from different features, including GO term frequency, sequence alignment, amino acid trigram, domains and motifs, and biophysical properties, etc., in the framework of learning to rank (LTR), a paradigm of machine learning, especially powerful for multilabel classification. The empirical results obtained by examining GOLabeler extensively and thoroughly by using large-scale datasets revealed numerous favorable aspects of GOLabeler, including significant performance advantage over state-of-the-art AFP methods. http://datamining-iip.fudan.edu.cn/golabeler. zhusf@fudan.edu.cn. Supplementary data are available at Bioinformatics online.

  6. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study

    PubMed Central

    Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.

    2015-01-01

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199

  7. Emergent HIV-1 Drug Resistance Mutations Were Not Present at Low-Frequency at Baseline in Non-Nucleoside Reverse Transcriptase Inhibitor-Treated Subjects in the STaR Study.

    PubMed

    Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L

    2015-12-07

    At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.

  8. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    USDA-ARS?s Scientific Manuscript database

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  9. Natural Variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on Natural variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.

  10. Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing

    PubMed Central

    Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken

    2012-01-01

    Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646

  11. Deep Recurrent Neural Networks for Human Activity Recognition

    PubMed Central

    Murad, Abdulmajid

    2017-01-01

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs. PMID:29113103

  12. Deep Recurrent Neural Networks for Human Activity Recognition.

    PubMed

    Murad, Abdulmajid; Pyun, Jae-Young

    2017-11-06

    Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs.

  13. Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.

    PubMed

    García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge

    2018-04-12

    Pseudomonas oceani DSM 100277 T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani , which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.

  14. Using next generation transcriptome sequencing to predict an ectomycorrhizal metablome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Sreedasyam, A.; Trivedi, G

    Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models usingmore » a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.« less

  15. Predictive Bcl-2 Family Binding Models Rooted in Experiment or Structure

    PubMed Central

    DeBartolo, Joe; Dutta, Sanjib; Reich, Lothar; Keating, Amy E.

    2013-01-01

    Proteins of the Bcl-2 family either enhance or suppress programmed cell death and are centrally involved in cancer development and resistance to chemotherapy. BH3 (Bcl-2 homology 3)-only Bcl-2 proteins promote cell death by docking an α-helix into a hydrophobic groove on the surface of one or more of five pro-survival Bcl-2 receptor proteins. There is high structural homology within the pro-death and pro-survival families, yet a high degree of interaction specificity is nevertheless encoded, posing an interesting and important molecular recognition problem. Understanding protein features that dictate Bcl-2 interaction specificity is critical for designing peptide-based cancer therapeutics and diagnostics. In this study, we present peptide SPOT arrays and deep sequencing data from yeast display screening experiments that significantly expand the BH3 sequence space that has been experimentally tested for interaction with five human anti-apoptotic receptors. These data provide rich information about the determinants of Bcl-2 family specificity. To interpret and use the information, we constructed two simple data-based models that can predict affinity and specificity when evaluated on independent data sets within a limited sequence space. We also constructed a novel structure-based statistical potential, called STATIUM, which is remarkably good at predicting Bcl-2 affinity and specificity, especially considering it is not trained on experimental data. We compare the performance of our three models to each other and to alternative structure-based methods and discuss how such tools can guide prediction and design of new Bcl-2 family complexes. PMID:22617328

  16. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  17. RNA-Seq analysis to capture the transcriptome landscape of a single cell

    PubMed Central

    Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim

    2013-01-01

    We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668

  18. Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.

    PubMed

    Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario

    2011-04-01

    Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.

  19. The deep, hot biosphere: Twenty-five years of retrospection.

    PubMed

    Colman, Daniel R; Poudel, Saroj; Stamps, Blake W; Boyd, Eric S; Spear, John R

    2017-07-03

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a "deep, hot biosphere" in the Earth's crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H 2 , CH 4 , and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth's subsurface in the form of a deep subsurface microbiome initiative.

  20. Affinity Maturation of a Cyclic Peptide Handle for Therapeutic Antibodies Using Deep Mutational Scanning*

    PubMed Central

    van Rosmalen, Martijn; Janssen, Brian M. G.; Hendrikse, Natalie M.; van der Linden, Ardjan J.; Pieters, Pascal A.; Wanders, Dave; de Greef, Tom F. A.; Merkx, Maarten

    2017-01-01

    Meditopes are cyclic peptides that bind in a specific pocket in the antigen-binding fragment of a therapeutic antibody such as cetuximab. Provided their moderate affinity can be enhanced, meditope peptides could be used as specific non-covalent and paratope-independent handles in targeted drug delivery, molecular imaging, and therapeutic drug monitoring. Here we show that the affinity of a recently reported meditope for cetuximab can be substantially enhanced using a combination of yeast display and deep mutational scanning. Deep sequencing was used to construct a fitness landscape of this protein-peptide interaction, and four mutations were identified that together improved the affinity for cetuximab 10-fold to 15 nm. Importantly, the increased affinity translated into enhanced cetuximab-mediated recruitment to EGF receptor-overexpressing cancer cells. Although in silico Rosetta simulations correctly identified positions that were tolerant to mutation, modeling did not accurately predict the affinity-enhancing mutations. The experimental approach reported here should be generally applicable and could be used to develop meditope peptides with low nanomolar affinity for other therapeutic antibodies. PMID:27974464

  1. Deep sequencing analysis of viral infection and evolution allows rapid and detailed characterization of viral mutant spectrum.

    PubMed

    Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam

    2015-07-01

    The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org : nshomron@post.tau.ac.il Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. Cough event classification by pretrained deep neural network.

    PubMed

    Liu, Jia-Ming; You, Mingyu; Wang, Zheng; Li, Guo-Zheng; Xu, Xianghuai; Qiu, Zhongmin

    2015-01-01

    Cough is an essential symptom in respiratory diseases. In the measurement of cough severity, an accurate and objective cough monitor is expected by respiratory disease society. This paper aims to introduce a better performed algorithm, pretrained deep neural network (DNN), to the cough classification problem, which is a key step in the cough monitor. The deep neural network models are built from two steps, pretrain and fine-tuning, followed by a Hidden Markov Model (HMM) decoder to capture tamporal information of the audio signals. By unsupervised pretraining a deep belief network, a good initialization for a deep neural network is learned. Then the fine-tuning step is a back propogation tuning the neural network so that it can predict the observation probability associated with each HMM states, where the HMM states are originally achieved by force-alignment with a Gaussian Mixture Model Hidden Markov Model (GMM-HMM) on the training samples. Three cough HMMs and one noncough HMM are employed to model coughs and noncoughs respectively. The final decision is made based on viterbi decoding algorihtm that generates the most likely HMM sequence for each sample. A sample is labeled as cough if a cough HMM is found in the sequence. The experiments were conducted on a dataset that was collected from 22 patients with respiratory diseases. Patient dependent (PD) and patient independent (PI) experimental settings were used to evaluate the models. Five criteria, sensitivity, specificity, F1, macro average and micro average are shown to depict different aspects of the models. From overall evaluation criteria, the DNN based methods are superior to traditional GMM-HMM based method on F1 and micro average with maximal 14% and 11% error reduction in PD and 7% and 10% in PI, meanwhile keep similar performances on macro average. They also surpass GMM-HMM model on specificity with maximal 14% error reduction on both PD and PI. In this paper, we tried pretrained deep neural network in cough classification problem. Our results showed that comparing with the conventional GMM-HMM framework, the HMM-DNN could get better overall performance on cough classification task.

  3. ANME-2D Archaea Catalyze Methane Oxidation in Deep Subsurface Sediments Independent of Nitrate Reduction

    NASA Astrophysics Data System (ADS)

    Hernsdorf, A. W.; Amano, Y.; Suzuki, Y.; Ise, K.; Thomas, B. C.; Banfield, J. F.

    2015-12-01

    Terrestrial sediments are an important global reservoir for methane. Microorganisms in the deep subsurface play a critical role in the methane cycle, yet much remains to be learned about their diversity and metabolisms. To provide more comprehensive insight into the microbiology of the methane cycle in the deep subsurface, we conducted a genome-resolved study of samples collected from the Horonobe Underground Research Laboratory (HURL), Japan. Groundwater samples were obtained from three boreholes from a depth range of between 140 m and 250 m in two consecutive years. Groundwater was filtered and metagenomic DNA extracted and sequenced, and the sequence data assembled. Based on the sequences of phylogenetically informative genes on the assembled fragments, we detected a high degree of overlap in community composition across a vertical transect within one borehole at the two sampling times. However, there was comparatively little similarity observed among communities across boreholes. Spatial and temporal abundance patterns were used in combination with tetranucleotide signatures of assembled genome fragments to bin the data and reconstruct over 200 unique draft genomes, of which 137 are considered to be of high quality (>90% complete). The deepest samples from one borehole were highly dominated by an archaeon identified as ANME-2D; this organism was also present at lower abundance in all other samples from that borehole. Also abundant in these microbial communities were novel members of the Gammaproteobacteria, Saccharibacteria (TM7) and Tenericute phyla. Notably, a ~2 Mbp draft genome for the ANME-2D archaeon was reconstructed. As expected, the genome encodes all of the genes predicted to be involved in the reverse methanogenesis pathway. In contrast with the previously reported ANME2-D genome, the HURL ANME-2D genome lacks the capacity to reduce nitrate. However, we identified many multiheme cytochromes with closest similarity to those of the known Fe-reducing/oxidizing archaeon Ferroglobus placidus. Thus, we suggest that ANME2-D may couple methane oxidation to reduction of ferric iron minerals in the sediment and may be generally important as a link between the iron and methane cycles in deep subsurface environments. Such information has important implications for modeling the global carbon cycle.

  4. Identification of ribonucleotide reductase mutation causing temperature-sensitivity of herpes simplex virus isolates from whitlow by deep sequencing.

    PubMed

    Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu

    2015-06-01

    Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.

  5. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.

    PubMed

    Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M

    2018-05-01

    Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.

  6. Intronic splicing mutations in PTCH1 cause Gorlin syndrome.

    PubMed

    Bholah, Zaynab; Smith, Miriam J; Byers, Helen J; Miles, Emma K; Evans, D Gareth; Newman, William G

    2014-09-01

    Gorlin syndrome is an autosomal dominant disorder characterized by multiple early-onset basal cell carcinoma, odontogenic keratocysts and skeletal abnormalities. It is caused by heterozygous mutations in the tumour suppressor PTCH1. Routine clinical genetic testing, by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA) to confirm a clinical diagnosis of Gorlin syndrome, identifies a mutation in 60-90 % of cases. We undertook RNA analysis on lymphocytes from ten individuals diagnosed with Gorlin syndrome, but without known PTCH1 mutations by exonic sequencing or MLPA. Two altered PTCH1 transcripts were identified. Genomic DNA sequence analysis identified an intron 7 mutation c.1068-10T>A, which created a strong cryptic splice acceptor site, leading to an intronic insertion of eight bases; this is predicted to create a frameshift p.(His358Alafs*12). Secondly, a deep intronic mutation c.2561-2057A>G caused an inframe insertion of 78 intronic bases in the cDNA transcript, leading to a premature stop codon p.(Gly854fs*3). The mutations are predicted to cause loss of function of PTCH1, consistent with its tumour suppressor function. The findings indicate the importance of RNA analysis to detect intronic mutations in PTCH1 not identified by routine screening techniques.

  7. Deep-sea vent phage DNA polymerase specifically initiates DNA synthesis in the absence of primers.

    PubMed

    Zhu, Bin; Wang, Longfei; Mitsunobu, Hitoshi; Lu, Xueling; Hernandez, Alfredo J; Yoshida-Takashima, Yukari; Nunoura, Takuro; Tabor, Stanley; Richardson, Charles C

    2017-03-21

    A DNA polymerase is encoded by the deep-sea vent phage NrS-1. NrS-1 has a unique genome organization containing genes that are predicted to encode a helicase and a single-stranded DNA (ssDNA)-binding protein. The gene for an unknown protein shares weak homology with the bifunctional primase-polymerases (prim-pols) from archaeal plasmids but is missing the zinc-binding domain typically found in primases. We show that this gene product has efficient DNA polymerase activity and is processive in DNA synthesis in the presence of the NrS-1 helicase and ssDNA-binding protein. Remarkably, this NrS-1 DNA polymerase initiates DNA synthesis from a specific template DNA sequence in the absence of any primer. The de novo DNA polymerase activity resides in the N-terminal domain of the protein, whereas the C-terminal domain enhances DNA binding.

  8. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  9. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  10. Deep sequencing of small RNA libraries from human prostate epithelial and stromal cells reveal distinct pattern of microRNAs primarily predicted to target growth factors.

    PubMed

    Singh, Savita; Zheng, Yun; Jagadeeswaran, Guru; Ebron, Jey Sabith; Sikand, Kavleen; Gupta, Sanjay; Sunker, Ramanjulu; Shukla, Girish C

    2016-02-28

    Complex epithelial and stromal cell interactions are required during the development and progression of prostate cancer. Regulatory small non-coding microRNAs (miRNAs) participate in the spatiotemporal regulation of messenger RNA (mRNA) and regulation of translation affecting a large number of genes involved in prostate carcinogenesis. In this study, through deep-sequencing of size fractionated small RNA libraries we profiled the miRNAs of prostate epithelial (PrEC) and stromal (PrSC) cells. Over 50 million reads were obtained for PrEC in which 860,468 were unique sequences. Similarly, nearly 76 million reads for PrSC were obtained in which over 1 million were unique reads. Expression of many miRNAs of broadly conserved and poorly conserved miRNA families were identified. Sixteen highly expressed miRNAs with significant change in expression in PrSC than PrEC were further analyzed in silico. ConsensusPathDB showed the target genes of these miRNAs were significantly involved in adherence junction, cell adhesion, EGRF, TGF-β and androgen signaling. Let-7 family of tumor-suppressor miRNAs expression was highly pervasive in both, PrEC and PrSC cells. In addition, we have also identified several miRNAs that are unique to PrEC or PrSC cells and their predicted putative targets are a group of transcription factors. This study provides perspective on the miRNA expression in PrEC and PrSC, and reveals a global trend in miRNA interactome. We conclude that the most abundant miRNAs are potential regulators of development and differentiation of the prostate gland by targeting a set of growth factors. Additionally, high level expression of the most members of let-7 family miRNAs suggests their role in the fine tuning of the growth and proliferation of prostate epithelial and stromal cells. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  11. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    PubMed

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.

  12. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle.

    PubMed

    Zhan, Xiangjiang; Pan, Shengkai; Wang, Junyi; Dixon, Andrew; He, Jing; Muller, Margit G; Ni, Peixiang; Hu, Li; Liu, Yuan; Hou, Haolong; Chen, Yuanping; Xia, Jinquan; Luo, Qiong; Xu, Pengwei; Chen, Ying; Liao, Shengguang; Cao, Changchang; Gao, Shukun; Wang, Zhaobao; Yue, Zhen; Li, Guoqing; Yin, Ye; Fox, Nick C; Wang, Jun; Bruford, Michael W

    2013-05-01

    As top predators, falcons possess unique morphological, physiological and behavioral adaptations that allow them to be successful hunters: for example, the peregrine is renowned as the world's fastest animal. To examine the evolutionary basis of predatory adaptations, we sequenced the genomes of both the peregrine (Falco peregrinus) and saker falcon (Falco cherrug), and we present parallel, genome-wide evidence for evolutionary innovation and selection for a predatory lifestyle. The genomes, assembled using Illumina deep sequencing with greater than 100-fold coverage, are both approximately 1.2 Gb in length, with transcriptome-assisted prediction of approximately 16,200 genes for both species. Analysis of 8,424 orthologs in both falcons, chicken, zebra finch and turkey identified consistent evidence for genome-wide rapid evolution in these raptors. SNP-based inference showed contrasting recent demographic trajectories for the two falcons, and gene-based analysis highlighted falcon-specific evolutionary novelties for beak development and olfaction and specifically for homeostasis-related genes in the arid environment-adapted saker.

  13. Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing.

    PubMed

    Zheng, Chunhong; Zheng, Liangtao; Yoo, Jae-Kwang; Guo, Huahu; Zhang, Yuanyuan; Guo, Xinyi; Kang, Boxi; Hu, Ruozhen; Huang, Julie Y; Zhang, Qiming; Liu, Zhouzerui; Dong, Minghui; Hu, Xueda; Ouyang, Wenjun; Peng, Jirun; Zhang, Zemin

    2017-06-15

    Systematic interrogation of tumor-infiltrating lymphocytes is key to the development of immunotherapies and the prediction of their clinical responses in cancers. Here, we perform deep single-cell RNA sequencing on 5,063 single T cells isolated from peripheral blood, tumor, and adjacent normal tissues from six hepatocellular carcinoma patients. The transcriptional profiles of these individual cells, coupled with assembled T cell receptor (TCR) sequences, enable us to identify 11 T cell subsets based on their molecular and functional properties and delineate their developmental trajectory. Specific subsets such as exhausted CD8 + T cells and Tregs are preferentially enriched and potentially clonally expanded in hepatocellular carcinoma (HCC), and we identified signature genes for each subset. One of the genes, layilin, is upregulated on activated CD8 + T cells and Tregs and represses the CD8 + T cell functions in vitro. This compendium of transcriptome data provides valuable insights and a rich resource for understanding the immune landscape in cancers. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. The genetic landscape of a physical interaction

    PubMed Central

    Diss, Guillaume

    2018-01-01

    A key question in human genetics and evolutionary biology is how mutations in different genes combine to alter phenotypes. Efforts to systematically map genetic interactions have mostly made use of gene deletions. However, most genetic variation consists of point mutations of diverse and difficult to predict effects. Here, by developing a new sequencing-based protein interaction assay – deepPCA – we quantified the effects of >120,000 pairs of point mutations on the formation of the AP-1 transcription factor complex between the products of the FOS and JUN proto-oncogenes. Genetic interactions are abundant both in cis (within one protein) and trans (between the two molecules) and consist of two classes – interactions driven by thermodynamics that can be predicted using a three-parameter global model, and structural interactions between proximally located residues. These results reveal how physical interactions generate quantitatively predictable genetic interactions. PMID:29638215

  15. Deep learning predictions of survival based on MRI in amyotrophic lateral sclerosis.

    PubMed

    van der Burgh, Hannelore K; Schmidt, Ruben; Westeneng, Henk-Jan; de Reus, Marcel A; van den Berg, Leonard H; van den Heuvel, Martijn P

    2017-01-01

    Amyotrophic lateral sclerosis (ALS) is a progressive neuromuscular disease, with large variation in survival between patients. Currently, it remains rather difficult to predict survival based on clinical parameters alone. Here, we set out to use clinical characteristics in combination with MRI data to predict survival of ALS patients using deep learning, a machine learning technique highly effective in a broad range of big-data analyses. A group of 135 ALS patients was included from whom high-resolution diffusion-weighted and T1-weighted images were acquired at the first visit to the outpatient clinic. Next, each of the patients was monitored carefully and survival time to death was recorded. Patients were labeled as short, medium or long survivors, based on their recorded time to death as measured from the time of disease onset. In the deep learning procedure, the total group of 135 patients was split into a training set for deep learning (n = 83 patients), a validation set (n = 20) and an independent evaluation set (n = 32) to evaluate the performance of the obtained deep learning networks. Deep learning based on clinical characteristics predicted survival category correctly in 68.8% of the cases. Deep learning based on MRI predicted 62.5% correctly using structural connectivity and 62.5% using brain morphology data. Notably, when we combined the three sources of information, deep learning prediction accuracy increased to 84.4%. Taken together, our findings show the added value of MRI with respect to predicting survival in ALS, demonstrating the advantage of deep learning in disease prognostication.

  16. Genome-wide characterization of microRNA in foxtail millet (Setaria italica)

    PubMed Central

    2013-01-01

    Background MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play key roles in many biological processes in both animals and plants. Although many miRNAs have been identified in a large number of organisms, the miRNAs in foxtail millet (Setaria italica) have, until now, been poorly understood. Results In this study, two replicate small RNA libraries from foxtail millet shoots were sequenced, and 40 million reads representing over 10 million unique sequences were generated. We identified 43 known miRNAs, 172 novel miRNAs and 2 mirtron precursor candidates in foxtail millet. Some miRNA*s of the known and novel miRNAs were detected as well. Further, eight novel miRNAs were validated by stem-loop RT-PCR. Potential targets of the foxtail millet miRNAs were predicted based on our strict criteria. Of the predicted target genes, 79% (351) had functional annotations in InterPro and GO analyses, indicating the targets of the miRNAs were involved in a wide range of regulatory functions and some specific biological processes. A total of 69 pairs of syntenic miRNA precursors that were conserved between foxtail millet and sorghum were found. Additionally, stem-loop RT-PCR was conducted to confirm the tissue-specific expression of some miRNAs in the four tissues identified by deep-sequencing. Conclusions We predicted, for the first time, 215 miRNAs and 447 miRNA targets in foxtail millet at a genome-wide level. The precursors, expression levels, miRNA* sequences, target functions, conservation, and evolution of miRNAs we identified were investigated. Some of the novel foxtail millet miRNAs and miRNA targets were validated experimentally. PMID:24330712

  17. Genome-wide characterization of microRNA in foxtail millet (Setaria italica).

    PubMed

    Yi, Fei; Xie, Shaojun; Liu, Yuwei; Qi, Xin; Yu, Jingjuan

    2013-12-13

    MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play key roles in many biological processes in both animals and plants. Although many miRNAs have been identified in a large number of organisms, the miRNAs in foxtail millet (Setaria italica) have, until now, been poorly understood. In this study, two replicate small RNA libraries from foxtail millet shoots were sequenced, and 40 million reads representing over 10 million unique sequences were generated. We identified 43 known miRNAs, 172 novel miRNAs and 2 mirtron precursor candidates in foxtail millet. Some miRNA*s of the known and novel miRNAs were detected as well. Further, eight novel miRNAs were validated by stem-loop RT-PCR. Potential targets of the foxtail millet miRNAs were predicted based on our strict criteria. Of the predicted target genes, 79% (351) had functional annotations in InterPro and GO analyses, indicating the targets of the miRNAs were involved in a wide range of regulatory functions and some specific biological processes. A total of 69 pairs of syntenic miRNA precursors that were conserved between foxtail millet and sorghum were found. Additionally, stem-loop RT-PCR was conducted to confirm the tissue-specific expression of some miRNAs in the four tissues identified by deep-sequencing. We predicted, for the first time, 215 miRNAs and 447 miRNA targets in foxtail millet at a genome-wide level. The precursors, expression levels, miRNA* sequences, target functions, conservation, and evolution of miRNAs we identified were investigated. Some of the novel foxtail millet miRNAs and miRNA targets were validated experimentally.

  18. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    The increasing studies have been conducted using whole genome DNA methylation detection as one of the most important part of epigenetics research to find the significant relationships among DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping the bisulfite treated sequence to the whole genome has been the main method to study DNA cytosine methylation. However, today's relative tools almost suffer from inaccuracies and time-consuming problems. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve the problem. By having an optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt could analyze and predict the DNA methylation status. But when Hint-Hunt tried to predict DNA methylation status with large-scale dataset, there are still slow speed and low temporal-spatial efficiency problems. In order to solve the problems of Smith-Waterman dynamic programming and low temporal-spatial efficiency, we further design a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up to process large-scale dataset, and could run both on CPU and Intel Xeon Phi coprocessors. Moreover, we deploy and evaluate Hint-Hunt and P-Hint-Hunt on TH-2 supercomputer in different scales. The experimental results illuminate our tools eliminate the deviation caused by bisulfite treatment in mapping procedure and the multi-level parallel program yields a 48 times speed-up with 64 threads. P-Hint-Hunt gain a deep acceleration on CPU and Intel Xeon Phi heterogeneous platform, which gives full play of the advantages of multi-cores (CPU) and many-cores (Phi).

  19. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning

    PubMed Central

    Preuer, Kristina; Lewis, Richard P I; Hochreiter, Sepp; Bender, Andreas; Bulusu, Krishna C; Klambauer, Günter

    2018-01-01

    Abstract Motivation While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact klambauer@bioinf.jku.at Supplementary information Supplementary data are available at Bioinformatics online. PMID:29253077

  20. Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge

    NASA Astrophysics Data System (ADS)

    Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua

    2018-01-01

    Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data of the fungal diversity and biogeography, and a glimpse to the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.

  1. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  2. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  3. Tactical Determinants of Setting Zone in Elite Men'S Volleyball

    PubMed Central

    Afonso, Jose; Esteves, Francisca; Araújo, Rui; Thomas, Luke; Mesquita, Isabel

    2012-01-01

    The interactions between two opposing teams lead to the emergence of unique game patterns. In volleyball, attack efficacy emerges as the strongest predictor of the final result and thus it becomes of foremost importance to understand which game patterns afford the attaining of higher attack efficacies. These rely on the quality of the setting action. In turn, the serve and the serve reception constrain the setter's actions and the attacker's efficacy. Therefore, the purpose of this study was to examine predictors of the setting zone in elite-level men's volleyball. Thirty-one matches of the 2007 World Cup were analyzed, in total 5117 rallies. The dependent variable was the setting zone, and the independent variables were the server player, serve type, serve direction, serve depth, reception zone, receiver player and reception type. Multinomial logistic regression was applied, in order to obtain the estimated likelihood of occurrence of the dependent variable, based on the values of the independent variables (p < 0.05). Only the serve direction showed not to be predictive of the setting zone. Concerning the remaining variables, the tennis jump serve, serves from the middle-player, deep serves, reception near the endline or sidelines, reception by the zone 4 attackers when in defensive zone, and low reception all proved to impair the quality of reception, demanding the setter to play more often in the not acceptable setting zone. Results suggest that, at this level, practice of serve-reception should preferably cover the deep tennis jump serve, and attempt to afford the libero more opportunities to receive. By focusing on the variables with the most predictive power, performers may better allocate their attention towards the most pertinent cues at each moment. Knowledge of these interactive models provides valuable insights into the dynamics of the action sequences, affording coaches important information and guidance. Key pointsA set of key variables interact and allow predicting the setting zone, an important variable in determining attack efficacy in high-level men's volleyball.The tennis jump serve, deep serves, receptions near the endline or sidelines, serves from the middle-players, receptions by the zone 4 attackers when in defensive zone, and low reception enhance the utilization of non-ideal setting zones.By focusing on the variables with the most predictive power, performers may better allocate their attention towards the most pertinent cues at each moment.Knowledge of these interactive models provides valuable insights into the dynamics of the action sequences, affording coaches important information and guidance. PMID:24149123

  4. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    PubMed Central

    2010-01-01

    Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131

  5. Rotational motions from the 2016, Central Italy seismic sequence, as observed by an underground ring laser gyroscope

    NASA Astrophysics Data System (ADS)

    Simonelli, A.; Igel, H.; Wassermann, J.; Belfi, J.; Di Virgilio, A.; Beverini, N.; De Luca, G.; Saccorotti, G.

    2018-05-01

    We present the analysis of rotational and translational ground motions from earthquakes recorded during October/November, 2016, in association with the Central Italy seismic-sequence. We use co-located measurements of the vertical ground rotation rate from a large ring laser gyroscope (RLG), and the three components of ground velocity from a broadband seismometer. Both instruments are positioned in a deep underground environment, within the Gran Sasso National Laboratories (LNGS) of the Istituto Nazionale di Fisica Nucleare (INFN). We collected dozens of events spanning the 3.5-5.9 Magnitude range, and epicentral distances between 30 km and 70 km. This data set constitutes an unprecedented observation of the vertical rotational motions associated with an intense seismic sequence at local distance. Under the plane wave approximation we process the data set in order to get an experimental estimation of the events back azimuth. Peak values of rotation rate (PRR) and horizontal acceleration (PGA) are markedly correlated, according to a scaling constant which is consistent with previous measurements from different earthquake sequences. We used a prediction model in use for Italy to calculate the expected PGA at the recording site, obtaining consequently predictions for PRR. Within the modeling uncertainties, predicted rotations are consistent with the observed ones, suggesting the possibility of establishing specific attenuation models for ground rotations, like the scaling of peak velocity and peak acceleration in empirical ground-motion prediction relationships. In a second step, after identifying the direction of the incoming wave-field, we extract phase velocity data using the spectral ratio of the translational and rotational components.. This analysis is performed over time windows associated with the P-coda, S-coda and Lg phase. Results are consistent with independent estimates of shear-wave velocities in the shallow crust of the Central Apennines.

  6. Accurate Prediction of Protein Contact Maps by Coupling Residual Two-Dimensional Bidirectional Long Short-Term Memory with Convolutional Neural Networks.

    PubMed

    Hanson, Jack; Paliwal, Kuldip; Litfin, Thomas; Yang, Yuedong; Zhou, Yaoqi

    2018-06-19

    Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12, (Schaarschmidt et al., 2018)) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences, especially Long Short-Term Memory (LSTM) cells. Here we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. We show that the proposed method achieves a robust performance over validation and independent test sets with the Area Under the receiver operating characteristic Curve (AUC)>0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement is over contacts at all sequence-position separations. Specifically, a 8.95%, 5.65% and 2.84% increase in precision were observed for the top L∕10 predictions over the next best for short, medium and long-range contacts, respectively. This confirms the usefulness of ResNets to congregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. SPOT-Contact server url: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary data is available at Bioinformatics online.

  7. Dendrites, deep learning, and sequences in the hippocampus.

    PubMed

    Bhalla, Upinder S

    2017-10-12

    The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.

  8. A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

    USDA-ARS?s Scientific Manuscript database

    Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...

  9. Large-Scale Comparative Phenotypic and Genomic Analyses Reveal Ecological Preferences of Shewanella Species and Identify Metabolic Pathways Conserved at the Genus Level ▿ †

    PubMed Central

    Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.

    2011-01-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407

  10. Insights into the immuno-molecular biology of Angiostrongylus vasorum through transcriptomics--prospects for new interventions.

    PubMed

    Ansell, Brendan R E; Schnyder, Manuela; Deplazes, Peter; Korhonen, Pasi K; Young, Neil D; Hall, Ross S; Mangiola, Stefano; Boag, Peter R; Hofmann, Andreas; Sternberg, Paul W; Jex, Aaron R; Gasser, Robin B

    2013-12-01

    Angiostrongylus vasorum is a metastrongyloid nematode of dogs and other canids of major clinical importance in many countries. In order to gain first insights into the molecular biology of this worm, we conducted the first large-scale exploration of its transcriptome, and predicted essential molecules linked to metabolic and biological processes as well as host immune responses. We also predicted and prioritized drug targets and drug candidates. Following Illumina sequencing (RNA-seq), 52.3 million sequence reads representing adult A. vasorum were assembled and annotated. The assembly yielded 20,033 contigs, which encoded proteins with 11,505 homologues in Caenorhabditis elegans, and additional 2252 homologues in various other parasitic helminths for which curated data sets were publicly available. Functional annotation was achieved for 11,752 (58.6%) proteins predicted for A. vasorum, including peptidases (4.5%) and peptidase inhibitors (1.6%), protein kinases (1.7%), G protein-coupled receptors (GPCRs) (1.5%) and phosphatases (1.2%). Contigs encoding excretory/secretory and immuno-modulatory proteins represented some of the most highly transcribed molecules, and encoded enzymes that digest haemoglobin were conserved between A. vasorum and other blood-feeding nematodes. Using an essentiality-based approach, drug targets, including neurotransmitter receptors, an important chemosensory ion channel and cysteine proteinase-3 were predicted in A. vasorum, as were associated small molecular inhibitors/activators. Future transcriptomic analyses of all developmental stages of A. vasorum should facilitate deep explorations of the molecular biology of this important parasitic nematode and support the sequencing of its genome. These advances will provide a foundation for exploring immuno-molecular aspects of angiostrongylosis and have the potential to underpin the discovery of new methods of intervention. © 2013.

  11. De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.

    PubMed

    Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan

    2018-05-24

    High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.

  12. Naturally occurring dominant drug resistance mutations occur infrequently in the setting of recently acquired hepatitis C.

    PubMed

    Applegate, Tanya L; Gaudieri, Silvana; Plauzolles, Anne; Chopra, Abha; Grebely, Jason; Lucas, Michaela; Hellard, Margaret; Luciani, Fabio; Dore, Gregory J; Matthews, Gail V

    2015-01-01

    Direct-acting antivirals (DAAs) are predicted to transform hepatitis C therapy, yet little is known about the prevalence of naturally occurring resistance mutations in recently acquired HCV. This study aimed to determine the prevalence and frequency of drug resistance mutations in the viral quasispecies among HIV-positive and -negative individuals with recent HCV. The NS3 protease, NS5A and NS5B polymerase genes were amplified from 50 genotype 1a participants of the Australian Trial in Acute Hepatitis C. Amino acid variations at sites known to be associated with possible drug resistance were analysed by ultra-deep pyrosequencing. A total of 12% of individuals harboured dominant resistance mutations, while 36% demonstrated non-dominant resistant variants below that detectable by bulk sequencing (that is, <20%) but above a threshold of 1%. Resistance variants (<1%) were observed at most sites associated with DAA resistance from all classes, with the exception of sofosbuvir. Dominant resistant mutations were uncommonly observed in the setting of recent HCV. However, low-level mutations to all DAA classes were observed by deep sequencing at the majority of sites and in most individuals. The significance of these variants and impact on future treatment options remains to be determined. Clinicaltrials.gov NCT00192569.

  13. Revealing the unexplored fungal communities in deep groundwater of crystalline bedrock fracture zones in Olkiluoto, Finland.

    PubMed

    Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja

    2015-01-01

    The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.

  14. DeepMirTar: a deep-learning approach for predicting human miRNA targets.

    PubMed

    Wen, Ming; Cong, Peisheng; Zhang, Zhimin; Lu, Hongmei; Li, Tonghua

    2018-06-01

    MicroRNAs (miRNAs) are small noncoding RNAs that function in RNA silencing and post-transcriptional regulation of gene expression by targeting messenger RNAs (mRNAs). Because the underlying mechanisms associated with miRNA binding to mRNA are not fully understood, a major challenge of miRNA studies involves the identification of miRNA-target sites on mRNA. In silico prediction of miRNA-target sites can expedite costly and time-consuming experimental work by providing the most promising miRNA-target-site candidates. In this study, we reported the design and implementation of DeepMirTar, a deep-learning-based approach for accurately predicting human miRNA targets at the site level. The predicted miRNA-target sites are those having canonical or non-canonical seed, and features, including high-level expert-designed, low-level expert-designed, and raw-data-level, were used to represent the miRNA-target site. Comparison with other state-of-the-art machine-learning methods and existing miRNA-target-prediction tools indicated that DeepMirTar improved overall predictive performance. DeepMirTar is freely available at https://github.com/Bjoux2/DeepMirTar_SdA. lith@tongji.edu.cn, hongmeilu@csu.edu.cn. Supplementary data are available at Bioinformatics online.

  15. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.

  16. Four Major South Korea's Rivers Using Deep Learning Models.

    PubMed

    Lee, Sangmok; Lee, Donghyun

    2018-06-24

    Harmful algal blooms are an annual phenomenon that cause environmental damage, economic losses, and disease outbreaks. A fundamental solution to this problem is still lacking, thus, the best option for counteracting the effects of algal blooms is to improve advance warnings (predictions). However, existing physical prediction models have difficulties setting a clear coefficient indicating the relationship between each factor when predicting algal blooms, and many variable data sources are required for the analysis. These limitations are accompanied by high time and economic costs. Meanwhile, artificial intelligence and deep learning methods have become increasingly common in scientific research; attempts to apply the long short-term memory (LSTM) model to environmental research problems are increasing because the LSTM model exhibits good performance for time-series data prediction. However, few studies have applied deep learning models or LSTM to algal bloom prediction, especially in South Korea, where algal blooms occur annually. Therefore, we employed the LSTM model for algal bloom prediction in four major rivers of South Korea. We conducted short-term (one week) predictions by employing regression analysis and deep learning techniques on a newly constructed water quality and quantity dataset drawn from 16 dammed pools on the rivers. Three deep learning models (multilayer perceptron, MLP; recurrent neural network, RNN; and long short-term memory, LSTM) were used to predict chlorophyll-a, a recognized proxy for algal activity. The results were compared to those from OLS (ordinary least square) regression analysis and actual data based on the root mean square error (RSME). The LSTM model showed the highest prediction rate for harmful algal blooms and all deep learning models out-performed the OLS regression analysis. Our results reveal the potential for predicting algal blooms using LSTM and deep learning.

  17. Deep RNNs for video denoising

    NASA Astrophysics Data System (ADS)

    Chen, Xinyuan; Song, Li; Yang, Xiaokang

    2016-09-01

    Video denoising can be described as the problem of mapping from a specific length of noisy frames to clean one. We propose a deep architecture based on Recurrent Neural Network (RNN) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as the input and outputs the clean one. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers where each layer receives the hidden state of the previous layer as input. Experiment shows (i) the recurrent architecture through temporal domain extracts motion information and does favor to video denoising, and (ii) deep architecture have large enough capacity for expressing mapping relation between corrupted videos as input and clean videos as output, furthermore, (iii) the model has generality to learned different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.

  18. The deep, hot biosphere: Twenty-five years of retrospection

    PubMed Central

    Colman, Daniel R.; Poudel, Saroj; Stamps, Blake W.; Boyd, Eric S.; Spear, John R.

    2017-01-01

    Twenty-five years ago this month, Thomas Gold published a seminal manuscript suggesting the presence of a “deep, hot biosphere” in the Earth’s crust. Since this publication, a considerable amount of attention has been given to the study of deep biospheres, their role in geochemical cycles, and their potential to inform on the origin of life and its potential outside of Earth. Overwhelming evidence now supports the presence of a deep biosphere ubiquitously distributed on Earth in both terrestrial and marine settings. Furthermore, it has become apparent that much of this life is dependent on lithogenically sourced high-energy compounds to sustain productivity. A vast diversity of uncultivated microorganisms has been detected in subsurface environments, and we show that H2, CH4, and CO feature prominently in many of their predicted metabolisms. Despite 25 years of intense study, key questions remain on life in the deep subsurface, including whether it is endemic and the extent of its involvement in the anaerobic formation and degradation of hydrocarbons. Emergent data from cultivation and next-generation sequencing approaches continue to provide promising new hints to answer these questions. As Gold suggested, and as has become increasingly evident, to better understand the subsurface is critical to further understanding the Earth, life, the evolution of life, and the potential for life elsewhere. To this end, we suggest the need to develop a robust network of interdisciplinary scientists and accessible field sites for long-term monitoring of the Earth’s subsurface in the form of a deep subsurface microbiome initiative. PMID:28674200

  19. Phylogenetic and enzymatic diversity of deep subseafloor aerobic microorganisms in organics- and methane-rich sediments off Shimokita Peninsula.

    PubMed

    Kobayashi, Tohru; Koide, Osamu; Mori, Kozue; Shimamura, Shigeru; Matsuura, Takae; Miura, Takeshi; Takaki, Yoshihiro; Morono, Yuki; Nunoura, Takuro; Imachi, Hiroyuki; Inagaki, Fumio; Takai, Ken; Horikoshi, Koki

    2008-07-01

    "A meta-enzyme approach" is proposed as an ecological enzymatic method to explore the potential functions of microbial communities in extreme environments such as the deep marine subsurface. We evaluated a variety of extra-cellular enzyme activities of sediment slurries and isolates from a deep subseafloor sediment core. Using the new deep-sea drilling vessel "Chikyu", we obtained 365 m of core sediments that contained approximately 2% organic matter and considerable amounts of methane from offshore the Shimokita Peninsula in Japan at a water depth of 1,180 m. In the extra-sediment fraction of the slurry samples, phosphatase, esterase, and catalase activities were detected consistently throughout the core sediments down to the deepest slurry sample from 342.5 m below seafloor (mbsf). Detectable enzyme activities predicted the existence of a sizable population of viable aerobic microorganisms even in deep subseafloor habitats. The subsequent quantitative cultivation using solid media represented remarkably high numbers of aerobic, heterotrophic microbial populations (e.g., maximally 4.4x10(7) cells cm(-3) at 342.5 mbsf). Analysis of 16S rRNA gene sequences revealed that the predominant cultivated microbial components were affiliated with the genera Bacillus, Shewanella, Pseudoalteromonas, Halomonas, Pseudomonas, Paracoccus, Rhodococcus, Microbacterium, and Flexibacteracea. Many of the predominant and scarce isolates produced a variety of extra-cellular enzymes such as proteases, amylases, lipases, chitinases, phosphatases, and deoxyribonucleases. Our results indicate that microbes in the deep subseafloor environment off Shimokita are metabolically active and that the cultivable populations may have a great potential in biotechnology.

  20. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Deep ART Neural Model for Biologically Inspired Episodic Memory and Its Application to Task Performance of Robots.

    PubMed

    Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan; Gyeong-Moon Park; Yong-Ho Yoo; Deok-Hwa Kim; Jong-Hwan Kim; Yoo, Yong-Ho; Park, Gyeong-Moon; Kim, Jong-Hwan; Kim, Deok-Hwa

    2018-06-01

    Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.

  2. Predicting healthcare trajectories from medical records: A deep learning approach.

    PubMed

    Pham, Trang; Tran, Truyen; Phung, Dinh; Venkatesh, Svetha

    2017-05-01

    Personalized predictive medicine necessitates the modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, stored in electronic medical records are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors and models patient health state trajectories by the memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces methods to handle irregularly timed events by moderating the forgetting and consolidation of memory. DeepCare also explicitly models medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden - diabetes and mental health - the results show improved prediction accuracy. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Optimization of whole-transcriptome amplification from low cell density deep-sea microbial samples for metatranscriptomic analysis.

    PubMed

    Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R

    2011-01-01

    Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.

  4. The complete mitochondrial genome of the deep-sea sponge Poecillastra laminaris (Astrophorida, Vulcanellidae).

    PubMed

    Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A

    2016-05-01

    The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GeneBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.

  5. Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau

    NASA Astrophysics Data System (ADS)

    Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori

    2017-12-01

    We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not associated) collected from two different sites at the Sao Paulo Plateau off Brazil by Ion Torrent PGM targeting ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity assigned into 9 classes and 31 genus) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, it was revealed that only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high amount of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity, in comparison with pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge on fungi and their role in deep-sea environments is still limited and scarce, this study increases our understanding of fungal diversity and community structure in deep-sea environments.

  6. CRISPR-Cas9-mediated saturated mutagenesis screen predicts clinical drug resistance with improved accuracy.

    PubMed

    Ma, Leyuan; Boucher, Jeffrey I; Paulsen, Janet; Matuszewski, Sebastian; Eide, Christopher A; Ou, Jianhong; Eickelberg, Garrett; Press, Richard D; Zhu, Lihua Julie; Druker, Brian J; Branford, Susan; Wolfe, Scot A; Jensen, Jeffrey D; Schiffer, Celia A; Green, Michael R; Bolon, Daniel N

    2017-10-31

    Developing tools to accurately predict the clinical prevalence of drug-resistant mutations is a key step toward generating more effective therapeutics. Here we describe a high-throughput CRISPR-Cas9-based saturated mutagenesis approach to generate comprehensive libraries of point mutations at a defined genomic location and systematically study their effect on cell growth. As proof of concept, we mutagenized a selected region within the leukemic oncogene BCR-ABL1 Using bulk competitions with a deep-sequencing readout, we analyzed hundreds of mutations under multiple drug conditions and found that the effects of mutations on growth in the presence or absence of drug were critical for predicting clinically relevant resistant mutations, many of which were cancer adaptive in the absence of drug pressure. Using this approach, we identified all clinically isolated BCR-ABL1 mutations and achieved a prediction score that correlated highly with their clinical prevalence. The strategy described here can be broadly applied to a variety of oncogenes to predict patient mutations and evaluate resistance susceptibility in the development of new therapeutics. Published under the PNAS license.

  7. LookSeq: a browser-based viewer for deep sequencing data.

    PubMed

    Manske, Heinrich Magnus; Kwiatkowski, Dominic P

    2009-11-01

    Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

  8. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed Central

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112

  9. Unveiling the Biodiversity of Deep-Sea Nematodes through Metabarcoding: Are We Ready to Bypass the Classical Taxonomy?

    PubMed

    Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto

    2015-01-01

    Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity.

  10. Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

    PubMed Central

    Takai, Ken; Horikoshi, Koki

    1999-01-01

    Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021

  11. Drilling a crater at the Equator-insides from ICDP DeepCHALLA

    NASA Astrophysics Data System (ADS)

    Meyer, Inka; Van Daele, Maarten; Tanghe, Niels; Eloy, Jonas; Verschuren, Dirk; De Batist, Marc

    2017-04-01

    Long and continuous sediment records from equatorial Africa are rare, resulting in a so far fragmentary understanding of the effects of a warming atmosphere on the tropical hydrological cycle at the regional scale. Serve and recurrent droughts is the principle weather-related hazard throughout sub-Saharan Africa, and the quality of long-term weather prediction a principle bottleneck hampering drought mitigation and adaptation. The impact of 21st-century anthropogenic climate change on the African rainfall is highly uncertain, implying unforeseeable effects on freshwater resources. During the "CHALLACEA" project (2005-2008) detailed investigations of Lake Challa, a relatively small and deep crater lake on the border between Kenya and Tanzania, revealed the lake is a key site for reconstructing the climate and environmental history of equatorial East Africa. Various biological, bio-geochemical and sedimentological investigations of the 22 long CHALLACEA-core helped to understand the systematics of Lake Challa under present-day conditions as well as to reconstruct environmental changes over the past 25,000 years. Due to the good quality of the Lake Challa sediment and the high scientific outcome of the record, a new International Continental Scientific Drilling Programme (ICDP) project "DeepCHALLA" was established to drill a longer sediment record, going further back in time. During the drilling campaign in November 2016 a 215 m long sediment sequence was obtained which will provide unique information about environmental changes in low-latitudes over a complete glacial - interglacial cycle. Therefore, the record opens new opportunities to study East African environmental changes and paleo-hydrological conditions much further back in time, encompassing the entire known existence of modern humans (Homo sapiens) in East Africa. Here we present a compilation of the environmental reconstructions based on the CHALLACEA sediment sequence and will give an outline of future work on the DeepCHALLA record.

  12. CASSIUS: The Cassini Uplink Scheduler

    NASA Technical Reports Server (NTRS)

    Bellinger, Earl

    2012-01-01

    The Cassini Uplink Scheduler (CASSIUS) is cross-platform software used to generate a radiation sequence plan for commands being sent to the Cassini spacecraft. Because signals must travel through varying amounts of Earth's atmosphere, several different modes of constant telemetry rates have been devised. These modes guarantee that the spacecraft and the Deep Space Network agree with respect to the data transmission rate. However, the memory readout of a command will be lost if it occurs on a telemetry mode boundary. Given a list of spacecraft message files as well as the available telemetry modes, CASSIUS can find an uplink sequence that ensures safe transmission of each file. In addition, it can predict when the two on-board solid state recorders will swap. CASSIUS prevents data corruption by making sure that commands are not planned for memory readout during telemetry rate changes or a solid state recorder swap.

  13. Medical Sequencing at the extremes of Human Body Mass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

    2006-09-01

    Body weight is a quantitative trait with significantheritability in humans. To identify potential genetic contributors tothis phenotype, we resequenced the coding exons and splice junctions of58 genes in 379 obese and 378 lean individuals. Our 96Mb survey included21 genes associated with monogenic forms of obesity in humans or mice, aswell as 37 genes that function in body weight-related pathways. We foundthat the monogenic obesity-associated gene group was enriched for rarenonsynonymous variants unique to the obese (n=46) versus lean (n=26)populations. Computational analysis further predicted a significantlygreater fraction of deleterious variants within the obese cohort.Consistent with the complex inheritance of body weight,more » we did notobserve obvious familial segregation in the majority of the 28 availablekindreds. Taken together, these data suggest that multiple rare alleleswith variable penetrance contribute to obesity in the population andprovide a deep medical sequencing based approach to detectthem.« less

  14. A first insight into the occurrence and expression of functional amoA and accA genes of autotrophic and ammonia-oxidizing bathypelagic Crenarchaeota of Tyrrhenian Sea

    NASA Astrophysics Data System (ADS)

    Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata

    2009-05-01

    The autotrophic and ammonia-oxidizing crenarchaeal assemblage at offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan ® methodolgy. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for residual 36% of clones, except of those recovered from Eastern Mediterranean, was found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into Deep Marine Group I cluster that contain the vast majority of known sequences recovered from global deep-sea environment. Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of crenarchaeal amoA gene was monitored by direct RNA retrieval and following analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in DNA-based clone library, obviously, due to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts, related to Deep Marine Group I of Crenarchaeota, was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. As far as all seawater samples were treated on-board under atmospheric pressure conditions and sunlight, the decompression and/or photoinhibition likely affected their metabolic activity, followed by the strong decay of gene expression.

  15. Next-generation sequencing reveals low-dose effects of cationic dendrimers in primary human bronchial epithelial cells.

    PubMed

    Feliu, Neus; Kohonen, Pekka; Ji, Jie; Zhang, Yuning; Karlsson, Hanna L; Palmberg, Lena; Nyström, Andreas; Fadeel, Bengt

    2015-01-27

    Gene expression profiling has developed rapidly in recent years with the advent of deep sequencing technologies such as RNA sequencing (RNA Seq) and could be harnessed to predict and define mechanisms of toxicity of chemicals and nanomaterials. However, the full potential of these technologies in (nano)toxicology is yet to be realized. Here, we show that systems biology approaches can uncover mechanisms underlying cellular responses to nanomaterials. Using RNA Seq and computational approaches, we found that cationic poly(amidoamine) dendrimers (PAMAM-NH2) are capable of triggering down-regulation of cell-cycle-related genes in primary human bronchial epithelial cells at doses that do not elicit acute cytotoxicity, as demonstrated using conventional cell viability assays, while gene transcription was not affected by neutral PAMAM-OH dendrimers. The PAMAMs were internalized in an active manner by lung cells and localized mainly in lysosomes; amine-terminated dendrimers were internalized more efficiently when compared to the hydroxyl-terminated dendrimers. Upstream regulator analysis implicated NF-κB as a putative transcriptional regulator, and subsequent cell-based assays confirmed that PAMAM-NH2 caused NF-κB-dependent cell cycle arrest. However, PAMAM-NH2 did not affect cell cycle progression in the human A549 adenocarcinoma cell line. These results demonstrate the feasibility of applying systems biology approaches to predict cellular responses to nanomaterials and highlight the importance of using relevant (primary) cell models.

  16. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    PubMed

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggest that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/9, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins; including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.

  17. PolyA_DB 3 catalogs cleavage and polyadenylation sites identified by deep sequencing in multiple genomes

    PubMed Central

    Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai

    2018-01-01

    Abstract PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441

  18. Identification of microRNAs differentially expressed involved in male flower development.

    PubMed

    Wang, Zhengjia; Huang, Jianqin; Sun, Zhichao; Zheng, Bingsong

    2015-03-01

    Hickory (Carya cathayensis Sarg.) is one of the most economically important woody trees in eastern China, but its long flowering phase delays yield. Our understanding of the regulatory roles of microRNAs (miRNAs) in male flower development in hickory remains poor. Using high-throughput sequencing technology, we have pyrosequenced two small RNA libraries from two male flower differentiation stages in hickory. Analysis of the sequencing data identified 114 conserved miRNAs that belonged to 23 miRNA families, five novel miRNAs including their corresponding miRNA*s, and 22 plausible miRNA candidates. Differential expression analysis revealed 12 miRNA sequences that were upregulated in the later (reproductive) stage of male flower development. Quantitative real-time PCR showed similar expression trends as that of the deep sequencing. Novel miRNAs and plausible miRNA candidates were predicted using bioinformatic analysis methods. The miRNAs newly identified in this study have increased the number of known miRNAs in hickory, and the identification of differentially expressed miRNAs will provide new avenues for studies into miRNAs involved in the process of male flower development in hickory and other related trees.

  19. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.

  20. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions.

    PubMed

    Brannon, A Rose; Vakiani, Efsevia; Sylvester, Brooke E; Scott, Sasinya N; McDermott, Gregory; Shah, Ronak H; Kania, Krishan; Viale, Agnes; Oschwald, Dayna M; Vacic, Vladimir; Emde, Anne-Katrin; Cercek, Andrea; Yaeger, Rona; Kemeny, Nancy E; Saltz, Leonard B; Shia, Jinru; D'Angelica, Michael I; Weiser, Martin R; Solit, David B; Berger, Michael F

    2014-08-28

    Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely available for clinical use and additional molecularly targeted agents are considered as treatment options in colorectal cancer, it is important to characterize the extent of tumor heterogeneity between primary and metastatic tumors. We performed deep coverage, targeted next-generation sequencing of 230 key cancer-associated genes for 69 matched primary and metastatic tumors and normal tissue. Mutation profiles were 100% concordant for KRAS, NRAS, and BRAF, and were highly concordant for recurrent alterations in colorectal cancer. Additionally, whole genome sequencing of four patient trios did not reveal any additional site-specific targetable alterations. Colorectal cancer primary tumors and metastases exhibit high genomic concordance. As current clinical practices in colorectal cancer revolve around KRAS, NRAS, and BRAF mutation status, diagnostic sequencing of either primary or metastatic tissue as available is acceptable for most patients. Additionally, consistency between targeted sequencing and whole genome sequencing results suggests that targeted sequencing may be a suitable strategy for clinical diagnostic applications.

  1. Optimization of conditions to sequence long cDNAs from viruses

    USDA-ARS?s Scientific Manuscript database

    Fourth generation sequencing with the Minion nanopore sequencer provides opportunity to obtain deep coverage and long read for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...

  2. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS?s Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  3. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) constitute a family of small RNA (sRNA) population that regulates the gene expression and plays an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana, Oryza sativa etc. and volume of the miRNA database, mirBASE, has been increasing on day to day basis. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides which contributes to its high indexed sweetening property with no calorific value. Several studies have been carried out for understanding molecular mechanism involved in biosynthesis of these glycosides, however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of availability of genome sequence data. Results To identify miRNAs in S. rebaudiana, sRNA library was constructed and sequenced using Illumina genome analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. Also, we identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. All novel sequences have not been earlier described in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNA encoding enzymes regulating essential plant metabolic and signaling pathways. Conclusions This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs indicating that specific miRNAs exist in stevia species. Our results provided information on stevia miRNAs and their targets building a foundation for future studies to understand their roles in key stevia traits. PMID:23116282

  4. Use of four next-generation sequencing platforms to determine HIV-1 coreceptor tropism.

    PubMed

    Archer, John; Weber, Jan; Henry, Kenneth; Winner, Dane; Gibson, Richard; Lee, Lawrence; Paxinos, Ellen; Arts, Eric J; Robertson, David L; Mimms, Larry; Quiñones-Mateu, Miguel E

    2012-01-01

    HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.

  5. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni.

    PubMed

    Mandhan, Vibha; Kaur, Jagdeep; Singh, Kashmir

    2012-11-01

    MicroRNAs (miRNAs) constitute a family of small RNA (sRNA) population that regulates the gene expression and plays an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana, Oryza sativa etc. and volume of the miRNA database, mirBASE, has been increasing on day to day basis. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides which contributes to its high indexed sweetening property with no calorific value. Several studies have been carried out for understanding molecular mechanism involved in biosynthesis of these glycosides, however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of availability of genome sequence data. To identify miRNAs in S. rebaudiana, sRNA library was constructed and sequenced using Illumina genome analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. Also, we identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. All novel sequences have not been earlier described in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNA encoding enzymes regulating essential plant metabolic and signaling pathways. This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs indicating that specific miRNAs exist in stevia species. Our results provided information on stevia miRNAs and their targets building a foundation for future studies to understand their roles in key stevia traits.

  6. pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning.

    PubMed

    Zhou, Xie-Xuan; Zeng, Wen-Feng; Chi, Hao; Luo, Chunjie; Liu, Chao; Zhan, Jianfeng; He, Si-Min; Zhang, Zhifei

    2017-12-05

    In tandem mass spectrometry (MS/MS)-based proteomics, search engines rely on comparison between an experimental MS/MS spectrum and the theoretical spectra of the candidate peptides. Hence, accurate prediction of the theoretical spectra of peptides appears to be particularly important. Here, we present pDeep, a deep neural network-based model for the spectrum prediction of peptides. Using the bidirectional long short-term memory (BiLSTM), pDeep can predict higher-energy collisional dissociation, electron-transfer dissociation, and electron-transfer and higher-energy collision dissociation MS/MS spectra of peptides with >0.9 median Pearson correlation coefficients. Further, we showed that intermediate layer of the neural network could reveal physicochemical properties of amino acids, for example the similarities of fragmentation behaviors between amino acids. We also showed the potential of pDeep to distinguish extremely similar peptides (peptides that contain isobaric amino acids, for example, GG = N, AG = Q, or even I = L), which were very difficult to distinguish using traditional search engines.

  7. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.

    PubMed

    Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M

    2010-09-01

    MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm versus 19.9 cm(3), P=0.74) and prevalence of >or=1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.

  8. Deep-Learning-Based Drug-Target Interaction Prediction.

    PubMed

    Wen, Ming; Zhang, Zhimin; Niu, Shaoyu; Sha, Haozhi; Yang, Ruihan; Yun, Yonghuan; Lu, Hongmei

    2017-04-07

    Identifying interactions between known drugs and targets is a major challenge in drug repositioning. In silico prediction of drug-target interaction (DTI) can speed up the expensive and time-consuming experimental work by providing the most potent DTIs. In silico prediction of DTI can also provide insights about the potential drug-drug interaction and promote the exploration of drug side effects. Traditionally, the performance of DTI prediction depends heavily on the descriptors used to represent the drugs and the target proteins. In this paper, to accurately predict new DTIs between approved drugs and targets without separating the targets into different classes, we developed a deep-learning-based algorithmic framework named DeepDTIs. It first abstracts representations from raw input descriptors using unsupervised pretraining and then applies known label pairs of interaction to build a classification model. Compared with other methods, it is found that DeepDTIs reaches or outperforms other state-of-the-art methods. The DeepDTIs can be further used to predict whether a new drug targets to some existing targets or whether a new target interacts with some existing drugs.

  9. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  10. Theory of Semiconducting Superlattices and Microstructures

    DTIC Science & Technology

    1992-03-01

    theory elucidated the various factors affecting deep levels, sets forth the conditions for obtaining shallow-deep transitions, and predicts that Si (a...theory elucidates the various factors affecting deep levels, sets forth the conditions for obtaining shallow-deep transitions, and predicts that Si (a...ondenotes the anion vacancy, which can be thought any quantitative theoretical factor are theof as originating from Column-O of the Period strengths of

  11. Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.

    PubMed

    James, Katherine; Cockell, Simon J; Zenkin, Nikolay

    2017-05-01

    The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Insertion sequences enrichment in extreme Red sea brine pool vent.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2017-03-01

    Mobile genetic elements are major agents of genome diversification and evolution. Limited studies addressed their characteristics, including abundance, and role in extreme habitats. One of the rare natural habitats exposed to multiple-extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments, suggest that insertion sequences predominantly contribute to polyextremophiles genome plasticity.

  13. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

    PubMed

    Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

    2018-04-01

    The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.

  14. Analysis of Variability in HIV-1 Subtype A Strains in Russia Suggests a Combination of Deep Sequencing and Multitarget RNA Interference for Silencing of the Virus.

    PubMed

    Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A

    2017-02-01

    Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.

  15. Discovery and profiling of novel and conserved microRNAs during flower development in Carya cathayensis via deep sequencing.

    PubMed

    Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song

    2012-08-01

    Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.

  16. Genome-wide computational identification of microRNAs and their targets in the deep-branching eukaryote Giardia lamblia.

    PubMed

    Zhang, Yan-Qiong; Chen, Dong-Liang; Tian, Hai-Feng; Zhang, Bao-Hong; Wen, Jian-Fan

    2009-10-01

    Using a combined computational program, we identified 50 potential microRNAs (miRNAs) in Giardia lamblia, one of the most primitive unicellular eukaryotes. These miRNAs are unique to G. lamblia and no homologues have been found in other organisms; miRNAs, currently known in other species, were not found in G. lamblia. This suggests that miRNA biogenesis and miRNA-mediated gene regulation pathway may evolve independently, especially in evolutionarily distant lineages. A majority (43) of the predicted miRNAs are located at one single locus; however, some miRNAs have two or more copies in the genome. Among the 58 miRNA genes, 28 are located in the intergenic regions whereas 30 are present in the anti-sense strands of the protein-coding sequences. Five predicted miRNAs are expressed in G. lamblia trophozoite cells evidenced by expressed sequence tags or RT-PCR. Thirty-seven identified miRNAs may target 50 protein-coding genes, including seven variant-specific surface proteins (VSPs). Our findings provide a clue that miRNA-mediated gene regulation may exist in the early stage of eukaryotic evolution, suggesting that it is an important regulation system ubiquitous in eukaryotes.

  17. High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations.

    PubMed

    Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute

    2017-09-01

    Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  18. Musculoskeletal MRI findings of juvenile localized scleroderma.

    PubMed

    Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W

    2017-04-01

    Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.

  19. Dissecting enzyme function with microfluidic-based deep mutational scanning.

    PubMed

    Romero, Philip A; Tran, Tuan M; Abate, Adam R

    2015-06-09

    Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

  20. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...

  1. Complete genome sequence of southern tomato virus identified from China using next generation sequencing

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...

  2. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  3. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  4. High-Throughput Sequencing of Campylobacter jejuni Insertion Mutant Libraries Reveals mapA as a Fitness Factor for Chicken Colonization

    PubMed Central

    Johnson, Jeremiah G.; Livny, Jonathan

    2014-01-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture. PMID:24633877

  5. High-throughput sequencing of Campylobacter jejuni insertion mutant libraries reveals mapA as a fitness factor for chicken colonization.

    PubMed

    Johnson, Jeremiah G; Livny, Jonathan; Dirita, Victor J

    2014-06-01

    Campylobacter jejuni is a leading cause of gastrointestinal infections worldwide, due primarily to its ability to asymptomatically colonize the gastrointestinal tracts of agriculturally relevant animals, including chickens. Infection often occurs following consumption of meat that was contaminated by C. jejuni during harvest. Because of this, much interest lies in understanding the mechanisms that allow C. jejuni to colonize the chicken gastrointestinal tract. To address this, we generated a C. jejuni transposon mutant library that is amenable to insertion sequencing and introduced this mutant pool into day-of-hatch chicks. Following deep sequencing of C. jejuni mutants in the cecal outputs, several novel factors required for efficient colonization of the chicken gastrointestinal tract were identified, including the predicted outer membrane protein MapA. A mutant strain lacking mapA was constructed and found to be significantly reduced for chicken colonization in both competitive infections and monoinfections. Further, we found that mapA is required for in vitro competition with wild-type C. jejuni but is dispensable for growth in monoculture.

  6. In chronic myeloid leukemia patients on second-line tyrosine kinase inhibitor therapy, deep sequencing of BCR-ABL1 at the time of warning may allow sensitive detection of emerging drug-resistant mutants.

    PubMed

    Soverini, Simona; De Benedittis, Caterina; Castagnetti, Fausto; Gugliotta, Gabriele; Mancini, Manuela; Bavaro, Luana; Machova Polakova, Katerina; Linhartova, Jana; Iurlo, Alessandra; Russo, Domenico; Pane, Fabrizio; Saglio, Giuseppe; Rosti, Gianantonio; Cavo, Michele; Baccarani, Michele; Martinelli, Giovanni

    2016-08-02

    Imatinib-resistant chronic myeloid leukemia (CML) patients receiving second-line tyrosine kinase inhibitor (TKI) therapy with dasatinib or nilotinib have a higher risk of disease relapse and progression and not infrequently BCR-ABL1 kinase domain (KD) mutations are implicated in therapeutic failure. In this setting, earlier detection of emerging BCR-ABL1 KD mutations would offer greater chances of efficacy for subsequent salvage therapy and limit the biological consequences of full BCR-ABL1 kinase reactivation. Taking advantage of an already set up and validated next-generation deep amplicon sequencing (DS) assay, we aimed to assess whether DS may allow a larger window of detection of emerging BCR-ABL1 KD mutants predicting for an impending relapse. a total of 125 longitudinal samples from 51 CML patients who had acquired dasatinib- or nilotinib-resistant mutations during second-line therapy were analyzed by DS from the time of failure and mutation detection by conventional sequencing backwards. BCR-ABL1/ABL1%(IS) transcript levels were used to define whether the patient had 'optimal response', 'warning' or 'failure' at the time of first mutation detection by DS. DS was able to backtrack dasatinib- or nilotinib-resistant mutations to the previous sample(s) in 23/51 (45 %) pts. Median mutation burden at the time of first detection by DS was 5.5 % (range, 1.5-17.5 %); median interval between detection by DS and detection by conventional sequencing was 3 months (range, 1-9 months). In 5 cases, the mutations were detectable at baseline. In the remaining cases, response level at the time mutations were first detected by DS could be defined as 'Warning' (according to the 2013 ELN definitions of response to 2nd-line therapy) in 13 cases, as 'Optimal response' in one case, as 'Failure' in 4 cases. No dasatinib- or nilotinib-resistant mutations were detected by DS in 15 randomly selected patients with 'warning' at various timepoints, that later turned into optimal responders with no treatment changes. DS enables a larger window of detection of emerging BCR-ABL1 KD mutations predicting for an impending relapse. A 'Warning' response may represent a rational trigger, besides 'Failure', for DS-based mutation screening in CML patients undergoing second-line TKI therapy.

  7. A review on machine learning principles for multi-view biological data integration.

    PubMed

    Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune

    2018-03-01

    Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

  8. BUSCA: an integrative web server to predict subcellular localization of proteins.

    PubMed

    Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Profiti, Giuseppe; Casadio, Rita

    2018-04-30

    Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA against protein targets derived from recent CAFA experiments and other specific data sets, reporting performance at the state-of-the-art. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with a F1 value equal to 0.49 and among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.

  9. Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

    USDA-ARS?s Scientific Manuscript database

    Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lower the cost necessary for large genome sequencing projects. The one of the advantage of NGS platforms is the possibility to sequence the samples with...

  10. A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification.

    PubMed

    Yildirim, Özal

    2018-05-01

    Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  11. Special astronomical configurations, solar activity and deep degassing as a trigger of natural hazards

    NASA Astrophysics Data System (ADS)

    Natyaganov, Vladimir; Syvorotkin, Vladimir; Fedorov, Valeriy; Shopin, Sergey

    2016-04-01

    Extraordinary cases of tectonic events (strong earthquakes, volcano eruptions), mine explosions, typhoons, hurricanes, tornado outbreak sequences, ball lightnings, transient luminous events are analyzed in relation with special astronomical configurations, which are specific relative positions of the Sun, Earth, Moon and the closest planets of the Solar System (Venus, Mars and Jupiter) [1]. Usage of special astronomical coordinate systems give evidence not only of correlations but also of hidden causes-and-effect relations between the analyzed phenomena. The geocentric ecliptic latitude system is an example of such astronomical coordinate systems. It gives clear evidence of coherence between strong earthquakes and the maximal Moon declination from the plane of the ecliptic. Extraordinary cases of planet activity from the beginning of XX century till the present time are shown in the years of special astronomical configurations and abrupt increasing of solar activity. According to the empirical scheme of short-term earthquake prediction [3], geomagnetic disturbances are the triggers of earthquakes. Geomagnetic disturbances perform electromagnetic pumping (electromagnetic excitation) of the Earth's interior in the regions of intersections of seismomagnetic meridians with the plate boundaries as a result of electrothermal breakdowns in the heterogeneous medium of tectonic faults. This results in the local intensification of deep degassing [4], decreasing of shear strength of the medium that triggers earthquakes usually after 2 or 3 weeks (±2 days) after the geomagnetic disturbance. Examples of officially registered predictions of Kamchatka earthquakes with M7+ without missing events, including deep-focus earthquakes in the Okhotsk Sea since the year of 2002, are shown. It is discussed correlations and possible cause-and-effect relations between a different phenomena such as - dangerous natural hazardous events such as the record tornado outbreak sequences in the USA (May 2003, 400 tornadoes in 20 states and the 2011 Superb Outbreak in April 2011 (580 tornadoes), which corresponds to a third and about a half of the average annual number of tornadoes) - naturally-anthropogenic accidents with gas explosions in diggings and coal mines [4]; - special Moon phases (new moons and full moons); - local intensification of deep hydrogen-methane degassing; - extensive spatial anomalies of total ozone content in the stratosphere; - strong geomagnetic disturbances. The work was financially supported by the Ministry of Education and Science of the Russian Federation (in accordance with the requirements of the contract No. 14.577.21.0109, project UID RFMEFI57714X0109) References 1. V. M. Fedorov, Gravitational factors and astronomical chronology of geosphere processes [Gravitacionnye faktory i astronomicheskaja hronologija geosfernyh processov]. Moscow State University, Moscow, 2000. 368p. (In Russian) 2. V. L. Natyaganov, A. M. Nechaev, I. V. Stepanov, "Spatio-tempral relations of planet tectonic activity [Prostranstvenno-vremennye zakonomernosti tektonicheskoj aktivnosti planety]", Eurasian Union of Scientists, 2015, No. 3(12), Vol. 8. pp. 120-123. (In Russian) 3. L. N. Doda, V. L. Natyaganov, I. V. Stepanov, "An empirical scheme of short-term earthquake prediction," Doklady Earth Sciences, vol. 453, no.5, pp.551-557, Dec., 2013 4. V. L. Syvorotkin. Deep degassing and global catastrophes. Geoinformcentr. Moscow, 2002, 250 p. (In Russian)

  12. Spotting L3 slice in CT scans using deep convolutional network and transfer learning.

    PubMed

    Belharbi, Soufiane; Chatelain, Clément; Hérault, Romain; Adam, Sébastien; Thureau, Sébastien; Chastan, Mathieu; Modzelewski, Romain

    2017-08-01

    In this article, we present a complete automated system for spotting a particular slice in a complete 3D Computed Tomography exam (CT scan). Our approach does not require any assumptions on which part of the patient's body is covered by the scan. It relies on an original machine learning regression approach. Our models are learned using the transfer learning trick by exploiting deep architectures that have been pre-trained on imageNet database, and therefore it requires very little annotation for its training. The whole pipeline consists of three steps: i) conversion of the CT scans into Maximum Intensity Projection (MIP) images, ii) prediction from a Convolutional Neural Network (CNN) applied in a sliding window fashion over the MIP image, and iii) robust analysis of the prediction sequence to predict the height of the desired slice within the whole CT scan. Our approach is applied to the detection of the third lumbar vertebra (L3) slice that has been found to be representative to the whole body composition. Our system is evaluated on a database collected in our clinical center, containing 642 CT scans from different patients. We obtained an average localization error of 1.91±2.69 slices (less than 5 mm) in an average time of less than 2.5 s/CT scan, allowing integration of the proposed system into daily clinical routines. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Distributional prediction of Pleistocene forearc minibasin turbidites in the NE Nankai Trough area (off central Japan)

    NASA Astrophysics Data System (ADS)

    Egawa, K.; Furukawa, T.; Saeki, T.; Suzuki, K.; Narita, H.

    2011-12-01

    Natural gas hydrate-related sequences commonly provide unclear seismic images due to bottom simulating reflector, a seismic indicator of the theoretical base of gas hydrate stability zone, which usually causes problems for fully analyzing the detailed sedimentary structures and seismic facies. Here we propose an alternative technique to predict the distributional pattern of gas hydrate-related deep-sea turbidites with special reference to a Pleistocene forearc minibasin in the northeastern Nankai Trough area, off central Japan, from the integrated 3D structural and sedimentologic modeling. Structural unfolding and stratigraphic backstripping successively modeled a simple horseshoe-shaped paleobathymetry of the targeted turbidite sequence. Based on best-fit matching of net-to-gross ratio (or sand fraction) between the model and wells, subsequent turbidity current modeling on the restored paleobathymetric surface during a single flow event demonstrated excellent prediction results showing the morphologically controlled turbidity current evolution and selective turbidite sand distribution within the modeled minibasin. Also, multiple turbidity current modeling indicated the stacking sheet turbidites with regression and proximal/distal onlaps in the minibasin due to reflections off an opposing slope, whose sedimentary features are coincident with the seismic interpretation. Such modeling works can help us better understand the depositional pattern of gas hydrate-related, unconsolidated turbidites and also can improve gas hydrate reservoir characterization. This study was financially supported by MH21 Research Consortium.

  14. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10−3 27 months, MRD 10−3 to 10−5 48 months, and MRD <10−5 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471

  15. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    PubMed Central

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944

  16. Deep sequencing and flow cytometric characterization of expanded effector memory CD8+CD57+ T cells frequently reveals T-cell receptor Vβ oligoclonality and CDR3 homology in acquired aplastic anemia.

    PubMed

    Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S

    2018-05-01

    Oligoclonal expansion of CD8 + CD28 - lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8 + CD28 - cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8 + CD28 - CD57 + cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8 + CD57 + cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8 + cells from aplastic anemia patients with CD8 + CD57 + cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8 + CD57 + cells, which also showed decreased diversity compared to total CD4 + and CD8 + cell pools. From analysis of complementarity-determining region 3 sequences in the CD8 + cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8 + T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064 ). Copyright © 2018 Ferrata Storti Foundation.

  17. Cross-Participant EEG-Based Assessment of Cognitive Workload Using Multi-Path Convolutional Recurrent Neural Networks.

    PubMed

    Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin

    2018-04-26

    Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance.

  18. ASR5 is involved in the regulation of miRNA expression in rice.

    PubMed

    Neto, Lauro Bücker; Arenhart, Rafael Augusto; de Oliveira, Luiz Felipe Valter; de Lima, Júlio Cesar; Bodanese-Zanettini, Maria Helena; Margis, Rogerio; Margis-Pinheiro, Márcia

    2015-11-01

    The work describes an ASR knockdown transcriptomic analysis by deep sequencing of rice root seedlings and the transactivation of ASR cis-acting elements in the upstream region of a MIR gene. MicroRNAs are key regulators of gene expression that guide post-transcriptional control of plant development and responses to environmental stresses. ASR (ABA, Stress and Ripening) proteins are plant-specific transcription factors with key roles in different biological processes. In rice, ASR proteins have been suggested to participate in the regulation of stress response genes. This work describes the transcriptomic analysis by deep sequencing two libraries, comparing miRNA abundance from the roots of transgenic ASR5 knockdown rice seedlings with that of the roots of wild-type non-transformed rice seedlings. Members of 59 miRNA families were detected, and 276 mature miRNAs were identified. Our analysis detected 112 miRNAs that were differentially expressed between the two libraries. A predicted inverse correlation between miR167abc and its target gene (LOC_Os07g29820) was confirmed using RT-qPCR. Protoplast transactivation assays showed that ASR5 is able to recognize binding sites upstream of the MIR167a gene and drive its expression in vivo. Together, our data establish a comparative study of miRNAome profiles and is the first study to suggest the involvement of ASR proteins in miRNA gene regulation.

  19. iSS-PC: Identifying Splicing Sites via Physical-Chemical Properties Using Deep Sparse Auto-Encoder.

    PubMed

    Xu, Zhao-Chun; Wang, Peng; Qiu, Wang-Ren; Xiao, Xuan

    2017-08-15

    Gene splicing is one of the most significant biological processes in eukaryotic gene expression, such as RNA splicing, which can cause a pre-mRNA to produce one or more mature messenger RNAs containing the coded information with multiple biological functions. Thus, identifying splicing sites in DNA/RNA sequences is significant for both the bio-medical research and the discovery of new drugs. However, it is expensive and time consuming based only on experimental technique, so new computational methods are needed. To identify the splice donor sites and splice acceptor sites accurately and quickly, a deep sparse auto-encoder model with two hidden layers, called iSS-PC, was constructed based on minimum error law, in which we incorporated twelve physical-chemical properties of the dinucleotides within DNA into PseDNC to formulate given sequence samples via a battery of cross-covariance and auto-covariance transformations. In this paper, five-fold cross-validation test results based on the same benchmark data-sets indicated that the new predictor remarkably outperformed the existing prediction methods in this field. Furthermore, it is expected that many other related problems can be also studied by this approach. To implement classification accurately and quickly, an easy-to-use web-server for identifying slicing sites has been established for free access at: http://www.jci-bioinfo.cn/iSS-PC.

  20. Differential Expression of microRNAs in the Ovaries from Letrozole-Induced Rat Model of Polycystic Ovary Syndrome.

    PubMed

    Li, Dandan; Li, Chunjin; Xu, Ying; Xu, Duo; Li, Hongjiao; Gao, Liwei; Chen, Shuxiong; Fu, Lulu; Xu, Xin; Liu, Yongzheng; Zhang, Xueying; Zhang, Jingshun; Ming, Hao; Zheng, Lianwen

    2016-04-01

    Polycystic ovary syndrome (PCOS) is a complex and heterogeneous endocrine disorder. To understand the pathogenesis of PCOS, we established rat models of PCOS induced by letrozole and employed deep sequencing to screen the differential expression of microRNAs (miRNAs) in PCOS rats and control rats. We observed vaginal smear and detected ovarian pathological alteration and hormone level changes in PCOS rats. Deep sequencing showed that a total of 129 miRNAs were differentially expressed in the ovaries from letrozole-induced rat model compared with the control, including 49 miRNAs upregulated and 80 miRNAs downregulated. Furthermore, the differential expression of miR-201-5p, miR-34b-5p, miR-141-3p, and miR-200a-3p were confirmed by real-time polymerase chain reaction. Bioinformatic analysis revealed that these four miRNAs were predicted to target a large set of genes with different functions. Pathway analysis supported that the miRNAs regulate oocyte meiosis, mitogen-activated protein kinase (MAPK) signaling, phosphoinositide 3-kinase/Akt (PI3K-Akt) signaling, Rap1 signaling, and Notch signaling. These data indicate that miRNAs are differentially expressed in rat PCOS model and the differentially expressed miRNA are involved in the etiology and pathophysiology of PCOS. Our findings will help identify miRNAs as novel diagnostic markers and therapeutic targets for PCOS.

  1. Lifelong learning of human actions with deep neural network self-organization.

    PubMed

    Parisi, German I; Tani, Jun; Weber, Cornelius; Wermter, Stefan

    2017-12-01

    Lifelong learning is fundamental in autonomous robotics for the acquisition and fine-tuning of knowledge through experience. However, conventional deep neural models for action recognition from videos do not account for lifelong learning but rather learn a batch of training data with a predefined number of action classes and samples. Thus, there is the need to develop learning systems with the ability to incrementally process available perceptual cues and to adapt their responses over time. We propose a self-organizing neural architecture for incrementally learning to classify human actions from video sequences. The architecture comprises growing self-organizing networks equipped with recurrent neurons for processing time-varying patterns. We use a set of hierarchically arranged recurrent networks for the unsupervised learning of action representations with increasingly large spatiotemporal receptive fields. Lifelong learning is achieved in terms of prediction-driven neural dynamics in which the growth and the adaptation of the recurrent networks are driven by their capability to reconstruct temporally ordered input sequences. Experimental results on a classification task using two action benchmark datasets show that our model is competitive with state-of-the-art methods for batch learning also when a significant number of sample labels are missing or corrupted during training sessions. Additional experiments show the ability of our model to adapt to non-stationary input avoiding catastrophic interference. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  2. Cross-Participant EEG-Based Assessment of Cognitive Workload Using Multi-Path Convolutional Recurrent Neural Networks

    PubMed Central

    Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin

    2018-01-01

    Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance. PMID:29701668

  3. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10−6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957

  4. Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental nicotianan benthamiana plants in up-state New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...

  5. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    PubMed

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and is being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Predicted deep-sea coral habitat suitability for the U.S. West coast.

    PubMed

    Guinotte, John M; Davies, Andrew J

    2014-01-01

    Regional scale habitat suitability models provide finer scale resolution and more focused predictions of where organisms may occur. Previous modelling approaches have focused primarily on local and/or global scales, while regional scale models have been relatively few. In this study, regional scale predictive habitat models are presented for deep-sea corals for the U.S. West Coast (California, Oregon and Washington). Model results are intended to aid in future research or mapping efforts and to assess potential coral habitat suitability both within and outside existing bottom trawl closures (i.e. Essential Fish Habitat (EFH)) and identify suitable habitat within U.S. National Marine Sanctuaries (NMS). Deep-sea coral habitat suitability was modelled at 500 m×500 m spatial resolution using a range of physical, chemical and environmental variables known or thought to influence the distribution of deep-sea corals. Using a spatial partitioning cross-validation approach, maximum entropy models identified slope, temperature, salinity and depth as important predictors for most deep-sea coral taxa. Large areas of highly suitable deep-sea coral habitat were predicted both within and outside of existing bottom trawl closures and NMS boundaries. Predicted habitat suitability over regional scales are not currently able to identify coral areas with pin point accuracy and probably overpredict actual coral distribution due to model limitations and unincorporated variables (i.e. data on distribution of hard substrate) that are known to limit their distribution. Predicted habitat results should be used in conjunction with multibeam bathymetry, geological mapping and other tools to guide future research efforts to areas with the highest probability of harboring deep-sea corals. Field validation of predicted habitat is needed to quantify model accuracy, particularly in areas that have not been sampled.

  7. Predicted Deep-Sea Coral Habitat Suitability for the U.S. West Coast

    PubMed Central

    Guinotte, John M.; Davies, Andrew J.

    2014-01-01

    Regional scale habitat suitability models provide finer scale resolution and more focused predictions of where organisms may occur. Previous modelling approaches have focused primarily on local and/or global scales, while regional scale models have been relatively few. In this study, regional scale predictive habitat models are presented for deep-sea corals for the U.S. West Coast (California, Oregon and Washington). Model results are intended to aid in future research or mapping efforts and to assess potential coral habitat suitability both within and outside existing bottom trawl closures (i.e. Essential Fish Habitat (EFH)) and identify suitable habitat within U.S. National Marine Sanctuaries (NMS). Deep-sea coral habitat suitability was modelled at 500 m×500 m spatial resolution using a range of physical, chemical and environmental variables known or thought to influence the distribution of deep-sea corals. Using a spatial partitioning cross-validation approach, maximum entropy models identified slope, temperature, salinity and depth as important predictors for most deep-sea coral taxa. Large areas of highly suitable deep-sea coral habitat were predicted both within and outside of existing bottom trawl closures and NMS boundaries. Predicted habitat suitability over regional scales are not currently able to identify coral areas with pin point accuracy and probably overpredict actual coral distribution due to model limitations and unincorporated variables (i.e. data on distribution of hard substrate) that are known to limit their distribution. Predicted habitat results should be used in conjunction with multibeam bathymetry, geological mapping and other tools to guide future research efforts to areas with the highest probability of harboring deep-sea corals. Field validation of predicted habitat is needed to quantify model accuracy, particularly in areas that have not been sampled. PMID:24759613

  8. Proteome-wide Identification of Novel Ceramide-binding Proteins by Yeast Surface cDNA Display and Deep Sequencing.

    PubMed

    Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin

    2016-04-01

    Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  9. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life

    PubMed Central

    Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.

    2018-01-01

    Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. While the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369

  10. Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use.

    PubMed

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J

    2009-06-29

    It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.

  11. Low-Abundance HIV Drug-Resistant Viral Variants in Treatment-Experienced Persons Correlate with Historical Antiretroviral Use

    PubMed Central

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.

    2009-01-01

    Background It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031

  12. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life.

    PubMed

    Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S

    2018-01-01

    Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium , Aquabacterium , Ralstonia , and Acinetobacter . While the top five most frequently observed genera were Pseudomonas , Propionibacterium , Acinetobacter , Ralstonia , and Sphingomonas . The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset.

  13. Graphical classification of DNA sequences of HLA alleles by deep learning.

    PubMed

    Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi

    2018-04-01

    Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.

  14. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  15. 3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.

    PubMed

    Goldfarb, Katherine C; Cech, Thomas R

    2013-09-21

    Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.

  16. Multiplicity and molecular epidemiology of Plasmodium vivax and Plasmodium falciparum infections in East Africa.

    PubMed

    Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun

    2018-05-02

    Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections were similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.

  17. Prediction of Tissue Outcome and Assessment of Treatment Effect in Acute Ischemic Stroke Using Deep Learning.

    PubMed

    Nielsen, Anne; Hansen, Mikkel Bo; Tietze, Anna; Mouridsen, Kim

    2018-06-01

    Treatment options for patients with acute ischemic stroke depend on the volume of salvageable tissue. This volume assessment is currently based on fixed thresholds and single imagine modalities, limiting accuracy. We wish to develop and validate a predictive model capable of automatically identifying and combining acute imaging features to accurately predict final lesion volume. Using acute magnetic resonance imaging, we developed and trained a deep convolutional neural network (CNN deep ) to predict final imaging outcome. A total of 222 patients were included, of which 187 were treated with rtPA (recombinant tissue-type plasminogen activator). The performance of CNN deep was compared with a shallow CNN based on the perfusion-weighted imaging biomarker Tmax (CNN Tmax ), a shallow CNN based on a combination of 9 different biomarkers (CNN shallow ), a generalized linear model, and thresholding of the diffusion-weighted imaging biomarker apparent diffusion coefficient (ADC) at 600×10 -6 mm 2 /s (ADC thres ). To assess whether CNN deep is capable of differentiating outcomes of ±intravenous rtPA, patients not receiving intravenous rtPA were included to train CNN deep, -rtpa to access a treatment effect. The networks' performances were evaluated using visual inspection, area under the receiver operating characteristic curve (AUC), and contrast. CNN deep yields significantly better performance in predicting final outcome (AUC=0.88±0.12) than generalized linear model (AUC=0.78±0.12; P =0.005), CNN Tmax (AUC=0.72±0.14; P <0.003), and ADC thres (AUC=0.66±0.13; P <0.0001) and a substantially better performance than CNN shallow (AUC=0.85±0.11; P =0.063). Measured by contrast, CNN deep improves the predictions significantly, showing superiority to all other methods ( P ≤0.003). CNN deep also seems to be able to differentiate outcomes based on treatment strategy with the volume of final infarct being significantly different ( P =0.048). The considerable prediction improvement accuracy over current state of the art increases the potential for automated decision support in providing recommendations for personalized treatment plans. © 2018 American Heart Association, Inc.

  18. Deep learning of mutation-gene-drug relations from the literature.

    PubMed

    Lee, Kyubum; Kim, Byounggun; Choi, Yonghwa; Kim, Sunkyu; Shin, Wonho; Lee, Sunwon; Park, Sungjoon; Kim, Seongsoon; Tan, Aik Choon; Kang, Jaewoo

    2018-01-25

    Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models have increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature. Here, we present two new computational methods that utilize all the PubMed articles as domain specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses the Biomedical Entity Search Tool (BEST) scoring results as some of the features to train the machine learning classifiers. The second method uses not only the BEST scoring results, but also word vectors in a deep convolutional neural network model that are constructed from and trained on numerous documents such as PubMed abstracts and Google News articles. Using the features obtained from both the BEST search engine scores and word vectors, we extract mutation-gene and mutation-drug relations from the literature using machine learning classifiers such as random forest and deep convolutional neural networks. Our methods achieved better results compared with the state-of-the-art methods. We used our proposed features in a simple machine learning model, and obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and the word embeddings that are pre-trained on PubMed or Google News data. Using deep learning, the classification accuracy improved, and F1-scores of 0.96 and 0.86 were obtained for the mutation-gene and mutation-drug relations, respectively. We believe that our computational methods described in this research could be used as an important tool in identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of these mutation-gene-drug relations that were extracted from all the PubMed abstracts. We believe that our database can prove to be a valuable resource for precision medicine researchers.

  19. Phylogeny of the beaked whale genus Mesoplodon (Ziphiidae: Cetacea) revealed by nuclear introns: implications for the evolution of male tusks.

    PubMed

    Dalebout, Merel L; Steel, Debbie; Baker, C Scott

    2008-12-01

    With 14 species currently recognized, the beaked whale genus Mesoplodon (family Ziphiidae) is the most speciose in the order Cetacea. Beaked whales are widely distributed but are rarely seen at sea due to their oceanic distribution, deep-diving capacity, and apparent low abundance. Morphological differentiation among Mesoplodon species is relatively limited, with the exception of tooth form in adult males. Based on scarring patterns, males appear to use their tusk-like teeth as weapons in aggressive encounters with other males. Females are effectively toothless. We used sequences from seven nuclear introns (3348 base pairs) to construct a robust and highly resolved phylogeny, which was then used as a framework to test predictions from four hypotheses seeking to explain patterns of Mesoplodon tusk morphology and/or the processes that have driven the diversification of this genus: (1) linear progression of tusk form; (2) allopatric speciation through isolation in adjacent deep-sea canyons; (3) sympatric speciation through sexual selection on tusks; and (4) selection for species-recognition cues. Maximum-likelihood and Bayesian reconstructions confirmed the monophyly of the genus and revealed that what were considered ancestral and derived tusk forms have in fact arisen independently on several occasions, contrary to predictions from the linear-progression hypothesis. Further, none of the three well-supported species clades was confined to a single ocean basin, as might have been expected from the deep-sea canyon-isolation or sexual-selection hypotheses, and some species with similar tusks have overlapping distributions, contrary to predictions from the species-recognition hypothesis. However, the divergent tusk forms and sympatric distributions of three of the four sister-species pairs identified suggest that sexual selection on male tusks has likely played an important role in this unique radiation, although other forces are clearly also involved. To our knowledge, this is the first time that sexual selection has been explicitly implicated in the radiation of a mammalian group outside terrestrial ungulates.

  20. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

    PubMed Central

    Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend

    2007-01-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995

  1. Boosting compound-protein interaction prediction by deep learning.

    PubMed

    Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng

    2016-11-01

    The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, computational approaches are thus introduced. Among these, machine-learning based methods have achieved a considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques show advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Predicting Fire Susceptibility in the Forests of Amazonia

    NASA Technical Reports Server (NTRS)

    Nepstad, Daniel C.; Brown, I. Foster; Setzer, Alberto

    2000-01-01

    Although fire is the single greatest threat to the ecological integrity of Amazon forests, our ability to predict the occurrence of Amazon forest fires is rudimentary. Part of the difficulty encountered in making such predictions is the remarkable capacity of Amazon forests to tolerate drought by tapping moisture stored in deep soil. These forests can avoid drought-induced leaf shedding by withdrawing moisture to depths of 8 meters and more. Hence, the absorption of deep soil moisture allows these forests to maintain their leaf canopies following droughts of several months duration, thereby maintaining the deep shade and high relative humidity of the forest interior that prevents these ecosystems from burning. But the drought- and fire-avoidance that is conferred by this deep-rooting phenomenon is not unlimited. During successive years of drought, such as those provoked by El Nino episodes, deep soil moisture can be depleted, and drought-induced leaf shedding begins. The goal of this project was to incorporate this knowledge of Amazon forest fire ecology into a predictive model of forest flammability.

  3. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  4. First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

    USDA-ARS?s Scientific Manuscript database

    The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...

  5. Analysis of hepatitis C NS5A resistance associated polymorphisms using ultra deep single molecule real time (SMRT) sequencing.

    PubMed

    Bergfors, Assar; Leenheer, Daniël; Bergqvist, Anders; Ameur, Adam; Lennerstrand, Johan

    2016-02-01

    Development of Hepatitis C virus (HCV) resistance against direct-acting antivirals (DAAs), including NS5A inhibitors, is an obstacle to successful treatment of HCV when DAAs are used in sub-optimal combinations. Furthermore, it has been shown that baseline (pre-existing) resistance against DAAs is present in treatment naïve-patients and this will potentially complicate future treatment strategies in different HCV genotypes (GTs). Thus the aim was to detect low levels of NS5A resistant associated variants (RAVs) in a limited sample set of treatment-naïve patients of HCV GT1a and 3a, since such polymorphisms can display in vitro resistance as high as 60000 fold. Ultra-deep single molecule real time (SMRT) sequencing with the Pacific Biosciences (PacBio) RSII instrument was used to detect these RAVs. The SMRT sequencing was conducted on ten samples; three of them positive with Sanger sequencing (GT1a Q30H and Y93N, and GT3a Y93H), five GT1a samples, and two GT3a non-positive samples. The same methods were applied to the HCV GT1a H77-plasmid in a dilution series, in order to determine the error rates of replication, which in turn was used to determine the limit of detection (LOD), as defined by mean + 3SD, of minority variants down to 0.24%. We found important baseline NS5A RAVs at levels between 0.24 and 0.5%, which could potentially have clinical relevance. This new method with low level detection of baseline RAVs could be useful in predicting the most cost-efficient combination of DAA treatment, and reduce the treatment duration for an HCV infected individual. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Deep Sequencing Reveals Spatially Distributed Distinct Hot Spot Mutations in DICER1-Related Multinodular Goiter.

    PubMed

    de Kock, Leanne; Bah, Ismaël; Revil, Timothée; Bérubé, Pierre; Wu, Mona K; Sabbaghian, Nelly; Priest, John R; Ragoussis, Jiannis; Foulkes, William D

    2016-10-01

    Nontoxic multinodular goiter (MNG) occurs frequently, but its genetic etiology is not well established. Familial MNG and MNG occurring with ovarian Sertoli-Leydig cell tumor are associated with germline DICER1 mutations. We recently identified second somatic DICER1 ribonuclease (RNase) IIIb mutations in two MNGs. The objective of the study was to investigate the occurrence of somatic DICER1 mutations and mutational clonality in MNG. MNGs from 15 patients (10 with and five without germline DICER1 mutations) were selected based on tissue availability. Core biopsies/scrapings (n = 70) were obtained, sampling areas of follicular hyperplasia, hyperplasia within colloid pools, unremarkable thyroid parenchyma, and areas of thyroid parenchyma, not classified. After capture with a Fluidigm access array, the coding sequence of DICER1 was deep sequenced using DNA from each core/scraping. All germline DICER1-mutated cases were found to harbor at least one RNase III mutation. Specifically, we identified 12 individually distinct DICER1 RNase IIIb hot spot mutations in 32 of the follicular hyperplasia or hyperplasia within colloid pools cores/scrapings. These mutations are predicted to affect the metal-ion binding residues at positions p.Glu1705, p.Asp1709, p.Gly1809, p.Asp1810, and p.Glu1813. Somatic RNase IIIb mutations were identified in the 10 DICER1 germline mutated MNGs as follows: two cases contained one somatic mutation, five cases contained two mutations, and three cases contained three distinct somatic hot spot mutations. No RNase IIIb mutations were identified in the MNGs from individuals without germline DICER1 mutations. This study demonstrates that nodules within MNG occurring in DICER1 syndrome are associated with spatially distributed somatic DICER1 RNase IIIb mutations.

  7. A Comprehensive Phylogenetic Analysis of the Scleractinia (Cnidaria, Anthozoa) Based on Mitochondrial CO1 Sequence Data

    PubMed Central

    Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.

    2010-01-01

    Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613

  8. Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.

    PubMed

    Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A

    2017-11-01

    To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

  9. Development of a candidate reference material for adventitious virus detection in vaccine and biologicals manufacturing by deep sequencing

    PubMed Central

    Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.

    2016-01-01

    Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640

  10. Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network

    NASA Astrophysics Data System (ADS)

    Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke

    2018-06-01

    Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. For this purpose, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, DRNN is constructed by the stacks of the recurrent hidden layer to automatically extract the features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.

  11. High-throughput sequencing identification and characterization of potentially adhesion-related small RNAs in Streptococcus mutans.

    PubMed

    Zhu, Wenhui; Liu, Shanshan; Liu, Jia; Zhou, Yan; Lin, Huancai

    2018-05-01

    Adherence capacity is one of the principal virulence factors of Streptococcus mutans, and adhesion virulence factors are controlled by small RNAs (sRNAs) at the post-transcriptional level in various bacteria. Here, we aimed to identify and decipher putative adhesion-related sRNAs in clinical strains of S. mutans. RNA deep-sequencing was performed to identify potential sRNAs under different adhesion conditions. The expression of sRNAs was analysed by quantitative real-time PCR (qRT-PCR), and bioinformatic methods were used to predict the functional characteristics of sRNAs. A total of 736 differentially expressed candidate sRNAs were predicted, and these included 352 sRNAs located on the antisense to mRNA (AM) and 384 sRNAs in intergenic regions (IGRs). The top 7 differentially expressed sRNAs were successfully validated by qRT-PCR in UA159, and 2 of these were further confirmed in 100 clinical isolates. Moreover, the sequences of two sRNAs were conserved in other Streptococcus species, indicating a conserved role in such closely related species. A good correlation between the expression of sRNAs and the adhesion of 100 clinical strains was observed, which, combined with GO and KEGG, provides a perspective for the comprehension of sRNA function annotation. This study revealed a multitude of novel putative adhesion-related sRNAs in S. mutans and contributed to a better understanding of information concerning the transcriptional regulation of adhesion in S. mutans.

  12. Molecular characterization of a gene POLR2H encoded an essential subunit for RNA polymerase II from the Giant Panda (Ailuropoda Melanoleuca).

    PubMed

    Du, Yu-Jie; Hou, Yi-Ling; Hou, Wan-Ru

    2013-02-01

    The Giant Panda is an endangered and valuable gene pool in genetic, its important functional gene POLR2H encodes an essential shared peptide H of RNA polymerases. The genomic DNA and cDNA sequences were cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) adopting touchdown-PCR and reverse transcription polymerase chain reaction (RT-PCR), respectively. The length of the genomic sequence of the Giant Panda is 3,285 bp, including five exons and four introns. The cDNA fragment cloned is 509 bp in length, containing an open reading frame of 453 bp encoding 150 amino acids. Alignment analysis indicated that both the cDNA and its deduced amino acid sequence were highly conserved. Protein structure prediction showed that there was one protein kinase C phosphorylation site, four casein kinase II phosphorylation sites and one amidation site in the POLR2H protein, further shaping advanced protein structure. The cDNA cloned was expressed in Escherichia coli, which indicated that POLR2H fusion with the N-terminally His-tagged form brought about the accumulation of an expected 20.5 kDa polypeptide in line with the predicted protein. On the basis of what has already been achieved in this study, further deep-in research will be conducted, which has great value in theory and practical significance.

  13. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  14. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Insights about minority HIV-1 strains in transmitted drug resistance mutation dynamics and disease progression.

    PubMed

    Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie

    2018-04-19

    The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.

  16. GenomeGems: evaluation of genetic variability from deep sequencing data

    PubMed Central

    2012-01-01

    Background Detection of disease-causing mutations using Deep Sequencing technologies possesses great challenges. In particular, organizing the great amount of sequences generated so that mutations, which might possibly be biologically relevant, are easily identified is a difficult task. Yet, for this assignment only limited automatic accessible tools exist. Findings We developed GenomeGems to gap this need by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151

  17. Biology of a deep-water sea anemone (Anthozoa: Actiniidae) from eastern Canada: Spawning, development, and growth

    NASA Astrophysics Data System (ADS)

    Mercier, Annie; Baillon, Sandrine; Daly, Marymegan; Macrander, Jason; Hamel, Jean-François

    2017-03-01

    Knowledge of the general biology and reproductive ecology of deep-water species can help predict their resilience to environmental and anthropogenic disturbances. The present study centers on live specimens of a deep-water sea anemone which were collected at bathyal depths between 1100 and 1400 m and kept in a mesocosm for over 6 years. Morphology and DNA sequencing confirmed that the species belongs to the genus Urticina. Male and female (9-10 cm pedal disk diameter, 90 tentacles) spawned 4 years post collection, in early spring (March). Both sexes released gametes through the mouth. The negatively buoyant oocytes (550-600 μm in diameter) quickly settled on the rocks and soft sediments surrounding the female. Lecithotrophic embryonic and larval development occurred on the substratum. Fully developed planula larvae were detected after 17-21 days. Planulae started to crawl and swim around but remained demersal. Metamorphosis and settlement occurred after 30-35 days, exclusively on hard substrata and preferentially on undersurfaces. Offspring grew slowly, developing 8 tentacles after 5 months and 24 tentacles after 12 months (3-4 mm pedal disk diameter). After 2.6 years of growth, the captive-born sea anemones reached 12-16 mm in pedal disk diameter and possessed 48-54 tentacles.

  18. Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.

    PubMed

    Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M

    2012-06-01

    Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women, with transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or infertile underwent an MR 3.0T examination with the following protocol: T2 weighted FRFSE HR sequences, T2 weighted FRFSE HR CUBE 3D sequences, T1 w FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, adhesions presence, fluid effusion in Douglas pouch, uterus and kidney pathologies or anomalies associated and sacral nervous routes were considered by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI imaging diagnosed deep endometriosis in 22/46 patients, endometriomas not associated to deep implants in 9/46 patients, 15/46 patients resulted negative for endometriosis, 11 of 22 patients with deep endometriosis reported ovarian endometriosis cyst. We obtained high percentages of sensibility (96.97%), specificity (100.00%), VPP (100.00%), VPN (92.86%). Pelvic MRI performed with 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with a good pre-surgery mapping of the lesions involving both bowels and bladder surface and recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  19. Structured Semantic Knowledge Can Emerge Automatically from Predicting Word Sequences in Child-Directed Speech

    PubMed Central

    Huebner, Philip A.; Willits, Jon A.

    2018-01-01

    Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system. PMID:29520243

  20. Mixed heterolobosean and novel gregarine lineage genes from culture ATCC 50646: Long-branch artefacts, not lateral gene transfer, distort α-tubulin phylogeny.

    PubMed

    Cavalier-Smith, Thomas

    2015-04-01

    Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.

  1. Species-energy relationship in the deep sea: A test using the Quaternary fossil record

    USGS Publications Warehouse

    Hunt, G.; Cronin, T. M.; Roy, K.

    2005-01-01

    Little is known about the processes regulating species richness in deep-sea communities. Here we take advantage of natural experiments involving climate change to test whether predictions of the species-energy hypothesis hold in the deep sea. In addition, we test for the relationship between temperature and species richness predicted by a recent model based on biochemical kinetics of metabolism. Using the deep-sea fossil record of benthic foraminifera and statistical meta-analyses of temperature-richness and productivity-richness relationships in 10 deep-sea cores, we show that temperature but not productivity is a significant predictor of species richness over the past c. 130 000 years. Our results not only show that the temperature-richness relationship in the deep-sea is remarkably similar to that found in terrestrial and shallow marine habitats, but also that species richness tracks temperature change over geological time, at least on scales of c. 100 000 years. Thus, predicting biotic response to global climate change in the deep sea would require better understanding of how temperature regulates the occurrences and geographical ranges of species. ??2005 Blackwell Publishing Ltd/CNRS.

  2. EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm.

    PubMed

    Kim, Seong Gon; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali

    2016-12-08

    We present EP-DNN, a protocol for predicting enhancers based on chromatin features, in different cell types. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites, as enhancers, and TSS and random non-DHS sites, as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy with a validation rate of 91.6%, relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, for a given number of enhancer predictions and also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.

  3. EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm

    NASA Astrophysics Data System (ADS)

    Kim, Seong Gon; Harwani, Mrudul; Grama, Ananth; Chaterji, Somali

    2016-12-01

    We present EP-DNN, a protocol for predicting enhancers based on chromatin features, in different cell types. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites, as enhancers, and TSS and random non-DHS sites, as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy with a validation rate of 91.6%, relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, for a given number of enhancer predictions and also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.

  4. Endometrial Cancer: Combined MR Volumetry and Diffusion-weighted Imaging for Assessment of Myometrial and Lymphovascular Invasion and Tumor Grade

    PubMed Central

    Reinhold, Caroline; Alsharif, Shaza S.; Addley, Helen; Arceneau, Jocelyne; Molinari, Nicolas; Guiu, Boris; Sala, Evis

    2015-01-01

    Purpose To investigate magnetic resonance (MR) volumetry of endometrial tumors and its association with deep myometrial invasion, tumor grade, and lymphovascular invasion and to assess the value of apparent diffusion coefficient (ADC) histographic analysis of the whole tumor volume for prediction of tumor grade and lymphovascular invasion. Materials and Methods The institutional review board approved this retrospective study; patient consent was not required. Between May 2010 and May 2012, 70 women (mean age, 64 years; range, 24–91 years) with endometrial cancer underwent preoperative MR imaging, including axial oblique and sagittal T2-weighted, dynamic contrast material–enhanced, and diffusion-weighted imaging. Volumetry of the tumor and uterus was performed during the six sequences, with manual tracing of each section, and the tumor volume ratio (TVR) was calculated. ADC histograms were generated from pixel ADCs from the whole tumor volume. The threshold of TVR associated with myometrial invasion was assessed by using receiver operating characteristic curves. An independent sample Mann Whitney U test was used to compare differences in ADCs, skewness, and kurtosis between tumor grade and the presence of lymphovascular invasion. Results No significant difference in tumor volume and TVR was found among the six MR imaging sequences (P = .95 and .86, respectively). A TVR greater than or equal to 25% allowed prediction of deep myometrial invasion with sensitivity of 100% and specificity of 93% (area under the curve, 0.96; 95% confidence interval: 0.86, 0.99) at axial oblique diffusion-weighted imaging. A TVR of greater than or equal to 25% was associated with grade 3 tumors (P = .0007) and with lymphovascular invasion (P < .0001). There was no significant difference in the ADCs between grades 1 and 2 tumors (P > .05). The minimum, 10th, 25th, 50th, 75th, and 90th percentile ADCs were significantly lower in grade 3 tumors than in grades 1 and 2 tumors (P < .02). Conclusion The combination of whole tumor volume and ADC can be used for prediction of tumor grade, lymphovascular invasion, and depth of myometrial invasion. © RSNA, 2015 PMID:25928157

  5. Learning better deep features for the prediction of occult invasive disease in ductal carcinoma in situ through transfer learning

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Hou, Rui; Mazurowski, Maciej A.; Grimm, Lars J.; Ren, Yinhao; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2018-02-01

    Purpose: To determine whether domain transfer learning can improve the performance of deep features extracted from digital mammograms using a pre-trained deep convolutional neural network (CNN) in the prediction of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy. Method: In this study, we collected digital mammography magnification views for 140 patients with DCIS at biopsy, 35 of which were subsequently upstaged to invasive cancer. We utilized a deep CNN model that was pre-trained on two natural image data sets (ImageNet and DTD) and one mammographic data set (INbreast) as the feature extractor, hypothesizing that these data sets are increasingly more similar to our target task and will lead to better representations of deep features to describe DCIS lesions. Through a statistical pooling strategy, three sets of deep features were extracted using the CNNs at different levels of convolutional layers from the lesion areas. A logistic regression classifier was then trained to predict which tumors contain occult invasive disease. The generalization performance was assessed and compared using repeated random sub-sampling validation and receiver operating characteristic (ROC) curve analysis. Result: The best performance of deep features was from CNN model pre-trained on INbreast, and the proposed classifier using this set of deep features was able to achieve a median classification performance of ROC-AUC equal to 0.75, which is significantly better (p<=0.05) than the performance of deep features extracted using ImageNet data set (ROCAUC = 0.68). Conclusion: Transfer learning is helpful for learning a better representation of deep features, and improves the prediction of occult invasive disease in DCIS.

  6. Permeability Surface of Deep Middle Cerebral Artery Territory on Computed Tomographic Perfusion Predicts Hemorrhagic Transformation After Stroke.

    PubMed

    Li, Qiao; Gao, Xinyi; Yao, Zhenwei; Feng, Xiaoyuan; He, Huijin; Xue, Jing; Gao, Peiyi; Yang, Lumeng; Cheng, Xin; Chen, Weijian; Yang, Yunjun

    2017-09-01

    Permeability surface (PS) on computed tomographic perfusion reflects blood-brain barrier permeability and is related to hemorrhagic transformation (HT). HT of deep middle cerebral artery (MCA) territory can occur after recanalization of proximal large-vessel occlusion. We aimed to determine the relationship between HT and PS of deep MCA territory. We retrospectively reviewed 70 consecutive acute ischemic stroke patients presenting with occlusion of the distal internal carotid artery or M1 segment of the MCA. All patients underwent computed tomographic perfusion within 6 hours after symptom onset. Computed tomographic perfusion data were postprocessed to generate maps of different perfusion parameters. Risk factors were identified for increased deep MCA territory PS. Receiver operating characteristic curve analysis was performed to calculate the optimal PS threshold to predict HT of deep MCA territory. Increased PS was associated with HT of deep MCA territory. After adjustments for age, sex, onset time to computed tomographic perfusion, and baseline National Institutes of Health Stroke Scale, poor collateral status (odds ratio, 7.8; 95% confidence interval, 1.67-37.14; P =0.009) and proximal MCA-M1 occlusion (odds ratio, 4.12; 95% confidence interval, 1.03-16.52; P =0.045) were independently associated with increased deep MCA territory PS. Relative PS most accurately predicted HT of deep MCA territory (area under curve, 0.94; optimal threshold, 2.89). Increased PS can predict HT of deep MCA territory after recanalization therapy for cerebral proximal large-vessel occlusion. Proximal MCA-M1 complete occlusion and distal internal carotid artery occlusion in conjunction with poor collaterals elevate deep MCA territory PS. © 2017 American Heart Association, Inc.

  7. Inferring deep-brain activity from cortical activity using functional near-infrared spectroscopy

    PubMed Central

    Liu, Ning; Cui, Xu; Bryant, Daniel M.; Glover, Gary H.; Reiss, Allan L.

    2015-01-01

    Functional near-infrared spectroscopy (fNIRS) is an increasingly popular technology for studying brain function because it is non-invasive, non-irradiating and relatively inexpensive. Further, fNIRS potentially allows measurement of hemodynamic activity with high temporal resolution (milliseconds) and in naturalistic settings. However, in comparison with other imaging modalities, namely fMRI, fNIRS has a significant drawback: limited sensitivity to hemodynamic changes in deep-brain regions. To overcome this limitation, we developed a computational method to infer deep-brain activity using fNIRS measurements of cortical activity. Using simultaneous fNIRS and fMRI, we measured brain activity in 17 participants as they completed three cognitive tasks. A support vector regression (SVR) learning algorithm was used to predict activity in twelve deep-brain regions using information from surface fNIRS measurements. We compared these predictions against actual fMRI-measured activity using Pearson’s correlation to quantify prediction performance. To provide a benchmark for comparison, we also used fMRI measurements of cortical activity to infer deep-brain activity. When using fMRI-measured activity from the entire cortex, we were able to predict deep-brain activity in the fusiform cortex with an average correlation coefficient of 0.80 and in all deep-brain regions with an average correlation coefficient of 0.67. The top 15% of predictions using fNIRS signal achieved an accuracy of 0.7. To our knowledge, this study is the first to investigate the feasibility of using cortical activity to infer deep-brain activity. This new method has the potential to extend fNIRS applications in cognitive and clinical neuroscience research. PMID:25798327

  8. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.

  9. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  10. To master or perform? Exploring relations between achievement goals and conceptual change learning.

    PubMed

    Ranellucci, John; Muis, Krista R; Duffy, Melissa; Wang, Xihui; Sampasivam, Lavanya; Franco, Gina M

    2013-09-01

    Research is needed to explore conceptual change in relation to achievement goal orientations and depth of processing. To address this need, we examined relations between achievement goals, use of deep versus shallow processing strategies, and conceptual change learning using a think-aloud protocol. Seventy-three undergraduate students were assessed on their prior knowledge and misconceptions about Newtonian mechanics, and then reported their achievement goals and participated in think-aloud protocols while reading Newtonian physics texts. A mastery-approach goal orientation positively predicted deep processing strategies, shallow processing strategies, and conceptual change. In contrast, a performance-approach goal orientation did not predict either of the processing strategies, but negatively predicted conceptual change. A performance-avoidance goal orientation negatively predicted deep processing strategies and conceptual change. Moreover, deep and shallow processing strategies positively predicted conceptual change as well as recall. Finally, both deep and shallow processing strategies mediated relations between mastery-approach goals and conceptual change. Results provide some support for Dole and Sinatra's (1998) Cognitive Reconstruction of Knowledge Model of conceptual change but also challenge specific facets with regard to the role of depth of processing in conceptual change. © 2012 The British Psychological Society.

  11. Linking Sediment Microbial Communities to Carbon Cycling in High-Latitude Lakes

    NASA Astrophysics Data System (ADS)

    Emerson, J. B.; Varner, R. K.; Johnson, J. E.; Owusu-Dommey, A.; Binder, M.; Woodcroft, B. J.; Wik, M.; Freitas, N. L.; Boyd, J. A.; Crill, P. M.; Saleska, S. R.; Tyson, G. W.; Rich, V. I.

    2015-12-01

    It is well recognized that thawing permafrost peatlands are likely to provide a positive feedback to climate change via CH4 and CO2 emissions. High-latitude lakes in these landscapes have also been identified as sources of CH4 and CO2 loss to the atmosphere. To investigate microbial contributions to carbon loss from high-latitude lakes, we characterized sediment geochemistry and microbiota via cores collected from deep and shallow regions of two lakes (Inre Harrsjön and Mellersta Harrsjön) in Arctic Sweden in July, 2012. These lakes are within the Stordalen Mire long-term ecological area, a focal site for investigating the impacts of climate change-related permafrost thaw, and the lakes in this area are responsible for ~55% of the CH4 loss from this hydrologically interconnected system. Across 40 samples from 4 to 40 cm deep within four sediment cores, Illumina 16S rRNA gene sequencing revealed that the sedimentary microbiota was dominated by candidate phyla OP9 and OP8 (Atribacteria and Aminicenantes, respectively, including putative fermenters and anaerobic respirers), predicted methanotrophic Gammaproteobacteria, and predicted methanogenic archaea from the Thermoplasmata Group E2 clade. We observed some overlap in community structure with nearby peatlands, which tend to be dominated by methanogens and Acidobacteria. Sediment microbial communities differed significantly between lakes, by overlying lake depth (shallow vs. deep), and by depth within a core, with each trend corresponding to parallel differences in biogeochemical measurements. Overall, our results support the potential for significant microbial controls on carbon cycling in high-latitude lakes associated with thawing permafrost, and ongoing metagenomic analyses of focal samples will yield further insight into the functional potential of these microbial communities and their dominant members.

  12. DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

    PubMed

    Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan

    2016-12-23

    With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.

  13. Pitfalls of CT for deep neck abscess imaging assessment: a retrospective review of 162 cases.

    PubMed

    Chuang, S Y; Lin, H T; Wen, Y S; Hsu, F J

    2013-01-01

    To investigate the diagnostic value of contrast-enhanced computed tomography (CT) for the prediction of deep neck abscesses in different deep neck spaces and to evaluate the false-positive results. We retrospectively analysed the clinical charts, CT examinations, surgical findings, bacteriology, pathological examinations and complications of hospitalised patients with a diagnosis of deep neck abscess from 2004 to 2010. The positive predictive values (PPV) for the prediction of abscesses by CT scan in different deep neck spaces were calculated individually on the basis of surgical findings. A total of 162 patients were included in this study. All patients received both intravenous antibiotics and surgical drainage. The parapharyngeal space was the most commonly involved space. The overall PPV for the prediction of deep neck abscess with contrast-enhanced CT was 79.6%. The PPV was 91.3% when more than one deep neck space was involved but only 50.0% in patients with isolated retropharyngeal abscesses. In the false-positive group, cellulitis was the most common final result, followed by cystic degeneration of cervical metastases. Five specimens taken intra-operatively revealed malignancy and four of these were not infected. There are some limitations affecting the differentiation of abscesses and cellulitis, particularly in the retropharyngeal space. A central necrotic cervical metastatic lymph node may sometimes also mimic a simple pyogenic deep neck abscess on both clinical pictures and CT images. Routine biopsy of the tissue must be performed during surgical drainage.

  14. Living microbial ecosystems within the active zone of catagenesis: Implications for feeding the deep biosphere

    NASA Astrophysics Data System (ADS)

    Horsfield, B.; Schenk, H. J.; Zink, K.; Ondrak, R.; Dieckmann, V.; Kallmeyer, J.; Mangelsdorf, K.; di Primio, R.; Wilkes, H.; Parkes, R. J.; Fry, J.; Cragg, B.

    2006-06-01

    Earth's largest reactive carbon pool, marine sedimentary organic matter, becomes increasingly recalcitrant during burial, making it almost inaccessible as a substrate for microorganisms, and thereby limiting metabolic activity in the deep biosphere. Because elevated temperature acting over geological time leads to the massive thermal breakdown of the organic matter into volatiles, including petroleum, the question arises whether microorganisms can directly utilize these maturation products as a substrate. While migrated thermogenic fluids are known to sustain microbial consortia in shallow sediments, an in situ coupling of abiotic generation and microbial utilization has not been demonstrated. Here we show, using a combination of basin modelling, kinetic modelling, geomicrobiology and biogeochemistry, that microorganisms inhabit the active generation zone in the Nankai Trough, offshore Japan. Three sites from ODP Leg 190 have been evaluated, namely 1173, 1174 and 1177, drilled in nearly undeformed Quaternary and Tertiary sedimentary sequences seaward of the Nankai Trough itself. Paleotemperatures were reconstructed based on subsidence profiles, compaction modelling, present-day heat flow, downhole temperature measurements and organic maturity parameters. Today's heat flow distribution can be considered mainly conductive, and is extremely high in places, reaching 180 mW/m 2. The kinetic parameters describing total hydrocarbon generation, determined by laboratory pyrolysis experiments, were utilized by the model in order to predict the timing of generation in time and space. The model predicts that the onset of present day generation lies between 300 and 500 m below sea floor (5100-5300 m below mean sea level), depending on well location. In the case of Site 1174, 5-10% conversion has taken place by a present day temperature of ca. 85 °C. Predictions were largely validated by on-site hydrocarbon gas measurements. Viable organisms in the same depth range have been proven using 14C-radiolabelled substrates for methanogenesis, bacterial cell counts and intact phospholipids. Altogether, these results point to an overlap of abiotic thermal degradation reactions going on in the same part of the sedimentary column as where a deep biosphere exists. The organic matter preserved in Nankai Trough sediments is of the type that generates putative feedstocks for microbial activity, namely oxygenated compounds and hydrocarbons. Furthermore, the rates of thermal degradation calculated from the kinetic model closely resemble rates of respiration and electron donor consumption independently measured in other deep biosphere environments. We deduce that abiotically driven degradation reactions have provided substrates for microbial activity in deep sediments at this convergent continental margin.

  15. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  16. Deep-sequencing to resolve complex diversity of apicomplexan parasites in platypuses and echidnas: Proof of principle for wildlife disease investigation.

    PubMed

    Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard

    2017-11-01

    The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Evidence for thermal convection in the deep carbonate aquifer of the eastern sector of the Po Plain, Italy

    NASA Astrophysics Data System (ADS)

    Pasquale, V.; Chiozzi, P.; Verdoya, M.

    2013-05-01

    Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m- 1), while it is larger (53 mK m- 1) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m- 1). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 10- 15 m2. The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.

  18. De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

    PubMed

    Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

    2016-01-01

    Study on transcriptome, the entire pool of transcripts in an organism or single cells at certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties to compare the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves steps, from the plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous-yet fragmentary-data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task, which can cause confusion when choosing the suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.

  19. Instruments for Deep Space Weather Prediction and Science

    NASA Astrophysics Data System (ADS)

    DeForest, C. E.; Laurent, G.

    2018-02-01

    We discuss remote space weather monitoring system concepts that could mount on the Deep Space Gateway and provide predictive capability for space weather events including SEP events and CME crossings, and advance heliophysics of the solar wind.

  20. Identification of miRNA from Bouteloua gracilis, a drought tolerant grass, by deep sequencing and their in silico analysis.

    PubMed

    Ordóñez-Baquera, Perla Lucía; González-Rodríguez, Everardo; Aguado-Santacruz, Gerardo Armando; Rascón-Cruz, Quintín; Conesa, Ana; Moreno-Brito, Verónica; Echavarria, Raquel; Dominguez-Viveros, Joel

    2017-02-01

    MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate signal transduction, development, metabolism, and stress responses in plants through post-transcriptional degradation and/or translational repression of target mRNAs. Several studies have addressed the role of miRNAs in model plant species, but miRNA expression and function in economically important forage crops, such as Bouteloua gracilis (Poaceae), a high-quality and drought-resistant grass distributed in semiarid regions of the United States and northern Mexico remain unknown. We applied high-throughput sequencing technology and bioinformatics analysis and identified 31 conserved miRNA families and 53 novel putative miRNAs with different abundance of reads in chlorophyllic cell cultures derived from B. gracilis. Some conserved miRNA families were highly abundant and possessed predicted targets involved in metabolism, plant growth and development, and stress responses. We also predicted additional identified novel miRNAs with specific targets, including B. gracilis ESTs, which were detected under drought stress conditions. Here we report 31 conserved miRNA families and 53 putative novel miRNAs in B. gracilis. Our results suggested the presence of regulatory miRNAs involved in modulating physiological and stress responses in this grass species. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. The Tara Oceans voyage reveals global diversity and distribution patterns of marine planktonic ciliates

    PubMed Central

    Gimmler, Anna; Korn, Ralf; de Vargas, Colomban; Audic, Stéphane; Stoeck, Thorsten

    2016-01-01

    Illumina reads of the SSU-rDNA-V9 region obtained from the circumglobal Tara Oceans expedition allow the investigation of protistan plankton diversity patterns on a global scale. We analyzed 6,137,350 V9-amplicons from ocean surface waters and the deep chlorophyll maximum, which were taxonomically assigned to the phylum Ciliophora. For open ocean samples global planktonic ciliate diversity is relatively low (ca. 1,300 observed and predicted ciliate OTUs). We found that 17% of all detected ciliate OTUs occurred in all oceanic regions under study. On average, local ciliate OTU richness represented 27% of the global ciliate OTU richness, indicating that a large proportion of ciliates is widely distributed. Yet, more than half of these OTUs shared <90% sequence similarity with reference sequences of described ciliates. While alpha-diversity measures (richness and exp(Shannon H)) are hardly affected by contemporary environmental conditions, species (OTU) turnover and community similarity (β-diversity) across taxonomic groups showed strong correlation to environmental parameters. Logistic regression models predicted significant correlations between the occurrence of specific ciliate genera and individual nutrients, the oceanic carbonate system and temperature. Planktonic ciliates displayed distinct vertical distributions relative to chlorophyll a. In contrast, the Tara Oceans dataset did not reveal any evidence that latitude is structuring ciliate communities. PMID:27633177

  2. Interpretable Deep Models for ICU Outcome Prediction

    PubMed Central

    Che, Zhengping; Purushotham, Sanjay; Khemani, Robinder; Liu, Yan

    2016-01-01

    Exponential surge in health care data, such as longitudinal data from electronic health records (EHR), sensor data from intensive care unit (ICU), etc., is providing new opportunities to discover meaningful data-driven characteristics and patterns ofdiseases. Recently, deep learning models have been employedfor many computational phenotyping and healthcare prediction tasks to achieve state-of-the-art performance. However, deep models lack interpretability which is crucial for wide adoption in medical research and clinical decision-making. In this paper, we introduce a simple yet powerful knowledge-distillation approach called interpretable mimic learning, which uses gradient boosting trees to learn interpretable models and at the same time achieves strong prediction performance as deep learning models. Experiment results on Pediatric ICU dataset for acute lung injury (ALI) show that our proposed method not only outperforms state-of-the-art approaches for morality and ventilator free days prediction tasks but can also provide interpretable models to clinicians. PMID:28269832

  3. Deep Learning for Prediction of Obstructive Disease From Fast Myocardial Perfusion SPECT: A Multicenter Study.

    PubMed

    Betancur, Julian; Commandeur, Frederic; Motlagh, Mahsaw; Sharir, Tali; Einstein, Andrew J; Bokhari, Sabahat; Fish, Mathews B; Ruddy, Terrence D; Kaufmann, Philipp; Sinusas, Albert J; Miller, Edward J; Bateman, Timothy M; Dorbala, Sharmila; Di Carli, Marcelo; Germano, Guido; Otaki, Yuka; Tamarappoo, Balaji K; Dey, Damini; Berman, Daniel S; Slomka, Piotr J

    2018-03-12

    The study evaluated the automatic prediction of obstructive disease from myocardial perfusion imaging (MPI) by deep learning as compared with total perfusion deficit (TPD). Deep convolutional neural networks trained with a large multicenter population may provide improved prediction of per-patient and per-vessel coronary artery disease from single-photon emission computed tomography MPI. A total of 1,638 patients (67% men) without known coronary artery disease, undergoing stress 99m Tc-sestamibi or tetrofosmin MPI with new generation solid-state scanners in 9 different sites, with invasive coronary angiography performed within 6 months of MPI, were studied. Obstructive disease was defined as ≥70% narrowing of coronary arteries (≥50% for left main artery). Left ventricular myocardium was segmented using clinical nuclear cardiology software and verified by an expert reader. Stress TPD was computed using sex- and camera-specific normal limits. Deep learning was trained using raw and quantitative polar maps and evaluated for prediction of obstructive stenosis in a stratified 10-fold cross-validation procedure. A total of 1,018 (62%) patients and 1,797 of 4,914 (37%) arteries had obstructive disease. Area under the receiver-operating characteristic curve for disease prediction by deep learning was higher than for TPD (per patient: 0.80 vs. 0.78; per vessel: 0.76 vs. 0.73: p < 0.01). With deep learning threshold set to the same specificity as TPD, per-patient sensitivity improved from 79.8% (TPD) to 82.3% (deep learning) (p < 0.05), and per-vessel sensitivity improved from 64.4% (TPD) to 69.8% (deep learning) (p < 0.01). Deep learning has the potential to improve automatic interpretation of MPI as compared with current clinical methods. Copyright © 2018 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.

  4. Prediction of fatigue-related driver performance from EEG data by deep Riemannian model.

    PubMed

    Hajinoroozi, Mehdi; Jianqiu Zhang; Yufei Huang

    2017-07-01

    Prediction of the drivers' drowsy and alert states is important for safety purposes. The prediction of drivers' drowsy and alert states from electroencephalography (EEG) using shallow and deep Riemannian methods is presented. For shallow Riemannian methods, the minimum distance to Riemannian mean (mdm) and Log-Euclidian metric are investigated, where it is shown that Log-Euclidian metric outperforms the mdm algorithm. In addition the SPDNet, a deep Riemannian model, that takes the EEG covariance matrix as the input is investigated. It is shown that SPDNet outperforms all tested shallow and deep classification methods. Performance of SPDNet is 6.02% and 2.86% higher than the best performance by the conventional Euclidian classifiers and shallow Riemannian models, respectively.

  5. Deep Visual Attention Prediction

    NASA Astrophysics Data System (ADS)

    Wang, Wenguan; Shen, Jianbing

    2018-05-01

    In this work, we aim to predict human eye fixation with view-free scenes based on an end-to-end deep learning architecture. Although Convolutional Neural Networks (CNNs) have made substantial improvement on human attention prediction, it is still needed to improve CNN based attention models by efficiently leveraging multi-scale features. Our visual attention network is proposed to capture hierarchical saliency information from deep, coarse layers with global saliency information to shallow, fine layers with local saliency response. Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various reception fields. Final saliency prediction is achieved via the cooperation of those global and local predictions. Our model is learned in a deep supervision manner, where supervision is directly fed into multi-level layers, instead of previous approaches of providing supervision only at the output layer and propagating this supervision back to earlier layers. Our model thus incorporates multi-level saliency predictions within a single network, which significantly decreases the redundancy of previous approaches of learning multiple network streams with different input scales. Extensive experimental analysis on various challenging benchmark datasets demonstrate our method yields state-of-the-art performance with competitive inference time.

  6. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645

  7. Comparison of magnetic resonance imaging sequences for depicting the subthalamic nucleus for deep brain stimulation.

    PubMed

    Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi

    2015-01-01

    Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. For ensuring correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores were significantly higher on SWAN images than on other images (p < 0.01) for STN visualization. Kappa value (0.74) on SWAN images was the highest in three sequences for visualizing the STN. SWAN is the sequence best suited for identifying the STN at the present time.

  8. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

    PubMed

    Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

    2011-01-20

    Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.

  10. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh

    PubMed Central

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263

  11. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data

    NASA Astrophysics Data System (ADS)

    Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen

    2016-11-01

    Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on the real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls were with multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase reliability of candidates, but narrows the list down to very limited range of tumor read depth and variant allele frequency. Calling SNV on UDT-Seq data, which were of much higher read-depth, discovered additional true-positive variations, despite an even more tremendous growth in false positive predictions. Our findings not only provide valuable benchmark for state-of-the-art SNV calling methods, but also shed light on the access to more accurate SNV identification in the future.

  12. Biophysics of protein evolution and evolutionary protein biophysics

    PubMed Central

    Sikosek, Tobias; Chan, Hue Sun

    2014-01-01

    The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599

  13. Culturable prokaryotic diversity of deep, gas hydrate sediments: first use of a continuous high-pressure, anaerobic, enrichment and isolation system for subseafloor sediments (DeepIsoBUG)

    PubMed Central

    Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik

    2009-01-01

    Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787

  14. Targeted parallel sequencing of the Musa species: searching for an alternative model system for polyploidy studies

    USDA-ARS?s Scientific Manuscript database

    Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...

  15. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  16. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  17. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics1[W][OA

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  18. Predicting novel microRNA: a comprehensive comparison of machine learning approaches.

    PubMed

    Stegmayer, Georgina; Di Persia, Leandro E; Rubiolo, Mariano; Gerard, Matias; Pividori, Milton; Yones, Cristian; Bugnon, Leandro A; Rodriguez, Tadeo; Raad, Jonathan; Milone, Diego H

    2018-05-23

    The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if not properly addressed in the model and the experiments, not only performance reported can be completely unrealistic but also the classifier will not be able to work properly for pre-miRNA prediction. Besides, another important issue is that for most of the machine learning (ML) approaches already used (supervised methods), it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not contain a pre-miRNA. This review provides a comprehensive study and comparative assessment of methods from these two ML approaches for dealing with the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared during the past 10 years in literature. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with same features and data sets, instead of just a revision of published software tools. The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid-imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.

  19. A deep intronic CLRN1 (USH3A) founder mutation generates an aberrant exon and underlies severe Usher syndrome on the Arabian Peninsula.

    PubMed

    Khan, Arif O; Becirovic, Elvir; Betz, Christian; Neuhaus, Christine; Altmüller, Janine; Maria Riedmayr, Lisa; Motameny, Susanne; Nürnberg, Gudrun; Nürnberg, Peter; Bolz, Hanno J

    2017-05-03

    Deafblindness is mostly due to Usher syndrome caused by recessive mutations in the known genes. Mutation-negative patients therefore either have distinct diseases, mutations in yet unknown Usher genes or in extra-exonic parts of the known genes - to date a largely unexplored possibility. In a consanguineous Saudi family segregating Usher syndrome type 1 (USH1), NGS of genes for Usher syndrome, deafness and retinal dystrophy and subsequent whole-exome sequencing each failed to identify a mutation. Genome-wide linkage analysis revealed two small candidate regions on chromosome 3, one containing the USH3A gene CLRN1, which has never been associated with Usher syndrome in Saudi Arabia. Whole-genome sequencing (WGS) identified a homozygous deep intronic mutation, c.254-649T > G, predicted to generate a novel donor splice site. CLRN1 minigene-based analysis confirmed the splicing of an aberrant exon due to usage of this novel motif, resulting in a frameshift and a premature termination codon. We identified this mutation in an additional two of seven unrelated mutation-negative Saudi USH1 patients. Locus-specific markers indicated that c.254-649T > G CLRN1 represents a founder allele that may significantly contribute to deafblindness in this population. Our finding underlines the potential of WGS to uncover atypically localized, hidden mutations in patients who lack exonic mutations in the known disease genes.

  20. TAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs.

    PubMed

    Lu, Ming; Shi, Bing; Wang, Juan; Cao, Qun; Cui, Qinghua

    2010-08-09

    MicroRNAs (miRNAs) are a class of important gene regulators. The number of identified miRNAs has been increasing dramatically in recent years. An emerging major challenge is the interpretation of the genome-scale miRNA datasets, including those derived from microarray and deep-sequencing. It is interesting and important to know the common rules or patterns behind a list of miRNAs, (i.e. the deregulated miRNAs resulted from an experiment of miRNA microarray or deep-sequencing). For the above purpose, this study presents a method and develops a tool (TAM) for annotations of meaningful human miRNAs categories. We first integrated miRNAs into various meaningful categories according to prior knowledge, such as miRNA family, miRNA cluster, miRNA function, miRNA associated diseases, and tissue specificity. Using TAM, given lists of miRNAs can be rapidly annotated and summarized according to the integrated miRNA categorical data. Moreover, given a list of miRNAs, TAM can be used to predict novel related miRNAs. Finally, we confirmed the usefulness and reliability of TAM by applying it to deregulated miRNAs in acute myocardial infarction (AMI) from two independent experiments. TAM can efficiently identify meaningful categories for given miRNAs. In addition, TAM can be used to identify novel miRNA biomarkers. TAM tool, source codes, and miRNA category data are freely available at http://cmbi.bjmu.edu.cn/tam.

  1. Optical Communications Channel Combiner

    NASA Technical Reports Server (NTRS)

    Quirk, Kevin J.; Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy

    2012-01-01

    NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the log-likelihood ratio (LLR) sequences of multiple receivers, and then combines these into a single LLR sequence for information decoding. The channel combiner synchronizes the LLR sequences of up to three receivers and then combines these into a single LLR sequence for output. The channel combiner has three channel inputs, each of which takes as input a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlation between the channels LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.

  2. The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region,California

    NASA Astrophysics Data System (ADS)

    Zhang, Q.; Wdowinski, S.; Lin, G.

    2011-12-01

    The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depth of 10-17 km. Over the past decade, an increasing seismic activity occurred in the Anza region, which included three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) and used an accurate 1-D velocity model derived from 3-D model of Lin et al (2007) and used In order to separate the aftershock sequence from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depth of 10-17 km and extended over a 70 km long segments of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical pattern, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. Reference 1.Lin, G.and Shearer,P.M.,2006, The COMPLOC earthquake location package,Seism. Res. Lett.77, pp.440-444. 2.Lin, G. and Shearer, P.M., Hauksson, E., and Thurber C.H.,2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method,J. Geophys.Res.112, B12306, doi: 10.1029/ 2007JB004977. 3.Wdowinski, S. ,2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci.,doi:10.1038/NGEO684.

  3. Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.

    PubMed

    Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges

    2015-11-01

    Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. Knowledge of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea diversity has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitve ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  4. Sensitive cell-based assay for determination of human immunodeficiency virus type 1 coreceptor tropism.

    PubMed

    Weber, Jan; Vazquez, Ana C; Winner, Dane; Gibson, Richard M; Rhea, Ariel M; Rose, Justine D; Wylie, Doug; Henry, Kenneth; Wright, Alison; King, Kevin; Archer, John; Poveda, Eva; Soriano, Vicente; Robertson, David L; Olivo, Paul D; Arts, Eric J; Quiñones-Mateu, Miguel E

    2013-05-01

    CCR5 antagonists are a powerful new class of antiretroviral drugs that require a companion assay to evaluate the presence of CXCR4-tropic (non-R5) viruses prior to use in human immunodeficiency virus (HIV)-infected individuals. In this study, we have developed, characterized, verified, and prevalidated a novel phenotypic test to determine HIV-1 coreceptor tropism (VERITROP) based on a sensitive cell-to-cell fusion assay. A proprietary vector was constructed containing a near-full-length HIV-1 genome with the yeast uracil biosynthesis (URA3) gene replacing the HIV-1 env coding sequence. Patient-derived HIV-1 PCR products were introduced by homologous recombination using an innovative yeast-based cloning strategy. The env-expressing vectors were then used in a cell-to-cell fusion assay to determine the presence of R5 and/or non-R5 HIV-1 variants within the viral population. Results were compared with (i) the original version of Trofile (Monogram Biosciences, San Francisco, CA), (ii) population sequencing, and (iii) 454 pyrosequencing, with the genotypic data analyzed using several bioinformatics tools, i.e., the 11/24/25 rule, Geno2Pheno (2% to 5.75%, 3.5%, or 10% false-positive rate [FPR]), and webPSSM. VERITROP consistently detected minority non-R5 variants from clinical specimens, with an analytical sensitivity of 0.3%, with viral loads of ≥1,000 copies/ml, and from B and non-B subtypes. In a pilot study, a 73.7% (56/76) concordance was observed with the original Trofile assay, with 19 of the 20 discordant results corresponding to non-R5 variants detected using VERITROP and not by the original Trofile assay. The degree of concordance of VERITROP and Trofile with population and deep sequencing results depended on the algorithm used to determine HIV-1 coreceptor tropism. Overall, VERITROP showed better concordance with deep sequencing/Geno2Pheno at a 0.3% detection threshold (67%), whereas Trofile matched better with population sequencing (79%). However, 454 sequencing using Geno2Pheno at a 10% FPR and 0.3% threshold and VERITROP more accurately predicted the success of a maraviroc-based regimen. In conclusion, VERITROP may promote the development of new HIV coreceptor antagonists and aid in the treatment and management of HIV-infected individuals prior to and/or during treatment with this class of drugs.

  5. Small RNA deep sequencing identifies novel and salt-stress-regulated microRNAs from roots of Medicago sativa and Medicago truncatula.

    PubMed

    Long, Rui-Cai; Li, Ming-Na; Kang, Jun-Mei; Zhang, Tie-Jun; Sun, Yan; Yang, Qing-Chuan

    2015-05-01

    Small 21- to 24-nucleotide (nt) ribonucleic acids (RNAs), notably the microRNA (miRNA), are emerging as a posttranscriptional regulation mechanism. Salt stress is one of the primary abiotic stresses that cause the crop losses worldwide. In saline lands, root growth and function of plant are determined by the action of environmental salt stress through specific genes that adapt root development to the restrictive condition. To elucidate the role of miRNAs in salt stress regulation in Medicago, we used a high-throughput sequencing approach to analyze four small RNA libraries from roots of Zhongmu-1 (Medicago sativa) and Jemalong A17 (Medicago truncatula), which were treated with 300 mM NaCl for 0 and 8 h. Each library generated about 20 million short sequences and contained predominantly small RNAs of 24-nt length, followed by 21-nt and 22-nt small RNAs. Using sequence analysis, we identified 385 conserved miRNAs from 96 families, along with 68 novel candidate miRNAs. Of all the 68 predicted novel miRNAs, 15 miRNAs were identified to have miRNA*. Statistical analysis on abundance of sequencing read revealed specific miRNA showing contrasting expression patterns between M. sativa and M. truncatula roots, as well as between roots treated for 0 and 8 h. The expression of 10 conserved and novel miRNAs was also quantified by quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). The miRNA precursor and target genes were predicted by bioinformatics analysis. We concluded that the salt stress related conserved and novel miRNAs may have a large variety of target mRNAs, some of which might play key roles in salt stress regulation of Medicago. © 2014 Scandinavian Plant Physiology Society.

  6. When less is more: 'slicing' sequencing data improves read decoding accuracy and de novo assembly quality.

    PubMed

    Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J

    2015-09-15

    As the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs stelo@cs.ucr.edu or timothy.close@ucr.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Deep Sequencing of T-cell Receptor DNA as a Biomarker of Clonally Expanded TILs in Breast Cancer after Immunotherapy.

    PubMed

    Page, David B; Yuan, Jianda; Redmond, David; Wen, Y Hanna; Durack, Jeremy C; Emerson, Ryan; Solomon, Stephen; Dong, Zhiwan; Wong, Phillip; Comstock, Christopher; Diab, Adi; Sung, Janice; Maybody, Majid; Morris, Elizabeth; Brogi, Edi; Morrow, Monica; Sacchini, Virgilio; Elemento, Olivier; Robins, Harlan; Patil, Sujata; Allison, James P; Wolchok, Jedd D; Hudis, Clifford; Norton, Larry; McArthur, Heather L

    2016-10-01

    In early-stage breast cancer, the degree of tumor-infiltrating lymphocytes (TIL) predicts response to chemotherapy and overall survival. Combination immunotherapy with immune checkpoint antibody plus tumor cryoablation can induce lymphocytic infiltrates and improve survival in mice. We used T-cell receptor (TCR) DNA sequencing to evaluate both the effect of cryoimmunotherapy in humans and the feasibility of TCR sequencing in early-stage breast cancer. In a pilot clinical trial, 18 women with early-stage breast cancer were treated preoperatively with cryoablation, single-dose anti-CTLA-4 (ipilimumab), or cryoablation + ipilimumab. TCRs within serially collected peripheral blood and tumor tissue were sequenced. In baseline tumor tissues, T-cell density as measured by TCR sequencing correlated with TIL scores obtained by hematoxylin and eosin (H&E) staining. However, tumors with little or no lymphocytes by H&E contained up to 3.6 × 10 6 TCR DNA sequences, highlighting the sensitivity of the ImmunoSEQ platform. In this dataset, ipilimumab increased intratumoral T-cell density over time, whereas cryoablation ± ipilimumab diversified and remodeled the intratumoral T-cell clonal repertoire. Compared with monotherapy, cryoablation plus ipilimumab was associated with numerically greater numbers of peripheral blood and intratumoral T-cell clones expanding robustly following therapy. In conclusion, TCR sequencing correlates with H&E lymphocyte scoring and provides additional information on clonal diversity. These findings support further study of the use of TCR sequencing as a biomarker for T-cell responses to therapy and for the study of cryoimmunotherapy in early-stage breast cancer. Cancer Immunol Res; 4(10); 835-44. ©2016 AACR. ©2016 American Association for Cancer Research.

  8. Unique microbial community in drilling fluids from Chinese continental scientific drilling

    USGS Publications Warehouse

    Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.

    2006-01-01

    Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.

  9. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  10. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  11. News from the protein mutability landscape.

    PubMed

    Hecht, Maximilian; Bromberg, Yana; Rost, Burkhard

    2013-11-01

    Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  12. A comprehensive survey of 3' animal miRNA modification events and a possible role for 3' adenylation in modulating miRNA targeting effectiveness.

    PubMed

    Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O

    2010-10-01

    Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.

  13. HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.

    PubMed

    Seelow, Dominik; Schuelke, Markus

    2012-07-01

    Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.

  14. Maximum entropy methods for extracting the learned features of deep neural networks.

    PubMed

    Finnegan, Alex; Song, Jun S

    2017-10-01

    New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

  15. Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database.

    PubMed

    Chen-Ying Hung; Wei-Chen Chen; Po-Tsun Lai; Ching-Heng Lin; Chi-Chun Lee

    2017-07-01

    Electronic medical claims (EMCs) can be used to accurately predict the occurrence of a variety of diseases, which can contribute to precise medical interventions. While there is a growing interest in the application of machine learning (ML) techniques to address clinical problems, the use of deep-learning in healthcare have just gained attention recently. Deep learning, such as deep neural network (DNN), has achieved impressive results in the areas of speech recognition, computer vision, and natural language processing in recent years. However, deep learning is often difficult to comprehend due to the complexities in its framework. Furthermore, this method has not yet been demonstrated to achieve a better performance comparing to other conventional ML algorithms in disease prediction tasks using EMCs. In this study, we utilize a large population-based EMC database of around 800,000 patients to compare DNN with three other ML approaches for predicting 5-year stroke occurrence. The result shows that DNN and gradient boosting decision tree (GBDT) can result in similarly high prediction accuracies that are better compared to logistic regression (LR) and support vector machine (SVM) approaches. Meanwhile, DNN achieves optimal results by using lesser amounts of patient data when comparing to GBDT method.

  16. deepNF: Deep network fusion for protein function prediction.

    PubMed

    Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard

    2018-06-01

    The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.

  17. A deep auto-encoder model for gene expression prediction.

    PubMed

    Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua

    2017-11-17

    Gene expression is a key intermediate level that genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With an aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how good genetic variants will contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic datasets with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes, align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models play a bigger role in modeling and interpreting genomics.

  18. DeepQA: improving the estimation of single protein model quality with deep belief networks.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin

    2016-12-05

    Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. We introduce a novel single-model quality assessment method DeepQA based on deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves the state-of-the-art performance on CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, document and training/test datasets of DeepQA for Linux is freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .

  19. The mitochondrial genome of Ifremeria nautilei and the phylogenetic position of the enigmatic deep-sea Abyssochrysoidea (Mollusca: Gastropoda).

    PubMed

    Osca, David; Templado, José; Zardoya, Rafael

    2014-09-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double stranded circular molecule is 15,664 pb in length and encodes for the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to genome organization of caenogastropods and differs only on the relative position of the trnW gene. The deduced amino acid sequences of the mt protein coding genes of Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea in the Late Jurassic-Early Cretaceous indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. TargetSpy: a supervised machine learning approach for microRNA target prediction.

    PubMed

    Sturm, Martin; Hackenberg, Michael; Langenberger, David; Frishman, Dmitrij

    2010-05-28

    Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org.

  1. TargetSpy: a supervised machine learning approach for microRNA target prediction

    PubMed Central

    2010-01-01

    Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org. PMID:20509939

  2. Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection

    DTIC Science & Technology

    2015-02-01

    multi-resolution stacking (MRS), which is a stack of ensemble classifiers. Each classifier in a building block inputs the concatenation of the predictions ...a base classifier in MRS, named boosted deep neural network (bDNN). bDNN first generates multiple base predictions from different contexts of a single...frame by only one DNN and then aggregates the base predictions for a better prediction of the frame, and it is different from computationally

  3. A Statistical Guide to the Design of Deep Mutational Scanning Experiments

    PubMed Central

    Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia

    2016-01-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710

  4. Hubble Tarantula Treasury Project - VI. Identification of Pre-Main-Sequence Stars using Machine Learning techniques

    NASA Astrophysics Data System (ADS)

    Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.

    2018-05-01

    The Hubble Tarantula Treasury Project (HTTP) has provided an unprecedented photometric coverage of the entire star-burst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not started their lives on the main-sequence yet. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars requires, thus, sophisticated statistical methods. We employ Machine Learning Classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model in order to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and Non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.

  5. Identification and characterization of microRNAs in white and brown alpaca skin

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are small, non-coding 21–25 nt RNA molecules that play an important role in regulating gene expression. Little is known about the expression profiles and functions of miRNAs in skin and their role in pigmentation. Alpacas have more than 22 natural coat colors, more than any other fiber producing species. To better understand the role of miRNAs in control of coat color we performed a comprehensive analysis of miRNA expression profiles in skin of white versus brown alpacas. Results Two small RNA libraries from white alpaca (WA) and brown alpaca (BA) skin were sequenced with the aid of Illumina sequencing technology. 272 and 267 conserved miRNAs were obtained from the WA and BA skin libraries, respectively. Of these conserved miRNAs, 35 and 13 were more abundant in WA and BA skin, respectively. The targets of these miRNAs were predicted and grouped based on Gene Ontology and KEGG pathway analysis. Many predicted target genes for these miRNAs are involved in the melanogenesis pathway controlling pigmentation. In addition to the conserved miRNAs, we also obtained 22 potentially novel miRNAs from the WA and BA skin libraries. Conclusion This study represents the first comprehensive survey of miRNAs expressed in skin of animals of different coat colors by deep sequencing analysis. We discovered a collection of miRNAs that are differentially expressed in WA and BA skin. The results suggest important potential functions of miRNAs in coat color regulation. PMID:23067000

  6. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followedmore » by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.« less

  7. Dissecting genetic and environmental mutation signatures with model organisms.

    PubMed

    Segovia, Romulo; Tam, Annie S; Stirling, Peter C

    2015-08-01

    Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. A ruby-colored Pseudobaeospora species is described as new from material collected on the island of Hawaii.

    PubMed

    Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A

    2014-01-01

    Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.

  9. [Severity classification of chronic obstructive pulmonary disease based on deep learning].

    PubMed

    Ying, Jun; Yang, Ceyuan; Li, Quanzheng; Xue, Wanguo; Li, Tanshi; Cao, Wenzhe

    2017-12-01

    In this paper, a deep learning method has been raised to build an automatic classification algorithm of severity of chronic obstructive pulmonary disease. Large sample clinical data as input feature were analyzed for their weights in classification. Through feature selection, model training, parameter optimization and model testing, a classification prediction model based on deep belief network was built to predict severity classification criteria raised by the Global Initiative for Chronic Obstructive Lung Disease (GOLD). We get accuracy over 90% in prediction for two different standardized versions of severity criteria raised in 2007 and 2011 respectively. Moreover, we also got the contribution ranking of different input features through analyzing the model coefficient matrix and confirmed that there was a certain degree of agreement between the more contributive input features and the clinical diagnostic knowledge. The validity of the deep belief network model was proved by this result. This study provides an effective solution for the application of deep learning method in automatic diagnostic decision making.

  10. Deep-water kelp refugia as potential hotspots of tropical marine diversity and productivity.

    PubMed

    Graham, Michael H; Kinlan, Brian P; Druehl, Louis D; Garske, Lauren E; Banks, Stuart

    2007-10-16

    Classic marine ecological paradigms view kelp forests as inherently temperate-boreal phenomena replaced by coral reefs in tropical waters. These paradigms hinge on the notion that tropical surface waters are too warm and nutrient-depleted to support kelp productivity and survival. We present a synthetic oceanographic and ecophysiological model that accurately identifies all known kelp populations and, by using the same criteria, predicts the existence of >23,500 km(2) unexplored submerged (30- to 200-m depth) tropical kelp habitats. Predicted tropical kelp habitats were most probable in regions where bathymetry and upwelling resulted in mixed-layer shoaling above the depth of minimum annual irradiance dose for kelp survival. Using model predictions, we discovered extensive new deep-water Eisenia galapagensis populations in the Galápagos that increased in abundance with increasing depth to >60 m, complete with cold-water flora and fauna of temperate affinities. The predictability of deep-water kelp habitat and the discovery of expansive deep-water Galápagos kelp forests validate the extent of deep-water tropical kelp refugia, with potential implications for regional productivity and biodiversity, tropical food web ecology, and understanding of the resilience of tropical marine systems to climate change.

  11. Deep learning based tissue analysis predicts outcome in colorectal cancer.

    PubMed

    Bychkov, Dmitrii; Linder, Nina; Turkki, Riku; Nordling, Stig; Kovanen, Panu E; Verrill, Clare; Walliander, Margarita; Lundin, Mikael; Haglund, Caj; Lundin, Johan

    2018-02-21

    Image-based machine learning and deep learning in particular has recently shown expert-level accuracy in medical image classification. In this study, we combine convolutional and recurrent architectures to train a deep network to predict colorectal cancer outcome based on images of tumour tissue samples. The novelty of our approach is that we directly predict patient outcome, without any intermediate tissue classification. We evaluate a set of digitized haematoxylin-eosin-stained tumour tissue microarray (TMA) samples from 420 colorectal cancer patients with clinicopathological and outcome data available. The results show that deep learning-based outcome prediction with only small tissue areas as input outperforms (hazard ratio 2.3; CI 95% 1.79-3.03; AUC 0.69) visual histological assessment performed by human experts on both TMA spot (HR 1.67; CI 95% 1.28-2.19; AUC 0.58) and whole-slide level (HR 1.65; CI 95% 1.30-2.15; AUC 0.57) in the stratification into low- and high-risk patients. Our results suggest that state-of-the-art deep learning techniques can extract more prognostic information from the tissue morphology of colorectal cancer than an experienced human observer.

  12. Readmission prediction via deep contextual embedding of clinical concepts.

    PubMed

    Xiao, Cao; Ma, Tengfei; Dieng, Adji B; Blei, David M; Wang, Fei

    2018-01-01

    Hospital readmission costs a lot of money every year. Many hospital readmissions are avoidable, and excessive hospital readmissions could also be harmful to the patients. Accurate prediction of hospital readmission can effectively help reduce the readmission risk. However, the complex relationship between readmission and potential risk factors makes readmission prediction a difficult task. The main goal of this paper is to explore deep learning models to distill such complex relationships and make accurate predictions. We propose CONTENT, a deep model that predicts hospital readmissions via learning interpretable patient representations by capturing both local and global contexts from patient Electronic Health Records (EHR) through a hybrid Topic Recurrent Neural Network (TopicRNN) model. The experiment was conducted using the EHR of a real world Congestive Heart Failure (CHF) cohort of 5,393 patients. The proposed model outperforms state-of-the-art methods in readmission prediction (e.g. 0.6103 ± 0.0130 vs. second best 0.5998 ± 0.0124 in terms of ROC-AUC). The derived patient representations were further utilized for patient phenotyping. The learned phenotypes provide more precise understanding of readmission risks. Embedding both local and global context in patient representation not only improves prediction performance, but also brings interpretable insights of understanding readmission risks for heterogeneous chronic clinical conditions. This is the first of its kind model that integrates the power of both conventional deep neural network and the probabilistic generative models for highly interpretable deep patient representation learning. Experimental results and case studies demonstrate the improved performance and interpretability of the model.

  13. Contrasting genomic properties of free-living and particle-attached microbial assemblages within a coastal ecosystem

    PubMed Central

    Smith, Maria W.; Zeigler Allen, Lisa; Allen, Andrew E.; Herfort, Lydie; Simon, Holly M.

    2013-01-01

    The Columbia River (CR) is a powerful economic and environmental driver in the US Pacific Northwest. Microbial communities in the water column were analyzed from four diverse habitats: (1) an estuarine turbidity maximum (ETM), (2) a chlorophyll maximum of the river plume, (3) an upwelling-associated hypoxic zone, and (4) the deep ocean bottom. Three size fractions, 0.1–0.8, 0.8–3, and 3–200 μm were collected for each habitat in August 2007, and used for DNA isolation and 454 sequencing, resulting in 12 metagenomes of >5 million reads (>1.6 Gbp). To characterize the dominant microorganisms and metabolisms contributing to coastal biogeochemistry, we used predicted peptide and rRNA data. The 3- and 0.8-μm metagenomes, representing particulate fractions, were taxonomically diverse across habitats. The 3-μm size fractions contained a high abundance of eukaryota with diatoms dominating the hypoxic water and plume, while cryptophytes were more abundant in the ETM. The 0.1-μm metagenomes represented mainly free-living bacteria and archaea. The most abundant archaeal hits were observed in the deep ocean and hypoxic water (19% of prokaryotic peptides in the 0.1-μm metagenomes), and were homologous to Nitrosopumilus maritimus (ammonia-oxidizing Thaumarchaeota). Bacteria dominated metagenomes of all samples. In the euphotic zone (estuary, plume and hypoxic ocean), the most abundant bacterial taxa (≥40% of prokaryotic peptides) represented aerobic photoheterotrophs. In contrast, the low-oxygen, deep water metagenome was enriched with sequences for strict and facultative anaerobes. Interestingly, many of the same anaerobic bacterial families were enriched in the 3-μm size fraction of the ETM (2–10X more abundant relative to the 0.1-μm metagenome), indicating possible formation of anoxic microniches within particles. Results from this study provide a metagenome perspective on ecosystem-scale metabolism in an upwelling-influenced river-dominated coastal margin. PMID:23750156

  14. Farming system context drives the value of deep wheat roots in semi-arid environments

    PubMed Central

    Lilley, Julianne M.; Kirkegaard, John A.

    2016-01-01

    The capture of subsoil water by wheat roots can make a valuable contribution to grain yield on deep soils. More extensive root systems can capture more water, but leave the soil in a drier state, potentially limiting water availability to subsequent crops. To evaluate the importance of these legacy effects, a long-term simulation analysis at eight sites in the semi-arid environment of Australia compared the yield of standard wheat cultivars with cultivars that were (i) modified to have root systems which extract more water at depth and/or (ii) sown earlier to increase the duration of the vegetative period and hence rooting depth. We compared simulations with and without annual resetting of soil water to investigate the legacy effects of drier subsoils related to modified root systems. Simulated mean yield benefits from modified root systems declined from 0.1–0.6 t ha−1 when annually reset, to 0–0.2 t ha−1 in the continuous simulation due to a legacy of drier soils (mean 0–32mm) at subsequent crop sowing. For continuous simulations, predicted yield benefits of >0.2 t ha−1 from more extensive root systems were rare (3–10% of years) at sites with shallow soils (<1.0 m), but occurred in 14–44% of years at sites with deeper soils (1.6–2.5 m). Earlier sowing had a larger impact than modified root systems on water uptake (14–31 vs 2–17mm) and mean yield increase (up to 0.7 vs 0–0.2 t ha−1) and the benefits occurred on deep and shallow soils and in more years (9–79 vs 3–44%). Increasing the proportion of crops in the sequence which dry the subsoil extensively has implications for the farming system productivity, and the crop sequence must be managed tactically to optimize overall system benefits. PMID:26976814

  15. Farming system context drives the value of deep wheat roots in semi-arid environments.

    PubMed

    Lilley, Julianne M; Kirkegaard, John A

    2016-06-01

    The capture of subsoil water by wheat roots can make a valuable contribution to grain yield on deep soils. More extensive root systems can capture more water, but leave the soil in a drier state, potentially limiting water availability to subsequent crops. To evaluate the importance of these legacy effects, a long-term simulation analysis at eight sites in the semi-arid environment of Australia compared the yield of standard wheat cultivars with cultivars that were (i) modified to have root systems which extract more water at depth and/or (ii) sown earlier to increase the duration of the vegetative period and hence rooting depth. We compared simulations with and without annual resetting of soil water to investigate the legacy effects of drier subsoils related to modified root systems. Simulated mean yield benefits from modified root systems declined from 0.1-0.6 t ha(-1) when annually reset, to 0-0.2 t ha(-1) in the continuous simulation due to a legacy of drier soils (mean 0-32mm) at subsequent crop sowing. For continuous simulations, predicted yield benefits of >0.2 t ha(-1) from more extensive root systems were rare (3-10% of years) at sites with shallow soils (<1.0 m), but occurred in 14-44% of years at sites with deeper soils (1.6-2.5 m). Earlier sowing had a larger impact than modified root systems on water uptake (14-31 vs 2-17mm) and mean yield increase (up to 0.7 vs 0-0.2 t ha(-1)) and the benefits occurred on deep and shallow soils and in more years (9-79 vs 3-44%). Increasing the proportion of crops in the sequence which dry the subsoil extensively has implications for the farming system productivity, and the crop sequence must be managed tactically to optimize overall system benefits. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  16. Deep learning architecture for air quality predictions.

    PubMed

    Li, Xiang; Peng, Ling; Hu, Yuan; Shao, Jing; Chi, Tianhe

    2016-11-01

    With the rapid development of urbanization and industrialization, many developing countries are suffering from heavy air pollution. Governments and citizens have expressed increasing concern regarding air pollution because it affects human health and sustainable development worldwide. Current air quality prediction methods mainly use shallow models; however, these methods produce unsatisfactory results, which inspired us to investigate methods of predicting air quality based on deep architecture models. In this paper, a novel spatiotemporal deep learning (STDL)-based air quality prediction method that inherently considers spatial and temporal correlations is proposed. A stacked autoencoder (SAE) model is used to extract inherent air quality features, and it is trained in a greedy layer-wise manner. Compared with traditional time series prediction models, our model can predict the air quality of all stations simultaneously and shows the temporal stability in all seasons. Moreover, a comparison with the spatiotemporal artificial neural network (STANN), auto regression moving average (ARMA), and support vector regression (SVR) models demonstrates that the proposed method of performing air quality predictions has a superior performance.

  17. Characterization of Tensile Properties, Limiting Strains, and Deep Drawing Behavior of AA5754-H22 Sheet at Elevated Temperature

    NASA Astrophysics Data System (ADS)

    Panicker, Sudhy S.; Singh, Har Govind; Panda, Sushanta Kumar; Dashwood, Richard

    2015-11-01

    Automotive industries are very much interested in characterization of formability improvement of aluminum alloys at elevated temperatures before designing tools, heating systems, and processing sequences for fabrication of auto-body panels by warm forming technology. In this study, tensile tests of AA5754-H22 aluminum alloy were carried out at five different temperatures and three different strain rates to investigate the deformation behavior correlating with Cowper-Symonds constitutive equation. Laboratory scale warm forming facilities were designed and fabricated to perform limiting dome height and deep drawing tests to evaluate forming limit strains and drawability of sheet metal at different tool temperatures. The forming limit strain and dome height improved significantly when both the die and punch were heated to 200 °C. Remarkable improvement in deep drawn cup depth was observed when die and punch temperatures were maintained at 200 and 30 °C, respectively, producing a non-isothermal temperature gradient of approximately 93 °C across the blank from flange to center. The forming behavior at different isothermal and non-isothermal conditions were predicted successfully using a thermo-mechanical FE model incorporating temperature-dependent properties in Barlat-89 yield criterion coupled with Cowper-Symonds hardening model, and the thinning/failure location in deformed cups were validated implementing the experimental limiting strains as damage model.

  18. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  19. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers.

    PubMed

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  20. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    PubMed Central

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  1. Molecular diversity and distribution pattern of ciliates in sediments from deep-sea hydrothermal vents in the Okinawa Trough and adjacent sea areas

    NASA Astrophysics Data System (ADS)

    Zhao, Feng; Xu, Kuidong

    2016-10-01

    In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. Statistical analyses showed that: (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and; (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitat and thus are potentially more sensitive to varying environmental conditions.

  2. Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum

    DTIC Science & Technology

    2016-11-01

    DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced . Mapping of DNA variants in our sequenced genomes to mitochondrial

  3. Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

    PubMed Central

    Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

    2017-01-01

    ABSTRACT Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908

  4. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS?s Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  5. Deep learning-based features of breast MRI for prediction of occult invasive disease following a diagnosis of ductal carcinoma in situ: preliminary data

    NASA Astrophysics Data System (ADS)

    Zhu, Zhe; Harowicz, Michael; Zhang, Jun; Saha, Ashirbani; Grimm, Lars J.; Hwang, Shelley; Mazurowski, Maciej A.

    2018-02-01

    Approximately 25% of patients with ductal carcinoma in situ (DCIS) diagnosed from core needle biopsy are subsequently upstaged to invasive cancer at surgical excision. Identifying patients with occult invasive disease is important as it changes treatment and precludes enrollment in active surveillance for DCIS. In this study, we investigated upstaging of DCIS to invasive disease using deep features. While deep neural networks require large amounts of training data, the available data to predict DCIS upstaging is sparse and thus directly training a neural network is unlikely to be successful. In this work, a pre-trained neural network is used as a feature extractor and a support vector machine (SVM) is trained on the extracted features. We used the dynamic contrast-enhanced (DCE) MRIs of patients at our institution from January 1, 2000, through March 23, 2014 who underwent MRI following a diagnosis of DCIS. Among the 131 DCIS patients, there were 35 patients who were upstaged to invasive cancer. Area under the ROC curve within the 10-fold cross-validation scheme was used for validation of our predictive model. The use of deep features was able to achieve an AUC of 0.68 (95% CI: 0.56-0.78) to predict occult invasive disease. This preliminary work demonstrates the promise of deep features to predict surgical upstaging following a diagnosis of DCIS.

  6. Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan

    2013-02-01

    Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediments samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in cDNA dataset indicate the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompt its use in future high-throughput sequencing-based environmental surveys.

  7. Carbonate sedimentation in an extensional active margin: Cretaceous history of the Haymana region, Pontides

    NASA Astrophysics Data System (ADS)

    Okay, Aral I.; Altiner, Demir

    2016-10-01

    The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of the Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized: Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. The carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which results in submarine erosion, carbonate breccias and in the lateral discontinuity of the sequences, and differs from blanket like carbonate deposition in the passive margins.

  8. Subsurface microbial diversity in deep-granitic-fracture water in Colorado

    USGS Publications Warehouse

    Sahl, J.W.; Schmidt, R.; Swanner, E.D.; Mandernack, K.W.; Templeton, A.S.; Kieft, Thomas L.; Smith, R.L.; Sanford, W.E.; Callaghan, R.L.; Mitton, J.B.; Spear, J.R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This "Henderson candidate division" dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. Copyright ?? 2008, American Society for Microbiology. All Rights Reserved.

  9. Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado▿

    PubMed Central

    Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.

    2008-01-01

    A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950

  10. Deep sequencing of hepatitis C virus hypervariable region 1 reveals no correlation between genetic heterogeneity and antiviral treatment outcome

    PubMed Central

    2014-01-01

    Background Hypervariable region 1 (HVR1) contained within envelope protein 2 (E2) gene is the most variable part of HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders, when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at frequency of 0.5%. Conclusions While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390

  11. Topological self-organization and prediction learning support both action and lexical chains in the brain.

    PubMed

    Chersi, Fabian; Ferro, Marcello; Pezzulo, Giovanni; Pirrelli, Vito

    2014-07-01

    A growing body of evidence in cognitive psychology and neuroscience suggests a deep interconnection between sensory-motor and language systems in the brain. Based on recent neurophysiological findings on the anatomo-functional organization of the fronto-parietal network, we present a computational model showing that language processing may have reused or co-developed organizing principles, functionality, and learning mechanisms typical of premotor circuit. The proposed model combines principles of Hebbian topological self-organization and prediction learning. Trained on sequences of either motor or linguistic units, the network develops independent neuronal chains, formed by dedicated nodes encoding only context-specific stimuli. Moreover, neurons responding to the same stimulus or class of stimuli tend to cluster together to form topologically connected areas similar to those observed in the brain cortex. Simulations support a unitary explanatory framework reconciling neurophysiological motor data with established behavioral evidence on lexical acquisition, access, and recall. Copyright © 2014 Cognitive Science Society, Inc.

  12. Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster

    PubMed Central

    Dunn, Joshua G; Foo, Catherine K; Belletier, Nicolette G; Gavis, Elizabeth R; Weissman, Jonathan S

    2013-01-01

    Ribosomes can read through stop codons in a regulated manner, elongating rather than terminating the nascent peptide. Stop codon readthrough is essential to diverse viruses, and phylogenetically predicted to occur in a few hundred genes in Drosophila melanogaster, but the importance of regulated readthrough in eukaryotes remains largely unexplored. Here, we present a ribosome profiling assay (deep sequencing of ribosome-protected mRNA fragments) for Drosophila melanogaster, and provide the first genome-wide experimental analysis of readthrough. Readthrough is far more pervasive than expected: the vast majority of readthrough events evolved within D. melanogaster and were not predicted phylogenetically. The resulting C-terminal protein extensions show evidence of selection, contain functional subcellular localization signals, and their readthrough is regulated, arguing for their importance. We further demonstrate that readthrough occurs in yeast and humans. Readthrough thus provides general mechanisms both to regulate gene expression and function, and to add plasticity to the proteome during evolution. DOI: http://dx.doi.org/10.7554/eLife.01179.001 PMID:24302569

  13. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derivedmore » from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.« less

  14. Development and Validation of a Deep Neural Network Model for Prediction of Postoperative In-hospital Mortality.

    PubMed

    Lee, Christine K; Hofer, Ira; Gabel, Eilon; Baldi, Pierre; Cannesson, Maxime

    2018-04-17

    The authors tested the hypothesis that deep neural networks trained on intraoperative features can predict postoperative in-hospital mortality. The data used to train and validate the algorithm consists of 59,985 patients with 87 features extracted at the end of surgery. Feed-forward networks with a logistic output were trained using stochastic gradient descent with momentum. The deep neural networks were trained on 80% of the data, with 20% reserved for testing. The authors assessed improvement of the deep neural network by adding American Society of Anesthesiologists (ASA) Physical Status Classification and robustness of the deep neural network to a reduced feature set. The networks were then compared to ASA Physical Status, logistic regression, and other published clinical scores including the Surgical Apgar, Preoperative Score to Predict Postoperative Mortality, Risk Quantification Index, and the Risk Stratification Index. In-hospital mortality in the training and test sets were 0.81% and 0.73%. The deep neural network with a reduced feature set and ASA Physical Status classification had the highest area under the receiver operating characteristics curve, 0.91 (95% CI, 0.88 to 0.93). The highest logistic regression area under the curve was found with a reduced feature set and ASA Physical Status (0.90, 95% CI, 0.87 to 0.93). The Risk Stratification Index had the highest area under the receiver operating characteristics curve, at 0.97 (95% CI, 0.94 to 0.99). Deep neural networks can predict in-hospital mortality based on automatically extractable intraoperative data, but are not (yet) superior to existing methods.

  15. Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks

    PubMed Central

    Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei

    2017-01-01

    Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction. PMID:28672867

  16. Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks.

    PubMed

    Yu, Haiyang; Wu, Zhihai; Wang, Shuqin; Wang, Yunpeng; Ma, Xiaolei

    2017-06-26

    Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.

  17. Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.

    PubMed

    Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan

    2017-05-01

    Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from less than 50-m deep, being most abundant near the sea floor where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database which indicated the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analysis of rhizarians were performed and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.

  18. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  19. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    PubMed Central

    Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595

  20. Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.

    PubMed

    Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie

    2017-01-01

    In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method is targeted against gliomas (both low and high grade) obtained in multi-sequence magnetic resonance images (MRIs). Firstly, a novel deep architecture is proposed to combine the multi-sequences intensities feature extraction with classification to get the classification probabilities of each voxel. Then, graph cut based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. At last, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated in the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The performance of segmentation results demonstrates our proposal providing a competitive solution with stateof- the-art methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  1. A deep learning framework for financial time series using stacked autoencoders and long-short term memory.

    PubMed

    Bao, Wei; Yue, Jun; Rao, Yulei

    2017-01-01

    The application of deep learning approaches to finance has received a great deal of attention from both investors and researchers. This study presents a novel deep learning framework where wavelet transforms (WT), stacked autoencoders (SAEs) and long-short term memory (LSTM) are combined for stock price forecasting. The SAEs for hierarchically extracted deep features is introduced into stock price forecasting for the first time. The deep learning framework comprises three stages. First, the stock price time series is decomposed by WT to eliminate noise. Second, SAEs is applied to generate deep high-level features for predicting the stock price. Third, high-level denoising features are fed into LSTM to forecast the next day's closing price. Six market indices and their corresponding index futures are chosen to examine the performance of the proposed model. Results show that the proposed model outperforms other similar models in both predictive accuracy and profitability performance.

  2. Viral Predation and Host Immunity Structure Microbial Communities in a Terrestrial Deep Subsurface, Hydraulically Fractured Shale System

    NASA Astrophysics Data System (ADS)

    Daly, R. A.; Mouser, P. J.; Trexler, R.; Wrighton, K. C.

    2014-12-01

    Despite a growing appreciation for the ecological role of viruses in marine and gut systems, little is known about their role in the terrestrial deep (> 2000 m) subsurface. We used assembly-based metagenomics to examine the viral component in fluids from hydraulically fractured Marcellus shale gas wells. Here we reconstructed microbial and viral genomes from samples collected 7, 82, and 328 days post fracturing. Viruses accounted for 4.14%, 0.92% and 0.59% of the sample reads that mapped to the assembly. We identified 6 complete, circularized viral genomes and an additional 92 viral contigs > 5 kb with a maximum contig size of 73.6 kb. A BLAST comparison to NCBI viral genomes revealed that 85% of viral contigs had significant hits to the viral order Caudovirales, with 43% of sequences belonging to the family Siphoviridae, 38% to Myoviridae, and 12% to Podoviridae. Enrichment of Caudovirales viruses was supported by a large number of predicted proteins characteristic of tailed viruses including terminases (TerL), tape measure, tail formation, and baseplate related proteins. The viral contigs included evidence of lytic and temperate lifestyles, with the 7 day sample having the greatest number of detected lytic viruses. Notably in this sample, the most abundant virus was lytic and its inferred host, a member of the Vibrionaceae, was not detected at later time points. Analyses of CRISPR sequences (a viral and foreign DNA immune system in bacteria and archaea), linked 18 viral contigs to hosts. CRISPR linkages increased through time and all bacterial and archaeal genomes recovered in the final time point had genes for CRISPR-mediated viral defense. The majority of CRISPR sequences linked phage genomes to several Halanaerobium strains, which are the dominant and persisting members of the community inferred to be responsible for carbon and sulfur cycling in these shales. Network analysis revealed that several viruses were present in the 82 and 328 day samples; this viral persistence is consistent with concomitant temporal stability in geochemistry and microbial community composition. Our findings suggest that after a disturbance (hydraulic fracturing) viral predation and host immunity is an important controller of microbial community structure, metabolism, and thus biogeochemical cycling in the deep subsurface.

  3. The seismic stratigraphy of Okanagan Lake, British Columbia; a record of rapid deglaciation in a deep 'fiord-lake' basin

    NASA Astrophysics Data System (ADS)

    Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.

    1991-09-01

    This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ˜ 3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lens and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where ice bergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.

  4. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

    PubMed Central

    Miotto, Riccardo; Li, Li; Kidd, Brian A.; Dudley, Joel T.

    2016-01-01

    Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name “deep patient”. We evaluated this representation as broadly predictive of health states by assessing the probability of patients to develop various diseases. We performed evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Prediction performance for severe diabetes, schizophrenia, and various cancers were among the top performing. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems. PMID:27185194

  5. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records

    NASA Astrophysics Data System (ADS)

    Miotto, Riccardo; Li, Li; Kidd, Brian A.; Dudley, Joel T.

    2016-05-01

    Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name “deep patient”. We evaluated this representation as broadly predictive of health states by assessing the probability of patients to develop various diseases. We performed evaluation using 76,214 test patients comprising 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Prediction performance for severe diabetes, schizophrenia, and various cancers were among the top performing. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.

  6. Computational RNomics of Drosophilids

    PubMed Central

    Rose, Dominic; Hackermüller, Jörg; Washietl, Stefan; Reiche, Kristin; Hertel, Jana; Findeiß, Sven; Stadler, Peter F; Prohaska, Sonja J

    2007-01-01

    Background Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set of comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. Results We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection action of RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79–89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. Conclusion The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383–1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have significantly smaller complement of independent structured ncRNAs compared to mammals. PMID:17996037

  7. Prediction of properties of wheat dough using intelligent deep belief networks

    NASA Astrophysics Data System (ADS)

    Guha, Paramita; Bhatnagar, Taru; Pal, Ishan; Kamboj, Uma; Mishra, Sunita

    2017-11-01

    In this paper, the rheological and chemical properties of wheat dough are predicted using deep belief networks. Wheat grains are stored at controlled environmental conditions. The internal parameters of grains viz., protein, fat, carbohydrates, moisture, ash are determined using standard chemical analysis and viscosity of the dough is measured using Rheometer. Here, fat, carbohydrates, moisture, ash and temperature are considered as inputs whereas protein and viscosity are chosen as outputs. The prediction algorithm is developed using deep neural network where each layer is trained greedily using restricted Boltzmann machine (RBM) networks. The overall network is finally fine-tuned using standard neural network technique. In most literature, it has been found that fine-tuning is done using back-propagation technique. In this paper, a new algorithm is proposed in which each layer is tuned using RBM and the final network is fine-tuned using deep neural network (DNN). It has been observed that with the proposed algorithm, errors between the actual and predicted outputs are less compared to the conventional algorithm. Hence, the given network can be considered as beneficial as it predicts the outputs more accurately. Numerical results along with discussions are presented.

  8. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Chou, Kuo-Chen

    2017-10-06

    Information of the proteins' subcellular localization is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to develop computational tools for timely identifying their subcellular locations based on the sequence information alone. The current study is focused on the Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved yet. This is because mounting evidences have indicated that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can be used to deal with single-location proteins only. Actually, proteins with multi-locations may have some special biological functions important for both basic research and drug design. In this study, by using the multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Joint deep shape and appearance learning: application to optic pathway glioma segmentation

    NASA Astrophysics Data System (ADS)

    Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George

    2017-03-01

    Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2:39% and 0:48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0:87+/-0:13 (0:93+/-0:06 for healthy tissue, 0:78+/-0:18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.

  10. Distinct biological subtypes and patterns of genome evolution in lymphoma revealed by circulating tumor DNA.

    PubMed

    Scherer, Florian; Kurtz, David M; Newman, Aaron M; Stehr, Henning; Craig, Alexander F M; Esfahani, Mohammad Shahrokh; Lovejoy, Alexander F; Chabon, Jacob J; Klass, Daniel M; Liu, Chih Long; Zhou, Li; Glover, Cynthia; Visser, Brendan C; Poultsides, George A; Advani, Ranjana H; Maeda, Lauren S; Gupta, Neel K; Levy, Ronald; Ohgami, Robert S; Kunder, Christian A; Diehn, Maximilian; Alizadeh, Ash A

    2016-11-09

    Patients with diffuse large B cell lymphoma (DLBCL) exhibit marked diversity in tumor behavior and outcomes, yet the identification of poor-risk groups remains challenging. In addition, the biology underlying these differences is incompletely understood. We hypothesized that characterization of mutational heterogeneity and genomic evolution using circulating tumor DNA (ctDNA) profiling could reveal molecular determinants of adverse outcomes. To address this hypothesis, we applied cancer personalized profiling by deep sequencing (CAPP-Seq) analysis to tumor biopsies and cell-free DNA samples from 92 lymphoma patients and 24 healthy subjects. At diagnosis, the amount of ctDNA was found to strongly correlate with clinical indices and was independently predictive of patient outcomes. We demonstrate that ctDNA genotyping can classify transcriptionally defined tumor subtypes, including DLBCL cell of origin, directly from plasma. By simultaneously tracking multiple somatic mutations in ctDNA, our approach outperformed immunoglobulin sequencing and radiographic imaging for the detection of minimal residual disease and facilitated noninvasive identification of emergent resistance mutations to targeted therapies. In addition, we identified distinct patterns of clonal evolution distinguishing indolent follicular lymphomas from those that transformed into DLBCL, allowing for potential noninvasive prediction of histological transformation. Collectively, our results demonstrate that ctDNA analysis reveals biological factors that underlie lymphoma clinical outcomes and could facilitate individualized therapy. Copyright © 2016, American Association for the Advancement of Science.

  11. ACS Imaging of beta Pic: Searching for the origin of rings and asymmetry in planetesimal disks

    NASA Astrophysics Data System (ADS)

    Kalas, Paul

    2003-07-01

    The emerging picture for planetesimal disks around main sequence stars is that their radial and azimuthal symmetries are significantly deformed by the dynamical effects of either planets interior to the disk, or stellar objects exterior to the disk. The cause of these structures, such as the 50 AU cutoff of our Kuiper Belt, remains mysterious. Structure in the beta Pic planetesimal disk could be due to dynamics controlled by an extrasolar planet, or by the tidal influence of a more massive object exterior to the disk. The hypothesis of an extrasolar planet causing the vertical deformation in the disk predicts a blue color to the disk perpendicular to the disk midplane. The hypothesis that a stellar perturber deforms the disk predicts a globally uniform color and the existence of ring-like structure beyond 800 AU radius. We propose to obtain deep, multi-color images of the beta Pic disk ansae in the region 15"-220" {200-4000 AU} radius with the ACS WFC. The unparalleled stability of the HST PSF means that these data are uniquely capable of delivering the color sensitivity that can distinguish between the two theories of beta Pic's disk structure. Ascertaining the cause of such structure provide a meaningful context for understanding the dynamical history of our early solar system, as well as other planetesimal systems imaged around main sequence stars.

  12. Comparison of the Exomes of Common Carp (Cyprinus carpio) and Zebrafish (Danio rerio)

    PubMed Central

    Henkel, Christiaan V.; Dirks, Ron P.; Jansen, Hans J.; Forlenza, Maria; Wiegertjes, Geert F.; Howe, Kerstin; van den Thillart, Guido E.E.J.M.

    2012-01-01

    Abstract Research on common carp, Cyprinus carpio, is beneficial for zebrafish research because of resources available owing to its large body size, such as the availability of sufficient organ material for transcriptomics, proteomics, and metabolomics. Here we describe the shot gun sequencing of a clonal double-haploid common carp line. The assembly consists of 511891 scaffolds with an N50 of 17 kb, predicting a total genome size of 1.4–1.5 Gb. A detailed analysis of the ten largest scaffolds indicates that the carp genome has a considerably lower repeat coverage than zebrafish, whilst the average intron size is significantly smaller, making it comparable to the fugu genome. The quality of the scaffolding was confirmed by comparisons with RNA deep sequencing data sets and a manual analysis for synteny with the zebrafish, especially the Hox gene clusters. In the ten largest scaffolds analyzed, the synteny of genes is almost complete. Comparisons of predicted exons of common carp with those of the zebrafish revealed only few genes specific for either zebrafish or carp, most of these being of unknown function. This supports the hypothesis of an additional genome duplication event in the carp evolutionary history, which—due to a higher degree of compactness—did not result in a genome larger than that of zebrafish. PMID:22715948

  13. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3(') end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ antonio.marco@manchester.ac.uk Supplementary data are available at Bioinformatics online.

  14. Preoperative prediction of lymph node metastasis and deep stromal invasion in women with invasive cervical cancer: prospective multicenter study using 2D and 3D ultrasound.

    PubMed

    Pálsdóttir, K; Fischerova, D; Franchi, D; Testa, A; Di Legge, A; Epstein, E

    2015-04-01

    To determine how various objective two-dimensional (2D) and three-dimensional (3D) ultrasound parameters allow prediction of deep stromal tumor invasion and lymph node involvement, in comparison to subjective ultrasound assessment, in women scheduled for surgery for cervical cancer. This was a prospective multicenter trial including 104 women with cervical cancer at FIGO Stages IA2-IIB, verified histologically. Patients scheduled for surgery underwent a preoperative ultrasound examination. The value of various 2D (size, color score) and 3D (volume, vascular indices) ultrasound parameters was compared to that of subjective assessment in the prediction of deep stromal tumor invasion and lymph node involvement. Histology obtained from radical hysterectomy or trachelectomy and pelvic lymphadenectomy was considered as the gold standard for assessment. All women underwent pelvic lymphadenectomy, with 99 (95%) undergoing subsequent radical surgery; five underwent only pelvic lymphadenectomy because of the presence of a positive sentinel lymph node. Women with deep stromal invasion or lymph node involvement had significantly larger tumors (diameter and volume) but there was no correlation with vascular indices measured on 3D ultrasound. Subjective evaluation was superior (AUC, 0.93; sensitivity, 90.5%; specificity, 97.2%) in the prediction of deep stromal invasion when compared to any objective measurement technique, with maximal tumor diameter at 20.5-mm cut-off (AUC, 0.83; sensitivity, 90.5%; specificity, 61.1%) and 3D tumor volume at 9.1-mm(3) cut-off (AUC, 0.85; sensitivity, 79.4%; specificity, 83.3%) providing the best performance among the objective parameters. Both subjective assessment and objective measurements were poorly predictive of lymph node involvement. In women with cervical cancer, subjective ultrasound evaluation allowed better prediction of deep stromal invasion than did objective measurements; however, neither subjective evaluation nor objective parameters were adequate to predict lymph node involvement. 3D vascular indices were ineffective in the prediction of advanced stages of the disease. Copyright © 2014 ISUOG. Published by John Wiley & Sons Ltd.

  15. A Statistical Guide to the Design of Deep Mutational Scanning Experiments.

    PubMed

    Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia

    2016-09-01

    The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.

  16. A hybrid deep learning approach to predict malignancy of breast lesions using mammograms

    NASA Astrophysics Data System (ADS)

    Wang, Yunzhi; Heidari, Morteza; Mirniaharikandehei, Seyedehnafiseh; Gong, Jing; Qian, Wei; Qiu, Yuchen; Zheng, Bin

    2018-03-01

    Applying deep learning technology to medical imaging informatics field has been recently attracting extensive research interest. However, the limited medical image dataset size often reduces performance and robustness of the deep learning based computer-aided detection and/or diagnosis (CAD) schemes. In attempt to address this technical challenge, this study aims to develop and evaluate a new hybrid deep learning based CAD approach to predict likelihood of a breast lesion detected on mammogram being malignant. In this approach, a deep Convolutional Neural Network (CNN) was firstly pre-trained using the ImageNet dataset and serve as a feature extractor. A pseudo-color Region of Interest (ROI) method was used to generate ROIs with RGB channels from the mammographic images as the input to the pre-trained deep network. The transferred CNN features from different layers of the CNN were then obtained and a linear support vector machine (SVM) was trained for the prediction task. By applying to a dataset involving 301 suspicious breast lesions and using a leave-one-case-out validation method, the areas under the ROC curves (AUC) = 0.762 and 0.792 using the traditional CAD scheme and the proposed deep learning based CAD scheme, respectively. An ensemble classifier that combines the classification scores generated by the two schemes yielded an improved AUC value of 0.813. The study results demonstrated feasibility and potentially improved performance of applying a new hybrid deep learning approach to develop CAD scheme using a relatively small dataset of medical images.

  17. The effects of deep network topology on mortality prediction.

    PubMed

    Hao Du; Ghassemi, Mohammad M; Mengling Feng

    2016-08-01

    Deep learning has achieved remarkable results in the areas of computer vision, speech recognition, natural language processing and most recently, even playing Go. The application of deep-learning to problems in healthcare, however, has gained attention only in recent years, and it's ultimate place at the bedside remains a topic of skeptical discussion. While there is a growing academic interest in the application of Machine Learning (ML) techniques to clinical problems, many in the clinical community see little incentive to upgrade from simpler methods, such as logistic regression, to deep learning. Logistic regression, after all, provides odds ratios, p-values and confidence intervals that allow for ease of interpretation, while deep nets are often seen as `black-boxes' which are difficult to understand and, as of yet, have not demonstrated performance levels far exceeding their simpler counterparts. If deep learning is to ever take a place at the bedside, it will require studies which (1) showcase the performance of deep-learning methods relative to other approaches and (2) interpret the relationships between network structure, model performance, features and outcomes. We have chosen these two requirements as the goal of this study. In our investigation, we utilized a publicly available EMR dataset of over 32,000 intensive care unit patients and trained a Deep Belief Network (DBN) to predict patient mortality at discharge. Utilizing an evolutionary algorithm, we demonstrate automated topology selection for DBNs. We demonstrate that with the correct topology selection, DBNs can achieve better prediction performance compared to several bench-marking methods.

  18. The rise of deep learning in drug discovery.

    PubMed

    Chen, Hongming; Engkvist, Ola; Wang, Yinhai; Olivecrona, Marcus; Blaschke, Thomas

    2018-06-01

    Over the past decade, deep learning has achieved remarkable success in various artificial intelligence research areas. Evolved from the previous research on artificial neural networks, this technology has shown superior performance to other machine learning algorithms in areas such as image and voice recognition, natural language processing, among others. The first wave of applications of deep learning in pharmaceutical research has emerged in recent years, and its utility has gone beyond bioactivity predictions and has shown promise in addressing diverse problems in drug discovery. Examples will be discussed covering bioactivity prediction, de novo molecular design, synthesis prediction and biological image analysis. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Draft Genome Sequence of Thermus scotoductus Strain K1, Isolated from a Geothermal Spring in Karvachar, Nagorno Karabakh

    PubMed Central

    Saghatelyan, Ani; Poghosyan, Lianna

    2015-01-01

    The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from geothermal spring outlet located in the Karvachar region in Nagorno Karabakh is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055

  20. Deep learning for single-molecule science

    NASA Astrophysics Data System (ADS)

    Albrecht, Tim; Slabaugh, Gregory; Alonso, Eduardo; Al-Arif, SM Masudur R.

    2017-10-01

    Exploring and making predictions based on single-molecule data can be challenging, not only due to the sheer size of the datasets, but also because a priori knowledge about the signal characteristics is typically limited and poor signal-to-noise ratio. For example, hypothesis-driven data exploration, informed by an expectation of the signal characteristics, can lead to interpretation bias or loss of information. Equally, even when the different data categories are known, e.g., the four bases in DNA sequencing, it is often difficult to know how to make best use of the available information content. The latest developments in machine learning (ML), so-called deep learning (DL) offer interesting, new avenues to address such challenges. In some applications, such as speech and image recognition, DL has been able to outperform conventional ML strategies and even human performance. However, to date DL has not been applied much in single-molecule science, presumably in part because relatively little is known about the ‘internal workings’ of such DL tools within single-molecule science as a field. In this Tutorial, we make an attempt to illustrate in a step-by-step guide how one of those, a convolutional neural network (CNN), may be used for base calling in DNA sequencing applications. We compare it with a SVM as a more conventional ML method, and discuss some of the strengths and weaknesses of the approach. In particular, a ‘deep’ neural network has many features of a ‘black box’, which has important implications on how we look at and interpret data.

  1. SEAN: SNP prediction and display program utilizing EST sequence clusters.

    PubMed

    Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

    2006-02-15

    SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.

  2. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    PubMed

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    microRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina Hiseq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionally conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis based on the analysis of An. gambiae. Our results provide the first lead to the elucidation of the miRNA profile in An. sinensis. Unveiling the roles of mosquito miRNAs will undoubtedly lead to a better understanding of mosquito biology and mosquito-pathogen interactions. This work lays the foundation for the further functional study of An. sinensis miRNAs and will facilitate their application in vector control.

  3. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  4. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  5. Spermine Condenses DNA, but Not RNA Duplexes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Katz, Andrea M.; Tolokh, Igor S.; Pabit, Suzette A.

    Interactions between the polyamine spermine and nucleic acids drive important cellular processes. Spermine condenses DNA, and some RNAs such as poly(rA):poly(rU). A large fraction of the spermine present in cells is bound to RNA, but apparently does not condense it. Here, we study the effect of spermine binding to short duplex RNA and DNA and compare our findings with predictions of molecular dynamics simulations. When small numbers of spermine are introduced, RNA with a designed sequence, containing a mixture of 14 GC pairs and 11 AU pairs, resists condensation relative to DNA of an equivalent sequence or to 25 basemore » pair poly(rA):poly(rU) RNA. Comparison of wide-angle x-ray scattering profiles with simulation suggests that spermine is sequestered deep within the major groove of mixed sequence RNA, preventing condensation by limiting opportunities to bridge to other molecules as well as stabilizing the RNA by locking it into a particular conformation. In contrast, for DNA, simulations suggest that spermine binds external to the duplex, offering opportunities for intermolecular interaction. The goal of this study is to explain how RNA can remain soluble, and available for interaction with other molecules in the cell, despite the presence of spermine at concentrations high enough to precipitate DNA.« less

  6. Unveiling the metabolic potential of two soil-derived microbial consortia selected on wheat straw

    PubMed Central

    Jiménez, Diego Javier; Chaves-Moreno, Diego; van Elsas, Jan Dirk

    2015-01-01

    Based on the premise that plant biomass can be efficiently degraded by mixed microbial cultures and/or enzymes, we here applied a targeted metagenomics-based approach to explore the metabolic potential of two forest soil-derived lignocellulolytic microbial consortia, denoted RWS and TWS (bred on wheat straw). Using the metagenomes of three selected batches of two experimental systems, about 1.2 Gb of sequence was generated. Comparative analyses revealed an overrepresentation of predicted carbohydrate transporters (ABC, TonB and phosphotransferases), two-component sensing systems and β-glucosidases/galactosidases in the two consortia as compared to the forest soil inoculum. Additionally, “profiling” of carbohydrate-active enzymes showed significant enrichments of several genes encoding glycosyl hydrolases of families GH2, GH43, GH92 and GH95. Sequence analyses revealed these to be most strongly affiliated to genes present on the genomes of Sphingobacterium, Bacteroides, Flavobacterium and Pedobacter spp. Assembly of the RWS and TWS metagenomes generated 16,536 and 15,902 contigs of ≥10 Kb, respectively. Thirteen contigs, containing 39 glycosyl hydrolase genes, constitute novel (hemi)cellulose utilization loci with affiliation to sequences primarily found in the Bacteroidetes. Overall, this study provides deep insight in the plant polysaccharide degrading capabilities of microbial consortia bred from forest soil, highlighting their biotechnological potential. PMID:26343383

  7. Discriminating a deep gallium antisite defect from shallow acceptors in GaAs using supercell calculations

    DOE PAGES

    Schultz, Peter A.

    2016-03-01

    For the purposes of making reliable first-principles predictions of defect energies in semiconductors, it is crucial to distinguish between effective-mass-like defects, which cannot be treated accurately with existing supercell methods, and deep defects, for which density functional theory calculations can yield reliable predictions of defect energy levels. The gallium antisite defect GaAs is often associated with the 78/203 meV shallow double acceptor in Ga-rich gallium arsenide. Within a conceptual framework of level patterns, analyses of structure and spin stabilization can be used within a supercell approach to distinguish localized deep defect states from shallow acceptors such as B As. Thismore » systematic approach determines that the gallium antisite supercell results has signatures inconsistent with an effective mass state and cannot be the 78/203 shallow double acceptor. Lastly, the properties of the Ga antisite in GaAs are described, total energy calculations that explicitly map onto asymptotic discrete localized bulk states predict that the Ga antisite is a deep double acceptor and has at least one deep donor state.« less

  8. Arthropod phylogenetics in light of three novel millipede (myriapoda: diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    PubMed

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

  9. Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

    PubMed Central

    Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

    2014-01-01

    Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included an ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

  10. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE PAGES

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...

    2016-09-10

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  11. Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Iain J.; DasSarma, Priya; Lucas, Susan

    Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  12. MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage

    PubMed Central

    Smith, Eric E.; Nandigam, Kaveer R.N.; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M.

    2014-01-01

    Background MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds (MB). Methods Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semi-automated method. WMH and MB were compared according to site of symptomatic hematoma origin (lobar vs. deep) or by pattern of hemorrhages, including both hematomas and MB, on MRI GRE sequence (grouped as lobar only--probable CAA, lobar only--possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Results Lobar and deep hemispheric hematoma patients had similar median nWMH volumes (19.5 cm vs. 19.9 cm3, p=0.74) and prevalence of ≥1 MB (54% vs. 52%, p=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category, however the prevalence of brainstem T2 hyperintensity was lower in lobar hematoma vs. deep hematoma (54% vs. 70%, p=0.004). Mixed ICH was common (23%). Mixed ICH patients had large nWMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. Conclusions WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI GRE in up to one quarter of patients; in these patients both hypertension and CAA may be contributing to the burden of WMH. PMID:20689084

  13. Plastid Phylogenomics Resolve Deep Relationships among Eupolypod II Ferns with Rapid Radiation and Rate Heterogeneity

    PubMed Central

    Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun

    2017-01-01

    Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625

  14. Comprehensive identification and profiling of host miRNAs in response to Singapore grouper iridovirus (SGIV) infection in grouper (Epinephelus coioides).

    PubMed

    Guo, Chuanyu; Cui, Huachun; Ni, Songwei; Yan, Yang; Qin, Qiwei

    2015-10-01

    microRNAs (miRNAs) are an evolutionarily conserved class of non-coding RNA molecules that participate in various biological processes. Employment of high-throughput screening strategies greatly prompts the investigation and profiling of miRNAs in diverse species. In recent years, grouper (Epinephelus spp.) aquaculture was severely affected by iridoviral diseases. However, knowledge regarding the host immune responses to viral infection, especially the miRNA-mediated immune regulatory roles, is rather limited. In this study, by employing Solexa deep sequencing approach, we identified 116 grouper miRNAs from grouper spleen-derived cells (GS). As expected, these miRNAs shared high sequence similarity with miRNAs identified in zebrafish (Danio rerio), pufferfish (Fugu rubripes), and other higher vertebrates. In the process of Singapore grouper iridovirus (SGIV) infection, 45 and 43 miRNAs with altered expression (>1.5-fold) were identified by miRNA microarray assays in grouper spleen tissues and GS cells, respectively. Furthermore, target prediction revealed 189 putative targets of these grouper miRNAs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Identification and profiling of conserved and novel microRNAs involved in oil and oleic acid production during embryogenesis in Carya cathayensis Sarg.

    PubMed

    Wang, Zhengjia; Huang, Ruiming; Sun, Zhichao; Zhang, Tong; Huang, Jianqin

    2017-05-01

    MicroRNAs (miRNAs) are important regulators of plant development and fruit formation. Mature embryos of hickory (Carya cathayensis Sarg.) nuts contain more than 70% oil (comprising 90% unsaturated fatty acids), along with a substantial amount of oleic acid. To understand the roles of miRNAs involved in oil and oleic acid production during hickory embryogenesis, three small RNA libraries from different stages of embryogenesis were constructed. Deep sequencing of these three libraries identified 95 conserved miRNAs with 19 miRNA*s, 7 novel miRNAs (as well as their corresponding miRNA*s), and 26 potentially novel miRNAs. The analysis identified 15 miRNAs involved in oil and oleic acid production that are differentially expressed during embryogenesis in hickory. Among them, nine miRNA sequences, including eight conserved and one novel, were confirmed by qRT-PCR. In addition, 145 target genes of the novel miRNAs were predicted using a bioinformatic approach. Our results provide a framework for better understanding the roles of miRNAs during embryogenesis in hickory.

  16. Revised annotation of Plutella xylostella microRNAs and their genome-wide target identification.

    PubMed

    Etebari, K; Asgari, S

    2016-12-01

    The diamondback moth, Plutella xylostella, is the most devastating pest of brassica crops worldwide. Although 128 mature microRNAs (miRNAs) have been annotated from this species in miRBase, there is a need to extend and correct the current P. xylostella miRNA repertoire as a result of its recently improved genome assembly and more available small RNA sequence data. We used our new ultra-deep sequence data and bioinformatics to re-annotate the P. xylostella genome for high confidence miRNAs with the correct 5p and 3p arm features. Furthermore, all the P. xylostella annotated genes were also screened to identify potential miRNA binding sites using three target-predicting algorithms. In total, 203 mature miRNAs were annotated, including 33 novel miRNAs. We identified 7691 highly confident binding sites for 160 pxy-miRNAs. The data provided here will facilitate future studies involving functional analyses of P. xylostella miRNAs as a platform to introduce novel approaches for sustainable management of this destructive pest. © 2016 The Royal Entomological Society.

  17. A Deep Space Orbit Determination Software: Overview and Event Prediction Capability

    NASA Astrophysics Data System (ADS)

    Kim, Youngkwang; Park, Sang-Young; Lee, Eunji; Kim, Minsik

    2017-06-01

    This paper presents an overview of deep space orbit determination software (DSODS), as well as validation and verification results on its event prediction capabilities. DSODS was developed in the MATLAB object-oriented programming environment to support the Korea Pathfinder Lunar Orbiter (KPLO) mission. DSODS has three major capabilities: celestial event prediction for spacecraft, orbit determination with deep space network (DSN) tracking data, and DSN tracking data simulation. To achieve its functionality requirements, DSODS consists of four modules: orbit propagation (OP), event prediction (EP), data simulation (DS), and orbit determination (OD) modules. This paper explains the highest-level data flows between modules in event prediction, orbit determination, and tracking data simulation processes. Furthermore, to address the event prediction capability of DSODS, this paper introduces OP and EP modules. The role of the OP module is to handle time and coordinate system conversions, to propagate spacecraft trajectories, and to handle the ephemerides of spacecraft and celestial bodies. Currently, the OP module utilizes the General Mission Analysis Tool (GMAT) as a third-party software component for highfidelity deep space propagation, as well as time and coordinate system conversions. The role of the EP module is to predict celestial events, including eclipses, and ground station visibilities, and this paper presents the functionality requirements of the EP module. The validation and verification results show that, for most cases, event prediction errors were less than 10 millisec when compared with flight proven mission analysis tools such as GMAT and Systems Tool Kit (STK). Thus, we conclude that DSODS is capable of predicting events for the KPLO in real mission applications.

  18. An initial investigation on developing a new method to predict short-term breast cancer risk based on deep learning technology

    NASA Astrophysics Data System (ADS)

    Qiu, Yuchen; Wang, Yunzhi; Yan, Shiju; Tan, Maxine; Cheng, Samuel; Liu, Hong; Zheng, Bin

    2016-03-01

    In order to establish a new personalized breast cancer screening paradigm, it is critically important to accurately predict the short-term risk of a woman having image-detectable cancer after a negative mammographic screening. In this study, we developed and tested a novel short-term risk assessment model based on deep learning method. During the experiment, a number of 270 "prior" negative screening cases was assembled. In the next sequential ("current") screening mammography, 135 cases were positive and 135 cases remained negative. These cases were randomly divided into a training set with 200 cases and a testing set with 70 cases. A deep learning based computer-aided diagnosis (CAD) scheme was then developed for the risk assessment, which consists of two modules: adaptive feature identification module and risk prediction module. The adaptive feature identification module is composed of three pairs of convolution-max-pooling layers, which contains 20, 10, and 5 feature maps respectively. The risk prediction module is implemented by a multiple layer perception (MLP) classifier, which produces a risk score to predict the likelihood of the woman developing short-term mammography-detectable cancer. The result shows that the new CAD-based risk model yielded a positive predictive value of 69.2% and a negative predictive value of 74.2%, with a total prediction accuracy of 71.4%. This study demonstrated that applying a new deep learning technology may have significant potential to develop a new short-term risk predicting scheme with improved performance in detecting early abnormal symptom from the negative mammograms.

  19. Downscaling ocean conditions with application to the Gulf of Maine, Scotian Shelf and adjacent deep ocean

    NASA Astrophysics Data System (ADS)

    Katavouta, Anna; Thompson, Keith R.

    2016-08-01

    The overall goal is to downscale ocean conditions predicted by an existing global prediction system and evaluate the results using observations from the Gulf of Maine, Scotian Shelf and adjacent deep ocean. The first step is to develop a one-way nested regional model and evaluate its predictions using observations from multiple sources including satellite-borne sensors of surface temperature and sea level, CTDs, Argo floats and moored current meters. It is shown that the regional model predicts more realistic fields than the global system on the shelf because it has higher resolution and includes tides that are absent from the global system. However, in deep water the regional model misplaces deep ocean eddies and meanders associated with the Gulf Stream. This is not because the regional model's dynamics are flawed but rather is the result of internally generated variability in deep water that leads to decoupling of the regional model from the global system. To overcome this problem, the next step is to spectrally nudge the regional model to the large scales (length scales > 90 km) of the global system. It is shown this leads to more realistic predictions off the shelf. Wavenumber spectra show that even though spectral nudging constrains the large scales, it does not suppress the variability on small scales; on the contrary, it favours the formation of eddies with length scales below the cutoff wavelength of the spectral nudging.

  20. An improved advertising CTR prediction approach based on the fuzzy deep neural network

    PubMed Central

    Gao, Shu; Li, Mingjiang

    2018-01-01

    Combining a deep neural network with fuzzy theory, this paper proposes an advertising click-through rate (CTR) prediction approach based on a fuzzy deep neural network (FDNN). In this approach, fuzzy Gaussian-Bernoulli restricted Boltzmann machine (FGBRBM) is first applied to input raw data from advertising datasets. Next, fuzzy restricted Boltzmann machine (FRBM) is used to construct the fuzzy deep belief network (FDBN) with the unsupervised method layer by layer. Finally, fuzzy logistic regression (FLR) is utilized for modeling the CTR. The experimental results show that the proposed FDNN model outperforms several baseline models in terms of both data representation capability and robustness in advertising click log datasets with noise. PMID:29727443

Top