Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
Deep Sequencing to Identify the Causes of Viral Encephalitis
Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.
2014-01-01
Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Pan, Xiaoyong; Shen, Hong-Bin
2018-05-02
RNA-binding proteins (RBPs) take over 5∼10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and high-costly. Instead, computational prediction of the RBP binding sites using pattern learned from existing annotation knowledge is a fast approach. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to our best knowledge, only global representations of entire RNA sequences are employed. So far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method iDeepE to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences into the same length. For the local CNN, we split a RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs for multiple subsequences and the padded sequences to learn high-level features, respectively. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates a better performance over state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN run 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. https://github.com/xypan1232/iDeepE. xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
Park, Gyeong-Moon; Yoo, Yong-Ho; Kim, Deok-Hwa; Kim, Jong-Hwan; Gyeong-Moon Park; Yong-Ho Yoo; Deok-Hwa Kim; Jong-Hwan Kim; Yoo, Yong-Ho; Park, Gyeong-Moon; Kim, Jong-Hwan; Kim, Deok-Hwa
2018-06-01
Robots are expected to perform smart services and to undertake various troublesome or difficult tasks in the place of humans. Since these human-scale tasks consist of a temporal sequence of events, robots need episodic memory to store and retrieve the sequences to perform the tasks autonomously in similar situations. As episodic memory, in this paper we propose a novel Deep adaptive resonance theory (ART) neural model and apply it to the task performance of the humanoid robot, Mybot, developed in the Robot Intelligence Technology Laboratory at KAIST. Deep ART has a deep structure to learn events, episodes, and even more like daily episodes. Moreover, it can retrieve the correct episode from partial input cues robustly. To demonstrate the effectiveness and applicability of the proposed Deep ART, experiments are conducted with the humanoid robot, Mybot, for performing the three tasks of arranging toys, making cereal, and disposing of garbage.
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
Deep learning methods for protein torsion angle prediction.
Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin
2017-09-18
Deep learning is one of the most powerful machine learning methods that has achieved the state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures including deep neural network (DNN) and deep restricted Boltzmann machine (DRBN), deep recurrent neural network (DRNN) and deep recurrent restricted Boltzmann machine (DReRBM) since the protein torsion angle prediction is a sequence related problem. In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30° on an independent dataset. The MAE of phi angle is comparable to the existing methods, but the MAE of psi angle is 29°, 2° lower than the existing methods. On the latest CASP12 targets, our methods also achieved the performance better than or comparable to a state-of-the art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob
2016-01-01
Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob
2016-01-01
Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meantime, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conversed patterns/motifs tethered together by non-conserved regions. The two evidences give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than message RNAs are generated. That is, the transcribed RNAs participate the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of predictive deep neural network on the lincRNA data sets compared with support vector machine and traditional neural network. In addition, it is validated through the newly discovered lincRNA data set and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that deep learning method has the extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by those newly annotated lincRNA data, deep learning methods based on auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. As our knowledge, this is the first application to adopt the deep learning techniques for identifying lincRNA transcription sequences.
Yildirim, Özal
2018-05-01
Long-short term memory networks (LSTMs), which have recently emerged in sequential data analysis, are the most widely used type of recurrent neural networks (RNNs) architecture. Progress on the topic of deep learning includes successful adaptations of deep versions of these architectures. In this study, a new model for deep bidirectional LSTM network-based wavelet sequences called DBLSTM-WS was proposed for classifying electrocardiogram (ECG) signals. For this purpose, a new wavelet-based layer is implemented to generate ECG signal sequences. The ECG signals were decomposed into frequency sub-bands at different scales in this layer. These sub-bands are used as sequences for the input of LSTM networks. New network models that include unidirectional (ULSTM) and bidirectional (BLSTM) structures are designed for performance comparisons. Experimental studies have been performed for five different types of heartbeats obtained from the MIT-BIH arrhythmia database. These five types are Normal Sinus Rhythm (NSR), Ventricular Premature Contraction (VPC), Paced Beat (PB), Left Bundle Branch Block (LBBB), and Right Bundle Branch Block (RBBB). The results show that the DBLSTM-WS model gives a high recognition performance of 99.39%. It has been observed that the wavelet-based layer proposed in the study significantly improves the recognition performance of conventional networks. This proposed network structure is an important approach that can be applied to similar signal processing problems. Copyright © 2018 Elsevier Ltd. All rights reserved.
Deep Recurrent Neural Networks for Human Activity Recognition
Murad, Abdulmajid
2017-01-01
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs. PMID:29113103
Deep Recurrent Neural Networks for Human Activity Recognition.
Murad, Abdulmajid; Pyun, Jae-Young
2017-11-06
Adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. Although human movements are encoded in a sequence of successive samples in time, typical machine learning methods perform recognition tasks without exploiting the temporal correlations between input data samples. Convolutional neural networks (CNNs) address this issue by using convolutions across a one-dimensional temporal sequence to capture dependencies among input data. However, the size of convolutional kernels restricts the captured range of dependencies between data samples. As a result, typical models are unadaptable to a wide range of activity-recognition configurations and require fixed-length input windows. In this paper, we propose the use of deep recurrent neural networks (DRNNs) for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences. We present unidirectional, bidirectional, and cascaded architectures based on long short-term memory (LSTM) DRNNs and evaluate their effectiveness on miscellaneous benchmark datasets. Experimental results show that our proposed models outperform methods employing conventional machine learning, such as support vector machine (SVM) and k-nearest neighbors (KNN). Additionally, the proposed models yield better performance than other deep learning techniques, such as deep believe networks (DBNs) and CNNs.
Accurate identification of RNA editing sites from primitive sequence with deep neural networks.
Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie
2018-04-16
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.
Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A
2017-07-01
Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.
DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2017-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.
Chen, Zhao; Moran, Kimberly; Richards-Yutz, Jennifer; Toorens, Erik; Gerhart, Daniel; Ganguly, Tapan; Shields, Carol L; Ganguly, Arupa
2014-03-01
Sporadic retinoblastoma (RB) is caused by de novo mutations in the RB1 gene. Often, these mutations are present as mosaic mutations that cannot be detected by Sanger sequencing. Next-generation deep sequencing allows unambiguous detection of the mosaic mutations in lymphocyte DNA. Deep sequencing of the RB1 gene on lymphocyte DNA from 20 bilateral and 70 unilateral RB cases was performed, where Sanger sequencing excluded the presence of mutations. The individual exons of the RB1 gene from each sample were amplified, pooled, ligated to barcoded adapters, and sequenced using semiconductor sequencing on an Ion Torrent Personal Genome Machine. Six low-level mosaic mutations were identified in bilateral RB and four in unilateral RB cases. The incidence of low-level mosaic mutation was estimated to be 30% and 6%, respectively, in sporadic bilateral and unilateral RB cases, previously classified as mutation negative. The frequency of point mutations detectable in lymphocyte DNA increased from 96% to 97% for bilateral RB and from 13% to 18% for unilateral RB. The use of deep sequencing technology increased the sensitivity of the detection of low-level germline mosaic mutations in the RB1 gene. This finding has significant implications for improved clinical diagnosis, genetic counseling, surveillance, and management of RB. © 2013 WILEY PERIODICALS, INC.
Mee, Edward T.; Preston, Mark D.; Minor, Philip D.; Schepelmann, Silke; Huang, Xuening; Nguyen, Jenny; Wall, David; Hargrove, Stacey; Fu, Thomas; Xu, George; Li, Li; Cote, Colette; Delwart, Eric; Li, Linlin; Hewlett, Indira; Simonyan, Vahan; Ragupathy, Viswanath; Alin, Voskanian-Kordi; Mermod, Nicolas; Hill, Christiane; Ottenwälder, Birgit; Richter, Daniel C.; Tehrani, Arman; Jacqueline, Weber-Lehmann; Cassart, Jean-Pol; Letellier, Carine; Vandeputte, Olivier; Ruelle, Jean-Louis; Deyati, Avisek; La Neve, Fabio; Modena, Chiara; Mee, Edward; Schepelmann, Silke; Preston, Mark; Minor, Philip; Eloit, Marc; Muth, Erika; Lamamy, Arnaud; Jagorel, Florence; Cheval, Justine; Anscombe, Catherine; Misra, Raju; Wooldridge, David; Gharbia, Saheer; Rose, Graham; Ng, Siemon H.S.; Charlebois, Robert L.; Gisonni-Lex, Lucy; Mallet, Laurent; Dorange, Fabien; Chiu, Charles; Naccache, Samia; Kellam, Paul; van der Hoek, Lia; Cotten, Matt; Mitchell, Christine; Baier, Brian S.; Sun, Wenping; Malicki, Heather D.
2016-01-01
Background Unbiased deep sequencing offers the potential for improved adventitious virus screening in vaccines and biotherapeutics. Successful implementation of such assays will require appropriate control materials to confirm assay performance and sensitivity. Methods A common reference material containing 25 target viruses was produced and 16 laboratories were invited to process it using their preferred adventitious virus detection assay. Results Fifteen laboratories returned results, obtained using a wide range of wet-lab and informatics methods. Six of 25 target viruses were detected by all laboratories, with the remaining viruses detected by 4–14 laboratories. Six non-target viruses were detected by three or more laboratories. Conclusion The study demonstrated that a wide range of methods are currently used for adventitious virus detection screening in biological products by deep sequencing and that they can yield significantly different results. This underscores the need for common reference materials to ensure satisfactory assay performance and enable comparisons between laboratories. PMID:26709640
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling
Wang, Sheng; Sun, Siqi
2017-01-01
Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168
RNA-Seq analysis to capture the transcriptome landscape of a single cell
Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim
2013-01-01
We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668
DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.
Yuan, Yuchen; Shi, Yi; Li, Changyang; Kim, Jinman; Cai, Weidong; Han, Zeguang; Feng, David Dagan
2016-12-23
With the developments of DNA sequencing technology, large amounts of sequencing data have become available in recent years and provide unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However in existing SMCC methods, issues like high data sparsity, small volume of sample size, and the application of simple linear classifiers, are major obstacles in improving the classification performance. To address the obstacles in existing SMCC studies, we propose DeepGene, an advanced deep neural network (DNN) based classifier, that consists of three steps: firstly, the clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; secondly, the indexed sparsity reduction (ISR) converts the gene data into indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR is fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, which is a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute in improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene has at least 24% performance improvement in terms of testing accuracy. Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module that is able to extract the high level features between combinatorial somatic point mutations and cancer types.
DeepSig: deep learning improves signal peptide detection in proteins.
Savojardo, Castrense; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita
2018-05-15
The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both standalone program and web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website. pierluigi.martelli@unibo.it. Supplementary data are available at Bioinformatics online.
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun
2018-01-01
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980
DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.
Arango-Argoty, Gustavo; Garner, Emily; Pruden, Amy; Heath, Lenwood S; Vikesland, Peter; Zhang, Liqing
2018-02-01
Growing concerns about increasing rates of antibiotic resistance call for expanded and comprehensive global monitoring. Advancing methods for monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is especially needed for identifying potential resources of novel antibiotic resistance genes (ARGs), hot spots for gene exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequencing now enables direct access and profiling of the total metagenomic DNA pool, where ARGs are typically identified or predicted based on the "best hits" of sequence searches against existing databases. Unfortunately, this approach produces a high rate of false negatives. To address such limitations, we propose here a deep learning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two deep learning models, DeepARG-SS and DeepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Evaluation of the deep learning models over 30 antibiotic resistance categories demonstrates that the DeepARG models can predict ARGs with both high precision (> 0.97) and recall (> 0.90). The models displayed an advantage over the typical best hit approach, yielding consistently lower false negative rates and thus higher overall recall (> 0.9). As more data become available for under-represented ARG categories, the DeepARG models' performance can be expected to be further enhanced due to the nature of the underlying neural networks. Our newly developed ARG database, DeepARG-DB, encompasses ARGs predicted with a high degree of confidence and extensive manual inspection, greatly expanding current ARG repositories. The deep learning models developed here offer more accurate antimicrobial resistance annotation relative to current bioinformatics practice. DeepARG does not require strict cutoffs, which enables identification of a much broader diversity of ARGs. The DeepARG models and database are available as a command line version and as a Web service at http://bench.cs.vt.edu/deeparg .
Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions
2014-01-01
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920
Isakov, Ofer; Bordería, Antonio V; Golan, David; Hamenahem, Amir; Celniker, Gershon; Yoffe, Liron; Blanc, Hervé; Vignuzzi, Marco; Shomron, Noam
2015-07-01
The study of RNA virus populations is a challenging task. Each population of RNA virus is composed of a collection of different, yet related genomes often referred to as mutant spectra or quasispecies. Virologists using deep sequencing technologies face major obstacles when studying virus population dynamics, both experimentally and in natural settings due to the relatively high error rates of these technologies and the lack of high performance pipelines. In order to overcome these hurdles we developed a computational pipeline, termed ViVan (Viral Variance Analysis). ViVan is a complete pipeline facilitating the identification, characterization and comparison of sequence variance in deep sequenced virus populations. Applying ViVan on deep sequenced data obtained from samples that were previously characterized by more classical approaches, we uncovered novel and potentially crucial aspects of virus populations. With our experimental work, we illustrate how ViVan can be used for studies ranging from the more practical, detection of resistant mutations and effects of antiviral treatments, to the more theoretical temporal characterization of the population in evolutionary studies. Freely available on the web at http://www.vivanbioinfo.org : nshomron@post.tau.ac.il Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Porter, Danielle P.; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D.; White, Kirsten L.
2015-01-01
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study. PMID:26690199
Porter, Danielle P; Daeumer, Martin; Thielen, Alexander; Chang, Silvia; Martin, Ross; Cohen, Cal; Miller, Michael D; White, Kirsten L
2015-12-07
At Week 96 of the Single-Tablet Regimen (STaR) study, more treatment-naïve subjects that received rilpivirine/emtricitabine/tenofovir DF (RPV/FTC/TDF) developed resistance mutations compared to those treated with efavirenz (EFV)/FTC/TDF by population sequencing. Furthermore, more RPV/FTC/TDF-treated subjects with baseline HIV-1 RNA >100,000 copies/mL developed resistance compared to subjects with baseline HIV-1 RNA ≤100,000 copies/mL. Here, deep sequencing was utilized to assess the presence of pre-existing low-frequency variants in subjects with and without resistance development in the STaR study. Deep sequencing (Illumina MiSeq) was performed on baseline and virologic failure samples for all subjects analyzed for resistance by population sequencing during the clinical study (n = 33), as well as baseline samples from control subjects with virologic response (n = 118). Primary NRTI or NNRTI drug resistance mutations present at low frequency (≥2% to 20%) were detected in 6.6% of baseline samples by deep sequencing, all of which occurred in control subjects. Deep sequencing results were generally consistent with population sequencing but detected additional primary NNRTI and NRTI resistance mutations at virologic failure in seven samples. HIV-1 drug resistance mutations emerging while on RPV/FTC/TDF or EFV/FTC/TDF treatment were not present at low frequency at baseline in the STaR study.
Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.
Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry
2018-03-01
We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Intelligent fault diagnosis of rolling bearings using an improved deep recurrent neural network
NASA Astrophysics Data System (ADS)
Jiang, Hongkai; Li, Xingqiu; Shao, Haidong; Zhao, Ke
2018-06-01
Traditional intelligent fault diagnosis methods for rolling bearings heavily depend on manual feature extraction and feature selection. For this purpose, an intelligent deep learning method, named the improved deep recurrent neural network (DRNN), is proposed in this paper. Firstly, frequency spectrum sequences are used as inputs to reduce the input size and ensure good robustness. Secondly, DRNN is constructed by the stacks of the recurrent hidden layer to automatically extract the features from the input spectrum sequences. Thirdly, an adaptive learning rate is adopted to improve the training performance of the constructed DRNN. The proposed method is verified with experimental rolling bearing data, and the results confirm that the proposed method is more effective than traditional intelligent fault diagnosis methods.
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.
Yang, Jian-Hua; Qu, Liang-Hu
2012-01-01
Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
Making sense of deep sequencing
Goldman, D.; Domschke, K.
2016-01-01
This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
De novo peptide sequencing by deep learning
Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming
2017-01-01
De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701
ComplexContact: a web server for inter-protein contact prediction using deep learning.
Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo
2018-05-22
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complex and interact at residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua
2016-10-01
The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
A visual tracking method based on deep learning without online model updating
NASA Astrophysics Data System (ADS)
Tang, Cong; Wang, Yicheng; Feng, Yunsong; Zheng, Chao; Jin, Wei
2018-02-01
The paper proposes a visual tracking method based on deep learning without online model updating. In consideration of the advantages of deep learning in feature representation, deep model SSD (Single Shot Multibox Detector) is used as the object extractor in the tracking model. Simultaneously, the color histogram feature and HOG (Histogram of Oriented Gradient) feature are combined to select the tracking object. In the process of tracking, multi-scale object searching map is built to improve the detection performance of deep detection model and the tracking efficiency. In the experiment of eight respective tracking video sequences in the baseline dataset, compared with six state-of-the-art methods, the method in the paper has better robustness in the tracking challenging factors, such as deformation, scale variation, rotation variation, illumination variation, and background clutters, moreover, its general performance is better than other six tracking methods.
Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.
Zhang, Buzhong; Li, Linqing; Lü, Qiang
2018-05-25
Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step to understand its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed when bidirectional information from hidden nodes was merged for outputs. Three types of merging operators were used in our improved model, with a long short-term memory network performing as a hidden computing node. The trained database was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, predictive values of continuous relative solvent-accessible area were obtained, and then, these values were transformed into binary states with predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage.
Smith, Eric E; Nandigam, Kaveer R N; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M
2010-09-01
MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds. Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semiautomated method. WMH and microbleeds were compared according to site of symptomatic hematoma origin (lobar versus deep) or by pattern of hemorrhages, including both hematomas and microbleeds, on MRI gradient recalled echo sequence (grouped as lobar only-probable CAA, lobar only-possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Patients with lobar and deep hemispheric hematoma had similar median normalized WMH volumes (19.5 cm versus 19.9 cm(3), P=0.74) and prevalence of >or=1 microbleed (54% versus 52%, P=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category; however, the prevalence of brain stem T2 hyperintensity was lower in lobar hematoma versus deep hematoma (54% versus 70%, P=0.004). Mixed ICH was common (23%). Patients with mixed ICH had large normalized WMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI gradient recalled echo sequence in up to one fourth of patients; in these patients, both hypertension and CAA may be contributing to the burden of WMH.
Wu, Shuang; Nakamoto, Shingo; Kanda, Tatsuo; Jiang, Xia; Nakamura, Masato; Miyamura, Tatsuo; Shirasawa, Hiroshi; Sugiura, Nobuyuki; Takahashi-Nakaguchi, Azusa; Gonoi, Tohru; Yokosuka, Osamu
2014-01-01
Hepatitis A virus (HAV) is a causative agent of acute viral hepatitis for which an effective vaccine has been developed. Here we describe ultra-deep pyrosequences (UDPSs) of HAV 5'-untranslated region (5'UTR) among cases of the same outbreak, which arose from a single source, associated with a revolving sushi bar. We determined the reference sequence from HAV-derived clone from an attendant by the Sanger method. Sixteen UDPSs from this outbreak and one from another sporadic case were compared with this reference. Nucleotide errors yielded a UDPS error rate of < 1%. This study confirmed that nucleotide substitutions of this region are transition mutations in outbreak cases, that insertion was observed only in non-severe cases, and that these nucleotide substitutions were different from those of the sporadic case. Analysis of UDPSs detected low-prevalence HAV variations in 5'UTR, but no specific mutations associated with severity in these outbreak cases. To our surprise, HAV strains in this outbreak conserved HAV IRES sequence even if we performed analysis of UDPSs. UDPS analysis of HAV 5'UTR gave us no association between the disease severity of hepatitis A and HAV 5'UTR substitutions. It might be more interesting to perform ultra-deep sequencing of full length HAV genome in order to reveal possible unknown genomic determinants associated with disease severity. Further studies will be needed. PMID:24396287
Deep whole-genome sequencing of 100 southeast Asian Malays.
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-10
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-01
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
Using Deep Learning Model for Meteorological Satellite Cloud Image Prediction
NASA Astrophysics Data System (ADS)
Su, X.
2017-12-01
A satellite cloud image contains much weather information such as precipitation information. Short-time cloud movement forecast is important for precipitation forecast and is the primary means for typhoon monitoring. The traditional methods are mostly using the cloud feature matching and linear extrapolation to predict the cloud movement, which makes that the nonstationary process such as inversion and deformation during the movement of the cloud is basically not considered. It is still a hard task to predict cloud movement timely and correctly. As deep learning model could perform well in learning spatiotemporal features, to meet this challenge, we could regard cloud image prediction as a spatiotemporal sequence forecasting problem and introduce deep learning model to solve this problem. In this research, we use a variant of Gated-Recurrent-Unit(GRU) that has convolutional structures to deal with spatiotemporal features and build an end-to-end model to solve this forecast problem. In this model, both the input and output are spatiotemporal sequences. Compared to Convolutional LSTM(ConvLSTM) model, this model has lower amount of parameters. We imply this model on GOES satellite data and the model perform well.
Ibrahim, Wisam; Abadeh, Mohammad Saniee
2017-05-21
Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences. Principal Component Analysis PCA has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine DELM in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis LDA to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.
Compilation of Reprints Number 63.
1986-03-01
Michel Be6, Stephen H1. Johnson, and E.F. Chiburis PRELIMINARY SEISMIC REFRACTION RESULTS USING A BOREHOLE SEISMOMETER IN DEEP SEA DRILLING PROJECT HOLE...refraction data with wells drilled on land and offshore reflection profiles permits tentative identification of geologic sequences on the basis of...PERIOD CO’VEAEO PRELIMINARY SEISMIC REFRACTION RESULTS USING A Rern BOREHOLE SEISMOMETER IN DEEP SEA DRILLING ~ rn PROJECT HOLE 395A 6.PERFORMING ORG
Kumar, S; Gadagkar, S R
2000-12-01
The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and the high accuracy in phylogenetic inference as revealed in computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages.
Deep Packet/Flow Analysis using GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Qian; Wu, Wenji; DeMar, Phil
Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE) as it requires a large amount of raw computing power and high I/O throughputs. Recently, researchers have tentatively used GPUs to address the above issues and boost the performance of DPI. Typically, DPI applications involve highly complex operations in both per-packet and per-flow data level, often in real-time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, their data stream need to be reconstructed in a per-flow level to deliver a consistent content analysis. Sincemore » the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes a purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering and waiting for TCP packets to become in sequence, our framework process the packets in batch and uses a deterministic finite automaton (DFA) with prefix-/suffix- tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. In conclusion, evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct a stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.« less
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
Brain Tumor Segmentation Using Deep Belief Networks and Pathological Knowledge.
Zhan, Tianming; Chen, Yi; Hong, Xunning; Lu, Zhenyu; Chen, Yunjie
2017-01-01
In this paper, we propose an automatic brain tumor segmentation method based on Deep Belief Networks (DBNs) and pathological knowledge. The proposed method is targeted against gliomas (both low and high grade) obtained in multi-sequence magnetic resonance images (MRIs). Firstly, a novel deep architecture is proposed to combine the multi-sequences intensities feature extraction with classification to get the classification probabilities of each voxel. Then, graph cut based optimization is executed on the classification probabilities to strengthen the spatial relationships of voxels. At last, pathological knowledge of gliomas is applied to remove some false positives. Our method was validated in the Brain Tumor Segmentation Challenge 2012 and 2013 databases (BRATS 2012, 2013). The performance of segmentation results demonstrates our proposal providing a competitive solution with stateof- the-art methods. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Feasibility of 3.0T pelvic MR imaging in the evaluation of endometriosis.
Manganaro, L; Fierro, F; Tomei, A; Irimia, D; Lodise, P; Sergi, M E; Vinci, V; Sollazzo, P; Porpora, M G; Delfini, R; Vittori, G; Marini, M
2012-06-01
Endometriosis represents an important clinical problem in women of reproductive age with high impact on quality of life, work productivity and health care management. The aim of this study is to define the role of 3T magnetom system MRI in the evaluation of endometriosis. Forty-six women, with transvaginal (TV) ultrasound examination positive for endometriosis, with pelvic pain, or infertile underwent an MR 3.0T examination with the following protocol: T2 weighted FRFSE HR sequences, T2 weighted FRFSE HR CUBE 3D sequences, T1 w FSE sequences, LAVA-flex sequences. Pelvic anatomy, macroscopic endometriosis implants, deep endometriosis implants, fallopian tube involvement, adhesions presence, fluid effusion in Douglas pouch, uterus and kidney pathologies or anomalies associated and sacral nervous routes were considered by two radiologists in consensus. Laparoscopy was considered the gold standard. MRI imaging diagnosed deep endometriosis in 22/46 patients, endometriomas not associated to deep implants in 9/46 patients, 15/46 patients resulted negative for endometriosis, 11 of 22 patients with deep endometriosis reported ovarian endometriosis cyst. We obtained high percentages of sensibility (96.97%), specificity (100.00%), VPP (100.00%), VPN (92.86%). Pelvic MRI performed with 3T system guarantees high spatial and contrast resolution, providing accurate information about endometriosis implants, with a good pre-surgery mapping of the lesions involving both bowels and bladder surface and recto-uterine ligaments. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Shafiee, Mohammad Javad; Chung, Audrey G; Khalvati, Farzad; Haider, Masoom A; Wong, Alexander
2017-10-01
While lung cancer is the second most diagnosed form of cancer in men and women, a sufficiently early diagnosis can be pivotal in patient survival rates. Imaging-based, or radiomics-driven, detection methods have been developed to aid diagnosticians, but largely rely on hand-crafted features that may not fully encapsulate the differences between cancerous and healthy tissue. Recently, the concept of discovery radiomics was introduced, where custom abstract features are discovered from readily available imaging data. We propose an evolutionary deep radiomic sequencer discovery approach based on evolutionary deep intelligence. Motivated by patient privacy concerns and the idea of operational artificial intelligence, the evolutionary deep radiomic sequencer discovery approach organically evolves increasingly more efficient deep radiomic sequencers that produce significantly more compact yet similarly descriptive radiomic sequences over multiple generations. As a result, this framework improves operational efficiency and enables diagnosis to be run locally at the radiologist's computer while maintaining detection accuracy. We evaluated the evolved deep radiomic sequencer (EDRS) discovered via the proposed evolutionary deep radiomic sequencer discovery framework against state-of-the-art radiomics-driven and discovery radiomics methods using clinical lung CT data with pathologically proven diagnostic data from the LIDC-IDRI dataset. The EDRS shows improved sensitivity (93.42%), specificity (82.39%), and diagnostic accuracy (88.78%) relative to previous radiomics approaches.
Archaeal Diversity in Waters from Deep South African Gold Mines
Takai, Ken; Moser, Duane P.; DeFlaun, Mary; Onstott, Tullis C.; Fredrickson, James K.
2001-01-01
A culture-independent molecular analysis of archaeal communities in waters collected from deep South African gold mines was performed by performing a PCR-mediated terminal restriction fragment length polymorphism (T-RFLP) analysis of rRNA genes (rDNA) in conjunction with a sequencing analysis of archaeal rDNA clone libraries. The water samples used represented various environments, including deep fissure water, mine service water, and water from an overlying dolomite aquifer. T-RFLP analysis revealed that the ribotype distribution of archaea varied with the source of water. The archaeal communities in the deep gold mine environments exhibited great phylogenetic diversity; the majority of the members were most closely related to uncultivated species. Some archaeal rDNA clones obtained from mine service water and dolomite aquifer water samples were most closely related to environmental rDNA clones from surface soil (soil clones) and marine environments (marine group I [MGI]). Other clones exhibited intermediate phylogenetic affiliation between soil clones and MGI in the Crenarchaeota. Fissure water samples, derived from active or dormant geothermal environments, yielded archaeal sequences that exhibited novel phylogeny, including a novel lineage of Euryarchaeota. These results suggest that deep South African gold mines harbor novel archaeal communities distinct from those observed in other environments. Based on the phylogenetic analysis of archaeal strains and rDNA clones, including the newly discovered archaeal rDNA clones, the evolutionary relationship and the phylogenetic organization of the domain Archaea are reevaluated. PMID:11722932
You, Ronghui; Huang, Xiaodi; Zhu, Shanfeng
2018-06-06
As of April 2018, UniProtKB has collected more than 115 million protein sequences. Less than 0.15% of these proteins, however, have been associated with experimental GO annotations. As such, the use of automatic protein function prediction (AFP) to reduce this huge gap becomes increasingly important. The previous studies conclude that sequence homology based methods are highly effective in AFP. In addition, mining motif, domain, and functional information from protein sequences has been found very helpful for AFP. Other than sequences, alternative information sources such as text, however, may be useful for AFP as well. Instead of using BOW (bag of words) representation in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on the benchmark dataset extracted from UniProt/SwissProt have demonstrated that DeepText2GO significantly outperformed both text-based and sequence-based methods, validating its superiority. Copyright © 2018 Elsevier Inc. All rights reserved.
Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua
2014-01-01
The fungal diversity in deep-sea environments has recently gained an increasing amount attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East India Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in the GenBank. Moreover, 44.4% fungal OTUs and 30% culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East India Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal community could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044
GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS.
Ravasio, Viola; Ritelli, Marco; Legati, Andrea; Giacopuzzi, Edoardo
2018-04-14
Exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathological variants and study genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this pose serious challenges for variants interpretation. Here, we propose a new tool named GARFIELD-NGS (Genomic vARiants FIltering by dEep Learning moDels in NGS), which rely on deep learning models to dissect false and true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performances for both SNP and INDEL variants (AUC 0.71 - 0.98) and outperformed established hard filters. The method is robust also at low coverage down to 30X and can be applied on data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes standard VCF file and produces a regular VCF output. Thus, it can be easily integrated in existing analysis pipeline, allowing application of different thresholds based on desired level of sensitivity and specificity. GARFIELD-NGS available at https://github.com/gedoardo83/GARFIELD-NGS. edoardo.giacopuzzi@unibs.it. Supplementary data are available at Bioinformatics online.
As sequencing costs continue to decline, a torrent of cancer genomic data is looming. Very soon, our ability to deeply investigate the cancer genome will outpace our ability to correlate these changes with the phenotypes that they produce. We propose the advanced development and extension of a software platform for performing deep phenotype extraction directly from clinical text of cancer patients, with the goal of enabling translational cancer research and precision medicine.
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing
Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.
2012-01-01
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
deepTools: a flexible platform for exploring deep-sequencing data.
Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas
2014-07-01
We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhao, Zijian; Voros, Sandrine; Weng, Ying; Chang, Faliang; Li, Ruijian
2017-12-01
Worldwide propagation of minimally invasive surgeries (MIS) is hindered by their drawback of indirect observation and manipulation, while monitoring of surgical instruments moving in the operated body required by surgeons is a challenging problem. Tracking of surgical instruments by vision-based methods is quite lucrative, due to its flexible implementation via software-based control with no need to modify instruments or surgical workflow. A MIS instrument is conventionally split into a shaft and end-effector portions, while a 2D/3D tracking-by-detection framework is proposed, which performs the shaft tracking followed by the end-effector one. The former portion is described by line features via the RANSAC scheme, while the latter is depicted by special image features based on deep learning through a well-trained convolutional neural network. The method verification in 2D and 3D formulation is performed through the experiments on ex-vivo video sequences, while qualitative validation on in-vivo video sequences is obtained. The proposed method provides robust and accurate tracking, which is confirmed by the experimental results: its 3D performance in ex-vivo video sequences exceeds those of the available state-of -the-art methods. Moreover, the experiments on in-vivo sequences demonstrate that the proposed method can tackle the difficult condition of tracking with unknown camera parameters. Further refinements of the method will refer to the occlusion and multi-instrumental MIS applications.
An introduction to deep learning on biological sequence data: examples and solutions.
Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae
2017-11-15
Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. skaaesonderby@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Distribution and Diversity of Microbial Eukaryotes in Bathypelagic Waters of the South China Sea.
Xu, Dapeng; Jiao, Nianzhi; Ren, Rui; Warren, Alan
2017-05-01
Little is known about the biodiversity of microbial eukaryotes in the South China Sea, especially in waters at bathyal depths. Here, we employed SSU rDNA gene sequencing to reveal the diversity and community structure across depth and distance gradients in the South China Sea. Vertically, the highest alpha diversity was found at 75-m depth. The communities of microbial eukaryotes were clustered into shallow-, middle-, and deep-water groups according to the depth from which they were collected, indicating a depth-related diversity and distribution pattern. Rhizaria sequences dominated the microeukaryote community and occurred in all samples except those from less than 50-m deep, being most abundant near the sea floor where they contributed ca. 64-97% and 40-74% of the total sequences and OTUs recovered, respectively. A large portion of rhizarian OTUs has neither a nearest named neighbor nor a nearest neighbor in the GenBank database which indicated the presence of new phylotypes in the South China Sea. Given their overwhelming abundance and richness, further phylogenetic analysis of rhizarians were performed and three new genetic clusters were revealed containing sequences retrieved from the deep waters of the South China Sea. Our results shed light on the diversity and community structure of microbial eukaryotes in this not yet fully explored area. © 2016 The Author(s) Journal of Eukaryotic Microbiology © 2016 International Society of Protistologists.
Detection of microRNAs in color space.
Marco, Antonio; Griffiths-Jones, Sam
2012-02-01
Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3(') end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/ antonio.marco@manchester.ac.uk Supplementary data are available at Bioinformatics online.
Low-Latency Telerobotic Sample Return and Biomolecular Sequencing for Deep Space Gateway
NASA Astrophysics Data System (ADS)
Lupisella, M.; Bleacher, J.; Lewis, R.; Dworkin, J.; Wright, M.; Burton, A.; Rubins, K.; Wallace, S.; Stahl, S.; John, K.; Archer, D.; Niles, P.; Regberg, A.; Smith, D.; Race, M.; Chiu, C.; Russell, J.; Rampe, E.; Bywaters, K.
2018-02-01
Low-latency telerobotics, crew-assisted sample return, and biomolecular sequencing can be used to acquire and analyze lunar farside and/or Apollo landing site samples. Sequencing can also be used to monitor and study Deep Space Gateway environment and crew health.
Crespo, Bibiana G; Wallhead, Philip J; Logares, Ramiro; Pedrós-Alió, Carlos
2016-01-01
High-throughput sequencing (HTS) techniques have suggested the existence of a wealth of species with very low relative abundance: the rare biosphere. We attempted to exhaustively map this rare biosphere in two water samples by performing an exceptionally deep pyrosequencing analysis (~500,000 final reads per sample). Species data were derived by a 97% identity criterion and various parametric distributions were fitted to the observed counts. Using the best-fitting Sichel distribution we estimate a total species richness of 1,568-1,669 (95% Credible Interval) and 5,027-5,196 for surface and deep water samples respectively, implying that 84-89% of the total richness in those two samples was sequenced, and we predict that a quadrupling of the present sequencing effort would suffice to observe 90% of the total richness in both samples. Comparing the HTS results with a culturing approach we found that most of the cultured taxa were not obtained by HTS, despite the high sequencing effort. Culturing therefore remains a useful tool for uncovering marine bacterial diversity, in addition to its other uses for studying the ecology of marine bacteria.
Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang
2016-05-01
The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-01-01
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds. PMID:26585833
Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre
2015-11-20
HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows to detect low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used but the errors generated during ultra-deep pyrosequencing are sequence-dependant rather than random. We have developed an automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It allows to retain authentic sequences with point mutations at V3 positions of interest and discard artifactual ones with accurate sensitivity thresholds.
The Function of Neuroendocrine Cells in Prostate Cancer
2013-04-01
integration site. We then performed deep sequencing and aligned reads to the genome. Our analysis revealed that both histological phenotypes are derived from...lentiviral integration site analysis . (B) Laser capture microdissection was performed on individual glands containing both squamous and...lentiviral integration site analysis . LTR: long terminal repeat (viral DNA), PCR: polymerase chain reaction. (D) Venn diagrams depict shared lentiviral
Deep dysphasic performance in non-fluent progressive aphasia: a case study.
Tree, J J; Perfect, T J; Hirsh, K W; Copstick, S
2001-01-01
We present a patient (PW) with non-fluent progressive aphasia, characterized by severe word finding difficulties and frequent phonemic paraphasias in spontaneous speech. It has been suggested that such patients have insufficient access to phonological information for output and cannot construct the appropriate sequence of selected phonemes for articulation. Consistent with such a proposal, we found that PW was impaired on a variety of verbal tasks that demand access to phonological representations (reading, repetition, confrontational naming and rhyme judgement); she also demonstrated poor performance on syntactic and grammatical processing tasks. However, examination of PW's repetition performance also revealed that she made semantic paraphasias and that her performance was influenced by imageability and lexical status. Her auditory-verbal short-term memory was also severely compromised. These features are consistent with 'deep dysphasia', a disorder reported in patients suffering from stroke or cerebrovascular accident, and rarely reported in the context of non-fluent progressive aphasia. PW's pattern of performance is evaluated in terms of current models of both non-fluent progressive aphasia and deep dysphasia.
Pan, Xiaoyong; Shen, Hong-Bin
2017-02-28
RNAs play key roles in cells through the interactions with proteins known as the RNA-binding proteins (RBP) and their binding motifs enable crucial understanding of the post-transcriptional regulation of RNAs. How the RBPs correctly recognize the target RNAs and why they bind specific positions is still far from clear. Machine learning-based algorithms are widely acknowledged to be capable of speeding up this process. Although many automatic tools have been developed to predict the RNA-protein binding sites from the rapidly growing multi-resource data, e.g. sequence, structure, their domain specific features and formats have posed significant computational challenges. One of current difficulties is that the cross-source shared common knowledge is at a higher abstraction level beyond the observed data, resulting in a low efficiency of direct integration of observed data across domains. The other difficulty is how to interpret the prediction results. Existing approaches tend to terminate after outputting the potential discrete binding sites on the sequences, but how to assemble them into the meaningful binding motifs is a topic worth of further investigation. In viewing of these challenges, we propose a deep learning-based framework (iDeep) by using a novel hybrid convolutional neural network and deep belief network to predict the RBP interaction sites and motifs on RNAs. This new protocol is featured by transforming the original observed data into a high-level abstraction feature space using multiple layers of learning blocks, where the shared representations across different domains are integrated. To validate our iDeep method, we performed experiments on 31 large-scale CLIP-seq datasets, and our results show that by integrating multiple sources of data, the average AUC can be improved by 8% compared to the best single-source-based predictor; and through cross-domain knowledge integration at an abstraction level, it outperforms the state-of-the-art predictors by 6%. Besides the overall enhanced prediction performance, the convolutional neural network module embedded in iDeep is also able to automatically capture the interpretable binding motifs for RBPs. Large-scale experiments demonstrate that these mined binding motifs agree well with the experimentally verified results, suggesting iDeep is a promising approach in the real-world applications. The iDeep framework not only can achieve promising performance than the state-of-the-art predictors, but also easily capture interpretable binding motifs. iDeep is available at http://www.csbio.sjtu.edu.cn/bioinf/iDeep.
Subsurface microbial diversity in deep-granitic-fracture water in Colorado
Sahl, J.W.; Schmidt, R.; Swanner, E.D.; Mandernack, K.W.; Templeton, A.S.; Kieft, Thomas L.; Smith, R.L.; Sanford, W.E.; Callaghan, R.L.; Mitton, J.B.; Spear, J.R.
2008-01-01
A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This "Henderson candidate division" dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. Copyright ?? 2008, American Society for Microbiology. All Rights Reserved.
Subsurface Microbial Diversity in Deep-Granitic-Fracture Water in Colorado▿
Sahl, Jason W.; Schmidt, Raleigh; Swanner, Elizabeth D.; Mandernack, Kevin W.; Templeton, Alexis S.; Kieft, Thomas L.; Smith, Richard L.; Sanford, William E.; Callaghan, Robert L.; Mitton, Jeffry B.; Spear, John R.
2008-01-01
A microbial community analysis using 16S rRNA gene sequencing was performed on borehole water and a granite rock core from Henderson Mine, a >1,000-meter-deep molybdenum mine near Empire, CO. Chemical analysis of borehole water at two separate depths (1,044 m and 1,004 m below the mine entrance) suggests that a sharp chemical gradient exists, likely from the mixing of two distinct subsurface fluids, one metal rich and one relatively dilute; this has created unique niches for microorganisms. The microbial community analyzed from filtered, oxic borehole water indicated an abundance of sequences from iron-oxidizing bacteria (Gallionella spp.) and was compared to the community from the same borehole after 2 weeks of being plugged with an expandable packer. Statistical analyses with UniFrac revealed a significant shift in community structure following the addition of the packer. Phospholipid fatty acid (PLFA) analysis suggested that Nitrosomonadales dominated the oxic borehole, while PLFAs indicative of anaerobic bacteria were most abundant in the samples from the plugged borehole. Microbial sequences were represented primarily by Firmicutes, Proteobacteria, and a lineage of sequences which did not group with any identified bacterial division; phylogenetic analyses confirmed the presence of a novel candidate division. This “Henderson candidate division” dominated the clone libraries from the dilute anoxic fluids. Sequences obtained from the granitic rock core (1,740 m below the surface) were represented by the divisions Proteobacteria (primarily the family Ralstoniaceae) and Firmicutes. Sequences grouping within Ralstoniaceae were also found in the clone libraries from metal-rich fluids yet were absent in more dilute fluids. Lineage-specific comparisons, combined with phylogenetic statistical analyses, show that geochemical variance has an important effect on microbial community structure in deep, subsurface systems. PMID:17981950
Recurrent neural networks for breast lesion classification based on DCE-MRIs
NASA Astrophysics Data System (ADS)
Antropova, Natasha; Huynh, Benjamin; Giger, Maryellen
2018-02-01
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays a significant role in breast cancer screening, cancer staging, and monitoring response to therapy. Recently, deep learning methods are being rapidly incorporated in image-based breast cancer diagnosis and prognosis. However, most of the current deep learning methods make clinical decisions based on 2-dimentional (2D) or 3D images and are not well suited for temporal image data. In this study, we develop a deep learning methodology that enables integration of clinically valuable temporal components of DCE-MRIs into deep learning-based lesion classification. Our work is performed on a database of 703 DCE-MRI cases for the task of distinguishing benign and malignant lesions, and uses the area under the ROC curve (AUC) as the performance metric in conducting that task. We train a recurrent neural network, specifically a long short-term memory network (LSTM), on sequences of image features extracted from the dynamic MRI sequences. These features are extracted with VGGNet, a convolutional neural network pre-trained on a large dataset of natural images ImageNet. The features are obtained from various levels of the network, to capture low-, mid-, and high-level information about the lesion. Compared to a classification method that takes as input only images at a single time-point (yielding an AUC = 0.81 (se = 0.04)), our LSTM method improves lesion classification with an AUC of 0.85 (se = 0.03).
Jiang, Haojun; Xie, Yifan; Li, Xuchao; Ge, Huijuan; Deng, Yongqiang; Mu, Haofang; Feng, Xiaoli; Yin, Lu; Du, Zhou; Chen, Fang; He, Nongyue
2016-01-01
Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels in order to verify the performance in clinical cases. Combining targeted deep sequencing of selective SNP and informative bioinformatics pipeline, we calculated the combined paternity index (CPI) of 17 cases to determine paternity. Sequencing-based NIPAT results fully agreed with invasive prenatal paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future.
Intensity inhomogeneity correction for magnetic resonance imaging of human brain at 7T.
Uwano, Ikuko; Kudo, Kohsuke; Yamashita, Fumio; Goodwin, Jonathan; Higuchi, Satomi; Ito, Kenji; Harada, Taisuke; Ogawa, Akira; Sasaki, Makoto
2014-02-01
To evaluate the performance and efficacy for intensity inhomogeneity correction of various sequences of the human brain in 7T MRI using the extended version of the unified segmentation algorithm. Ten healthy volunteers were scanned with four different sequences (2D spin echo [SE], 3D fast SE, 2D fast spoiled gradient echo, and 3D time-of-flight) by using a 7T MRI system. Intensity inhomogeneity correction was performed using the "New Segment" module in SPM8 with four different values (120, 90, 60, and 30 mm) of full width at half maximum (FWHM) in Gaussian smoothness. The uniformity in signals in the entire white matter was evaluated using the coefficient of variation (CV); mean signal intensities between the subcortical and deep white matter were compared, and contrast between subcortical white matter and gray matter was measured. The length of the lenticulostriate (LSA) was measured on maximum intensity projection (MIP) images in the original and corrected images. In all sequences, the CV decreased as the FWHM value decreased. The differences of mean signal intensities between subcortical and deep white matter also decreased with smaller FWHM values. The contrast between white and gray matter was maintained at all FWHM values. LSA length was significantly greater in corrected MIP than in the original MIP images. Intensity inhomogeneity in 7T MRI can be successfully corrected using SPM8 for various scan sequences.
Wei, Ran; Yan, Yue-Hong; Harris, AJ; Kang, Jong-Soo; Shen, Hui; Zhang, Xian-Chun
2017-01-01
Abstract The eupolypods II ferns represent a classic case of evolutionary radiation and, simultaneously, exhibit high substitution rate heterogeneity. These factors have been proposed to contribute to the contentious resolutions among clades within this fern group in multilocus phylogenetic studies. We investigated the deep phylogenetic relationships of eupolypod II ferns by sampling all major families and using 40 plastid genomes, or plastomes, of which 33 were newly sequenced with next-generation sequencing technology. We performed model-based analyses to evaluate the diversity of molecular evolutionary rates for these ferns. Our plastome data, with more than 26,000 informative characters, yielded good resolution for deep relationships within eupolypods II and unambiguously clarified the position of Rhachidosoraceae and the monophyly of Athyriaceae. Results of rate heterogeneity analysis revealed approximately 33 significant rate shifts in eupolypod II ferns, with the most heterogeneous rates (both accelerations and decelerations) occurring in two phylogenetically difficult lineages, that is, the Rhachidosoraceae–Aspleniaceae and Athyriaceae clades. These observations support the hypothesis that rate heterogeneity has previously constrained the deep phylogenetic resolution in eupolypods II. According to the plastome data, we propose that 14 chloroplast markers are particularly phylogenetically informative for eupolypods II both at the familial and generic levels. Our study demonstrates the power of a character-rich plastome data set and high-throughput sequencing for resolving the recalcitrant lineages, which have undergone rapid evolutionary radiation and dramatic changes in substitution rates. PMID:28854625
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.
Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel
2013-09-01
RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Dronkers, C E A; Klok, F A; van Haren, G R; Gleditsch, J; Westerlund, E; Huisman, M V; Kroft, L J M
2018-03-01
Diagnosing upper extremity deep vein thrombosis (UEDVT) can be challenging. Compression ultrasonography is often inconclusive because of overlying anatomic structures that hamper compressing veins. Contrast venography is invasive and has a risk of contrast allergy. Magnetic Resonance Direct Thrombus Imaging (MRDTI) and Three Dimensional Turbo Spin-echo Spectral Attenuated Inversion Recovery (3D TSE-SPAIR) are both non-contrast-enhanced Magnetic Resonance Imaging (MRI) sequences that can visualize a thrombus directly by the visualization of methemoglobin, which is formed in a fresh blood clot. MRDTI has been proven to be accurate in diagnosing deep venous thrombosis (DVT) of the leg. The primary aim of this pilot study was to test the feasibility of diagnosing UEDVT with these MRI techniques. MRDTI and 3D TSE-SPAIR were performed in 3 pilot patients who were already diagnosed with UEDVT by ultrasonography or contrast venography. In all patients, UEDVT diagnosis could be confirmed by MRDTI and 3D TSE-SPAIR in all vein segments. In conclusion, this study showed that non-contrast MRDTI and 3D TSE-SPAIR sequences may be feasible tests to diagnose UEDVT. However diagnostic accuracy and management studies have to be performed before these techniques can be routinely used in clinical practice. Copyright © 2018 Elsevier Ltd. All rights reserved.
Joint deep shape and appearance learning: application to optic pathway glioma segmentation
NASA Astrophysics Data System (ADS)
Mansoor, Awais; Li, Ien; Packer, Roger J.; Avery, Robert A.; Linguraru, Marius George
2017-03-01
Automated tissue characterization is one of the major applications of computer-aided diagnosis systems. Deep learning techniques have recently demonstrated impressive performance for the image patch-based tissue characterization. However, existing patch-based tissue classification techniques struggle to exploit the useful shape information. Local and global shape knowledge such as the regional boundary changes, diameter, and volumetrics can be useful in classifying the tissues especially in scenarios where the appearance signature does not provide significant classification information. In this work, we present a deep neural network-based method for the automated segmentation of the tumors referred to as optic pathway gliomas (OPG) located within the anterior visual pathway (AVP; optic nerve, chiasm or tracts) using joint shape and appearance learning. Voxel intensity values of commonly used MRI sequences are generally not indicative of OPG. To be considered an OPG, current clinical practice dictates that some portion of AVP must demonstrate shape enlargement. The method proposed in this work integrates multiple sequence magnetic resonance image (T1, T2, and FLAIR) along with local boundary changes to train a deep neural network. For training and evaluation purposes, we used a dataset of multiple sequence MRI obtained from 20 subjects (10 controls, 10 NF1+OPG). To our best knowledge, this is the first deep representation learning-based approach designed to merge shape and multi-channel appearance data for the glioma detection. In our experiments, mean misclassification errors of 2:39% and 0:48% were observed respectively for glioma and control patches extracted from the AVP. Moreover, an overall dice similarity coefficient of 0:87+/-0:13 (0:93+/-0:06 for healthy tissue, 0:78+/-0:18 for glioma tissue) demonstrates the potential of the proposed method in the accurate localization and early detection of OPG.
MRI markers of small vessel disease in lobar and deep hemispheric intracerebral hemorrhage
Smith, Eric E.; Nandigam, Kaveer R.N.; Chen, Yu-Wei; Jeng, Jed; Salat, David; Halpin, Amy; Frosch, Matthew; Wendell, Lauren; Fazen, Louis; Rosand, Jonathan; Viswanathan, Anand; Greenberg, Steven M.
2014-01-01
Background MRI evidence of small vessel disease is common in intracerebral hemorrhage (ICH). We hypothesized that ICH caused by cerebral amyloid angiopathy (CAA) or hypertensive vasculopathy would have different distributions of MRI T2 white matter hyperintensity (WMH) and microbleeds (MB). Methods Data were analyzed from 133 consecutive patients with primary supratentorial ICH and adequate MRI sequences. CAA was diagnosed using the Boston criteria. WMH segmentation was performed using a validated semi-automated method. WMH and MB were compared according to site of symptomatic hematoma origin (lobar vs. deep) or by pattern of hemorrhages, including both hematomas and MB, on MRI GRE sequence (grouped as lobar only--probable CAA, lobar only--possible CAA, deep hemispheric only, or mixed lobar and deep hemorrhages). Results Lobar and deep hemispheric hematoma patients had similar median nWMH volumes (19.5 cm vs. 19.9 cm3, p=0.74) and prevalence of ≥1 MB (54% vs. 52%, p=0.99). The supratentorial WMH distribution was similar according to hemorrhage location category, however the prevalence of brainstem T2 hyperintensity was lower in lobar hematoma vs. deep hematoma (54% vs. 70%, p=0.004). Mixed ICH was common (23%). Mixed ICH patients had large nWMH volumes and a posterior distribution of cortical hemorrhages similar to that seen in CAA. Conclusions WMH distribution is largely similar between CAA-related and non-CAA-related ICH. Mixed lobar and deep hemorrhages are seen on MRI GRE in up to one quarter of patients; in these patients both hypertension and CAA may be contributing to the burden of WMH. PMID:20689084
Liu, Tong; Hu, John; Zuo, Yuhu; Jin, Yazhong; Hou, Jumei
2016-04-01
Deep sequencing of small RNAs is a useful tool to identify novel small RNAs that may be involved in fungal growth and pathogenesis. In this study, we used HiSeq deep sequencing to identify 747,487 unique small RNAs from Curvularia lunata. Among these small RNAs were 1012 microRNA-like RNAs (milRNAs), which are similar to other known microRNAs, and 48 potential novel milRNAs without homologs in other organisms have been identified using the miRBase© database. We used quantitative PCR to analyze the expression of four of these milRNAs from C. lunata at different developmental stages. The analysis revealed several changes associated with germinating conidia and mycelial growth, suggesting that these milRNAs may play a role in pathogen infection and mycelial growth. A total of 8334 target mRNAs for the 1012 milRNAs that were identified, and 256 target mRNAs for the 48 novel milRNAs were predicted by computational analysis. These target mRNAs of milRNAs were also performed by gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis. To our knowledge, this study is the first report of C. lunata's milRNA profiles. This information will provide a better understanding of pathogen development and infection mechanism.
NASA Technical Reports Server (NTRS)
1977-01-01
VIKING LANDER DIGS A DEEP HOLE ON MARS -- This six-inch-deep, 12- inch-wide, 29-inch-long hole was dug Feb. 12 and 14 by Viking Lander 1 as the first sequence in an attempt to reach a foot beneath the surface of the red planet. The activity is in the same area where Lander 1 acquired its first soil samples last July. The trench was dug by repeatedly backhoeing in a left-right-center pattern. The backhoe teeth produced the small parallel ridges at the far end of the trench (upper left). The larger ridges running the length of the trench are material left behind during the backhoe operation. What appears to be small rocks along the ridges and in the soil at the near end of the trench are really small dirt clods. The clods and the steepness of the trench walls indicate the material is cohesive and behaves something like ordinary flour. After a later sequence, to be performed March 1 and 2, a soil sample will be taken from the bottom of the trench for inorganic soil analysis and later for biology analysis. Information about the soil taken from the bottom of the trench may help explain the weathering process on Mars and may help resolve the dilemma created by Viking findings that first suggest but then cast doubt on the possibility of life in the Martian soil. The trench shown here is a result of one of the most complex command sequences yet performed by the lander. Viking l has been operating at Chryse Planitia on Mars since it landed July 20, 1976.
USDA-ARS?s Scientific Manuscript database
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
Rational Protein Engineering Guided by Deep Mutational Scanning
Shin, HyeonSeok; Cho, Byung-Kwan
2015-01-01
Sequence–function relationship in a protein is commonly determined by the three-dimensional protein structure followed by various biochemical experiments. However, with the explosive increase in the number of genome sequences, facilitated by recent advances in sequencing technology, the gap between protein sequences available and three-dimensional structures is rapidly widening. A recently developed method termed deep mutational scanning explores the functional phenotype of thousands of mutants via massive sequencing. Coupled with a highly efficient screening system, this approach assesses the phenotypic changes made by the substitution of each amino acid sequence that constitutes a protein. Such an informational resource provides the functional role of each amino acid sequence, thereby providing sufficient rationale for selecting target residues for protein engineering. Here, we discuss the current applications of deep mutational scanning and consider experimental design. PMID:26404267
Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja
2017-01-01
The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.
Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja
2017-01-01
The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372
DEEP: a general computational framework for predicting enhancers
Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B.
2015-01-01
Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located far away from the gene they affect. Identification of distal regulatory elements is a challenge for the bioinformatics research. Although existing methodologies increased the number of computationally predicted enhancers, performance inconsistency of computational models across different cell-lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning, are some key questions that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer's properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo-derived enhancer data from VISTA database. DEEP-VISTA, when tested on an independent test set, achieved GM of 80.1% and accuracy of 89.64%. DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/. PMID:25378307
Cough event classification by pretrained deep neural network.
Liu, Jia-Ming; You, Mingyu; Wang, Zheng; Li, Guo-Zheng; Xu, Xianghuai; Qiu, Zhongmin
2015-01-01
Cough is an essential symptom in respiratory diseases. In the measurement of cough severity, an accurate and objective cough monitor is expected by respiratory disease society. This paper aims to introduce a better performed algorithm, pretrained deep neural network (DNN), to the cough classification problem, which is a key step in the cough monitor. The deep neural network models are built from two steps, pretrain and fine-tuning, followed by a Hidden Markov Model (HMM) decoder to capture tamporal information of the audio signals. By unsupervised pretraining a deep belief network, a good initialization for a deep neural network is learned. Then the fine-tuning step is a back propogation tuning the neural network so that it can predict the observation probability associated with each HMM states, where the HMM states are originally achieved by force-alignment with a Gaussian Mixture Model Hidden Markov Model (GMM-HMM) on the training samples. Three cough HMMs and one noncough HMM are employed to model coughs and noncoughs respectively. The final decision is made based on viterbi decoding algorihtm that generates the most likely HMM sequence for each sample. A sample is labeled as cough if a cough HMM is found in the sequence. The experiments were conducted on a dataset that was collected from 22 patients with respiratory diseases. Patient dependent (PD) and patient independent (PI) experimental settings were used to evaluate the models. Five criteria, sensitivity, specificity, F1, macro average and micro average are shown to depict different aspects of the models. From overall evaluation criteria, the DNN based methods are superior to traditional GMM-HMM based method on F1 and micro average with maximal 14% and 11% error reduction in PD and 7% and 10% in PI, meanwhile keep similar performances on macro average. They also surpass GMM-HMM model on specificity with maximal 14% error reduction on both PD and PI. In this paper, we tried pretrained deep neural network in cough classification problem. Our results showed that comparing with the conventional GMM-HMM framework, the HMM-DNN could get better overall performance on cough classification task.
Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition.
Ordóñez, Francisco Javier; Roggen, Daniel
2016-01-18
Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing this temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average; outperforming some of the previous reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters' influence on performance to provide insights about their optimisation.
DNA Replication Profiling Using Deep Sequencing.
Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W
2018-01-01
Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.
DRREP: deep ridge regressed epitope predictor.
Sher, Gene; Zhi, Degui; Zhang, Shaojie
2017-10-03
The ability to predict epitopes plays an enormous role in vaccine development in terms of our ability to zero in on where to do a more thorough in-vivo analysis of the protein in question. Though for the past decade there have been numerous advancements and improvements in epitope prediction, on average the best benchmark prediction accuracies are still only around 60%. New machine learning algorithms have arisen within the domain of deep learning, text mining, and convolutional networks. This paper presents a novel analytically trained and string kernel using deep neural network, which is tailored for continuous epitope prediction, called: Deep Ridge Regressed Epitope Predictor (DRREP). DRREP was tested on long protein sequences from the following datasets: SARS, Pellequer, HIV, AntiJen, and SEQ194. DRREP was compared to numerous state of the art epitope predictors, including the most recently published predictors called LBtope and DMNLBE. Using area under ROC curve (AUC), DRREP achieved a performance improvement over the best performing predictors on SARS (13.7%), HIV (8.9%), Pellequer (1.5%), and SEQ194 (3.1%), with its performance being matched only on the AntiJen dataset, by the LBtope predictor, where both DRREP and LBtope achieved an AUC of 0.702. DRREP is an analytically trained deep neural network, thus capable of learning in a single step through regression. By combining the features of deep learning, string kernels, and convolutional networks, the system is able to perform residue-by-residue prediction of continues epitopes with higher accuracy than the current state of the art predictors.
Human Splice-Site Prediction with Deep Neural Networks.
Naito, Tatsuhiko
2018-04-18
Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create a system to predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence data and returns an answer as to whether it is splice site. The length of input is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence model is applied to the pretrained DNN model that determines the probability that an input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the pattern learned by the DNNs was visualized as position frequency matrices (PFMs). Some of PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system are uploaded. Further improvement will be achieved following the further development of DNNs.
Zandrino, Franco; La Paglia, Ernesto; Musante, Francesco
2010-01-01
To assess the diagnostic accuracy of magnetic resonance imaging in local staging of endometrial carcinoma, and to review the results and pitfalls described in the literature. Thirty women with a histological diagnosis of endometrial carcinoma underwent magnetic resonance imaging. Unenhanced T2-weighted and dynamic contrast-enhanced Ti-weighted sequences were obtained. Hysterectomy and salpingo-oophorectomy was performed in all patients. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated for the detection of deep myometrial and cervical infiltration. For deep myometrial infiltration T2-weighted sequences reached a sensitivity of 85%, specificity of 76%, PPV of 73%, NVP of 87%, and accuracy of 80%, while contrast-enhanced scans reached a sensitivity of 90%, specificity of 80%, PPV of 82%, NPV of 89%, and accuracy of 85%. For cervical infiltration T2-weighted sequences reached a sensitivity of 75%, specificity of 88%, PPV of 50%, NPV of 96%, and accuracy of 87%, while contrast-enhanced scans reached a sensitivity of 100%, specificity of 94%, PPV of 75%, NPV of 100%, and accuracy of 95%. Unenhanced and dynamic gadolinium-enhanced magnetic resonance allows accurate assessment of myometrial and cervical infiltration. Information provided by magnetic resonance imaging can define prognosis and management.
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research
Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.
2016-01-01
Abstract Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149
AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data.
Sebastian, Alvaro; Herdegen, Magdalena; Migalska, Magdalena; Radwan, Jacek
2016-03-01
Next-generation sequencing (NGS) technologies are revolutionizing the fields of biology and medicine as powerful tools for amplicon sequencing (AS). Using combinations of primers and barcodes, it is possible to sequence targeted genomic regions with deep coverage for hundreds, even thousands, of individuals in a single experiment. This is extremely valuable for the genotyping of gene families in which locus-specific primers are often difficult to design, such as the major histocompatibility complex (MHC). The utility of AS is, however, limited by the high intrinsic sequencing error rates of NGS technologies and other sources of error such as polymerase amplification or chimera formation. Correcting these errors requires extensive bioinformatic post-processing of NGS data. Amplicon Sequence Assignment (AMPLISAS) is a tool that performs analysis of AS results in a simple and efficient way, while offering customization options for advanced users. AMPLISAS is designed as a three-step pipeline consisting of (i) read demultiplexing, (ii) unique sequence clustering and (iii) erroneous sequence filtering. Allele sequences and frequencies are retrieved in excel spreadsheet format, making them easy to interpret. AMPLISAS performance has been successfully benchmarked against previously published genotyped MHC data sets obtained with various NGS technologies. © 2015 John Wiley & Sons Ltd.
Piezophilic Bacteria Isolated from Sediment of the Shimokita Coalbed, Japan
NASA Astrophysics Data System (ADS)
Fang, J.; Kato, C.; Hori, T.; Morono, Y.; Inagaki, F.
2013-12-01
The Earth is a cold planet as well as pressured planet, hosting both the surface biosphere and the deep biosphere. Pressure ranges over four-orders of magnitude in the surface biosphere and probably more in the deep biosphere. Pressure is an important thermodynamic property of the deep biosphere that affects microbial physiology and biochemistry. Bacteria that require high-pressure conditions for optimal growth are called piezophilic bacteria. Subseafloor marine sediments are one of the most extensive microbial habitats on Earth. Marine sediments cover more than two-thirds of the Earth's surface, and represent a major part of the deep biosphere. Owing to its vast size and intimate connection with the surface biosphere, particularly the oceans, the deep biosphere has enormous potential for influencing global-scale biogeochemical processes, including energy, climate, carbon and nutrient cycles. Therefore, studying piezophilic bacteria of the deep biosphere has important implications in increasing our understanding of global biogeochemical cycles, the interactions between the biosphere and the geosphere, and the evolution of life. Sediment samples were obtained during IODP Expedition 337, from 1498 meters below sea floor (mbsf) (Sample 6R-3), 1951~1999 mbsf (19R-1~25R-3; coalbed mix), and 2406 mbsf (29R-7). The samples were mixed with MB2216 growth medium and cultivated under anaerobic conditions at 35 MPa (megapascal) pressure. Growth temperatures were adjusted to in situ environmental conditions, 35°C for 6R-3, 45°C for 19R-1~25R-3, and 55°C for 29R-7. The cultivation was performed three times, for 30 days each time. Microbial cells were obtained and the total DNA was extracted. At the same time, isolation of microbes was also performed under anaerobic conditions. Microbial communities in the coalbed sediment were analyzed by cloning, sequencing, and terminal restriction fragment length polymorphism (t-RFLP) of 16S ribosomal RNA genes. From the partial 16S rRNA gene sequences, we have identified abundant Alkalibacterium sp. in 6R-3 and 29R-7 at the first HP cultivation. We also identified Haloactibacillus sp. in 6R-3 and Anoxybacillus related sp. in 19R-1~25R-3 at the third HP cultivation. These microorganisms are likely piezophiles and play an important role in degradation of sedimentary organic matter and production of microbial metabolites sustaining the deep microbial ecosystem in the Shimokita Coalbed. The complete 16S sequencing and isolation of piezophiles are now ongoing.
Improving Protein Fold Recognition by Deep Learning Networks.
Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin
2015-12-04
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed the comparable results at the level of family and superfamily, compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to real-value regression task, which also show a promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
Leung, Preston; Eltahla, Auda A; Lloyd, Andrew R; Bull, Rowena A; Luciani, Fabio
2017-07-15
With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provide informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publically available large deep sequencing datasets. Copyright © 2016 Elsevier B.V. All rights reserved.
Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard
2010-08-01
Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.
Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy
Matkovich, Scot J.; Dorn, Gerald W.
2018-01-01
Summary MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicates purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses. PMID:25836573
Deep sequencing of cardiac microRNA-mRNA interactomes in clinical and experimental cardiomyopathy.
Matkovich, Scot J; Dorn, Gerald W
2015-01-01
MicroRNAs are a family of short (~21 nucleotide) noncoding RNAs that serve key roles in cellular growth and differentiation and the response of the heart to stress stimuli. As the sequence-specific recognition element of RNA-induced silencing complexes (RISCs), microRNAs bind mRNAs and prevent their translation via mechanisms that may include transcript degradation and/or prevention of ribosome binding. Short microRNA sequences and the ability of microRNAs to bind to mRNA sites having only partial/imperfect sequence complementarity complicate purely computational analyses of microRNA-mRNA interactomes. Furthermore, computational microRNA target prediction programs typically ignore biological context, and therefore the principal determinants of microRNA-mRNA binding: the presence and quantity of each. To address these deficiencies we describe an empirical method, developed via studies of stressed and failing hearts, to determine disease-induced changes in microRNAs, mRNAs, and the mRNAs targeted to the RISC, without cross-linking mRNAs to RISC proteins. Deep sequencing methods are used to determine RNA abundances, delivering unbiased, quantitative RNA data limited only by their annotation in the genome of interest. We describe the laboratory bench steps required to perform these experiments, experimental design strategies to achieve an appropriate number of sequencing reads per biological replicate, and computer-based processing tools and procedures to convert large raw sequencing data files into gene expression measures useful for differential expression analyses.
Barrett, Nolan H.; McCarthy, Peter J.
2017-01-01
ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886
Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui
2017-07-15
Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k -mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k -mer co-occurrence information with recent advances in deep learning. We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k -mer embedding. We first split DNA sequences into k -mers and pre-train k -mer embedding vectors based on the co-occurrence matrix of k -mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k -mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm . tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn. Supplementary materials are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Min, Xu; Zeng, Wanwen; Chen, Ning; Chen, Ting; Jiang, Rui
2017-01-01
Abstract Motivation: Experimental techniques for measuring chromatin accessibility are expensive and time consuming, appealing for the development of computational approaches to predict open chromatin regions from DNA sequences. Along this direction, existing methods fall into two classes: one based on handcrafted k-mer features and the other based on convolutional neural networks. Although both categories have shown good performance in specific applications thus far, there still lacks a comprehensive framework to integrate useful k-mer co-occurrence information with recent advances in deep learning. Results: We fill this gap by addressing the problem of chromatin accessibility prediction with a convolutional Long Short-Term Memory (LSTM) network with k-mer embedding. We first split DNA sequences into k-mers and pre-train k-mer embedding vectors based on the co-occurrence matrix of k-mers by using an unsupervised representation learning approach. We then construct a supervised deep learning architecture comprised of an embedding layer, three convolutional layers and a Bidirectional LSTM (BLSTM) layer for feature learning and classification. We demonstrate that our method gains high-quality fixed-length features from variable-length sequences and consistently outperforms baseline methods. We show that k-mer embedding can effectively enhance model performance by exploring different embedding strategies. We also prove the efficacy of both the convolution and the BLSTM layers by comparing two variations of the network architecture. We confirm the robustness of our model to hyper-parameters by performing sensitivity analysis. We hope our method can eventually reinforce our understanding of employing deep learning in genomic studies and shed light on research regarding mechanisms of chromatin accessibility. Availability and implementation: The source code can be downloaded from https://github.com/minxueric/ismb2017_lstm. Contact: tingchen@tsinghua.edu.cn or ruijiang@tsinghua.edu.cn Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:28881969
Ries, David; Holtgräwe, Daniela; Viehöver, Prisca; Weisshaar, Bernd
2016-03-15
The combination of bulk segregant analysis (BSA) and next generation sequencing (NGS), also known as mapping by sequencing (MBS), has been shown to significantly accelerate the identification of causal mutations for species with a reference genome sequence. The usual approach is to cross homozygous parents that differ for the monogenic trait to address, to perform deep sequencing of DNA from F2 plants pooled according to their phenotype, and subsequently to analyze the allele frequency distribution based on a marker table for the parents studied. The method has been successfully applied for EMS induced mutations as well as natural variation. Here, we show that pooling genetically diverse breeding lines according to a contrasting phenotype also allows high resolution mapping of the causal gene in a crop species. The test case was the monogenic locus causing red vs. green hypocotyl color in Beta vulgaris (R locus). We determined the allele frequencies of polymorphic sequences using sequence data from two diverging phenotypic pools of 180 B. vulgaris accessions each. A single interval of about 31 kbp among the nine chromosomes was identified which indeed contained the causative mutation. By applying a variation of the mapping by sequencing approach, we demonstrated that phenotype-based pooling of diverse accessions from breeding panels and subsequent direct determination of the allele frequency distribution can be successfully applied for gene identification in a crop species. Our approach made it possible to identify a small interval around the causative gene. Sequencing of parents or individual lines was not necessary. Whenever the appropriate plant material is available, the approach described saves time compared to the generation of an F2 population. In addition, we provide clues for planning similar experiments with regard to pool size and the sequencing depth required.
Action-Driven Visual Object Tracking With Deep Reinforcement Learning.
Yun, Sangdoo; Choi, Jongwon; Yoo, Youngjoon; Yun, Kimin; Choi, Jin Young
2018-06-01
In this paper, we propose an efficient visual tracker, which directly captures a bounding box containing the target object in a video by means of sequential actions learned using deep neural networks. The proposed deep neural network to control tracking actions is pretrained using various training video sequences and fine-tuned during actual tracking for online adaptation to a change of target and background. The pretraining is done by utilizing deep reinforcement learning (RL) as well as supervised learning. The use of RL enables even partially labeled data to be successfully utilized for semisupervised learning. Through the evaluation of the object tracking benchmark data set, the proposed tracker is validated to achieve a competitive performance at three times the speed of existing deep network-based trackers. The fast version of the proposed method, which operates in real time on graphics processing unit, outperforms the state-of-the-art real-time trackers with an accuracy improvement of more than 8%.
Identification of Prostate Cancer-Specific microDNAs
2016-02-01
circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. R e la ti v e c e ll g ro w th ( % ) 0 20
Sequence-specific bias correction for RNA-seq data using recurrent neural networks.
Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
2017-01-25
The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
An, Xiaoping; Fan, Hang; Ma, Maijuan; Anderson, Benjamin D.; Jiang, Jiafu; Liu, Wei; Cao, Wuchun; Tong, Yigang
2014-01-01
This paper explored our hypothesis that sRNA (18∼30 bp) deep sequencing technique can be used as an efficient strategy to identify microorganisms other than viruses, such as prokaryotic and eukaryotic pathogens. In the study, the clean reads derived from the sRNA deep sequencing data of wild-caught ticks and mosquitoes were compared against the NCBI nucleotide collection (non-redundant nt database) using Blastn. The blast results were then analyzed with in-house Python scripts. An empirical formula was proposed to identify the putative pathogens. Results showed that not only viruses but also prokaryotic and eukaryotic species of interest can be screened out and were subsequently confirmed with experiments. Specially, a novel Rickettsia spp. was indicated to exist in Haemaphysalis longicornis ticks collected in Beijing. Our study demonstrated the reuse of sRNA deep sequencing data would have the potential to trace the origin of pathogens or discover novel agents of emerging/re-emerging infectious diseases. PMID:24618575
Purpose: High-risk neuroblastoma is an aggressive disease. DNA sequencing studies have revealed a paucity of actionable genomic alterations and a low mutation burden, posing challenges to develop effective novel therapies. We used RNA sequencing (RNA-seq) to investigate the biology of this disease including a focus on tumor-infiltrating lymphocytes (TILs). Experimental Design: We performed deep RNA-seq on pre-treatment diagnostic tumors from 129 high-risk and 21 low- or intermediate-risk patients with neuroblastomas.
Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J
2017-02-02
The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.
St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin
2016-01-01
Background Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). 10 preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ±0.5, 4.8 ±0.4, 4.9 ±0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilutions series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. Conclusions This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901
miRBase: integrating microRNA annotation and deep-sequencing data.
Kozomara, Ana; Griffiths-Jones, Sam
2011-01-01
miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition
Ordóñez, Francisco Javier; Roggen, Daniel
2016-01-01
Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing this temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average; outperforming some of the previous reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters’ influence on performance to provide insights about their optimisation. PMID:26797612
Transcriptome sequences resolve deep relationships of the grape family.
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.
Exome Sequencing and the Management of Neurometabolic Disorders.
Tarailo-Graovac, Maja; Shyr, Casper; Ross, Colin J; Horvath, Gabriella A; Salvarinova, Ramona; Ye, Xin C; Zhang, Lin-Hua; Bhavsar, Amit P; Lee, Jessica J Y; Drögemöller, Britt I; Abdelsayed, Mena; Alfadhel, Majid; Armstrong, Linlea; Baumgartner, Matthias R; Burda, Patricie; Connolly, Mary B; Cameron, Jessie; Demos, Michelle; Dewan, Tammie; Dionne, Janis; Evans, A Mark; Friedman, Jan M; Garber, Ian; Lewis, Suzanne; Ling, Jiqiang; Mandal, Rupasri; Mattman, Andre; McKinnon, Margaret; Michoulas, Aspasia; Metzger, Daniel; Ogunbayo, Oluseye A; Rakic, Bojana; Rozmus, Jacob; Ruben, Peter; Sayson, Bryan; Santra, Saikat; Schultz, Kirk R; Selby, Kathryn; Shekel, Paul; Sirrs, Sandra; Skrypnyk, Cristina; Superti-Furga, Andrea; Turvey, Stuart E; Van Allen, Margot I; Wishart, David; Wu, Jiang; Wu, John; Zafeiriou, Dimitrios; Kluijtmans, Leo; Wevers, Ron A; Eydoux, Patrice; Lehman, Anna M; Vallance, Hilary; Stockler-Ipsiroglu, Sylvia; Sinclair, Graham; Wasserman, Wyeth W; van Karnebeek, Clara D
2016-06-09
Whole-exome sequencing has transformed gene discovery and diagnosis in rare diseases. Translation into disease-modifying treatments is challenging, particularly for intellectual developmental disorder. However, the exception is inborn errors of metabolism, since many of these disorders are responsive to therapy that targets pathophysiological features at the molecular or cellular level. To uncover the genetic basis of potentially treatable inborn errors of metabolism, we combined deep clinical phenotyping (the comprehensive characterization of the discrete components of a patient's clinical and biochemical phenotype) with whole-exome sequencing analysis through a semiautomated bioinformatics pipeline in consecutively enrolled patients with intellectual developmental disorder and unexplained metabolic phenotypes. We performed whole-exome sequencing on samples obtained from 47 probands. Of these patients, 6 were excluded, including 1 who withdrew from the study. The remaining 41 probands had been born to predominantly nonconsanguineous parents of European descent. In 37 probands, we identified variants in 2 genes newly implicated in disease, 9 candidate genes, 22 known genes with newly identified phenotypes, and 9 genes with expected phenotypes; in most of the genes, the variants were classified as either pathogenic or probably pathogenic. Complex phenotypes of patients in five families were explained by coexisting monogenic conditions. We obtained a diagnosis in 28 of 41 probands (68%) who were evaluated. A test of a targeted intervention was performed in 18 patients (44%). Deep phenotyping and whole-exome sequencing in 41 probands with intellectual developmental disorder and unexplained metabolic abnormalities led to a diagnosis in 68%, the identification of 11 candidate genes newly implicated in neurometabolic disease, and a change in treatment beyond genetic counseling in 44%. (Funded by BC Children's Hospital Foundation and others.).
Exome Sequencing and the Management of Neurometabolic Disorders
Tarailo-Graovac, M.; Shyr, C.; Ross, C.J.; Horvath, G.A.; Salvarinova, R.; Ye, X.C.; Zhang, L.-H.; Bhavsar, A.P.; Lee, J.J.Y.; Drögemöller, B.I.; Abdelsayed, M.; Alfadhel, M.; Armstrong, L.; Baumgartner, M.R.; Burda, P.; Connolly, M.B.; Cameron, J.; Demos, M.; Dewan, T.; Dionne, J.; Evans, A.M.; Friedman, J.M.; Garber, I.; Lewis, S.; Ling, J.; Mandal, R.; Mattman, A.; McKinnon, M.; Michoulas, A.; Metzger, D.; Ogunbayo, O.A.; Rakic, B.; Rozmus, J.; Ruben, P.; Sayson, B.; Santra, S.; Schultz, K.R.; Selby, K.; Shekel, P.; Sirrs, S.; Skrypnyk, C.; Superti-Furga, A.; Turvey, S.E.; Van Allen, M.I.; Wishart, D.; Wu, J.; Wu, J.; Zafeiriou, D.; Kluijtmans, L.; Wevers, R.A.; Eydoux, P.; Lehman, A.M.; Vallance, H.; Stockler-Ipsiroglu, S.; Sinclair, G.; Wasserman, W.W.; van Karnebeek, C.D.
2016-01-01
BACKGROUND Whole-exome sequencing has transformed gene discovery and diagnosis in rare diseases. Translation into disease-modifying treatments is challenging, particularly for intellectual developmental disorder. However, the exception is inborn errors of metabolism, since many of these disorders are responsive to therapy that targets pathophysiological features at the molecular or cellular level. METHODS To uncover the genetic basis of potentially treatable inborn errors of metabolism, we combined deep clinical phenotyping (the comprehensive characterization of the discrete components of a patient’s clinical and biochemical phenotype) with whole-exome sequencing analysis through a semiautomated bioinformatics pipeline in consecutively enrolled patients with intellectual developmental disorder and unexplained metabolic phenotypes. RESULTS We performed whole-exome sequencing on samples obtained from 47 probands. Of these patients, 6 were excluded, including 1 who withdrew from the study. The remaining 41 probands had been born to predominantly nonconsanguineous parents of European descent. In 37 probands, we identified variants in 2 genes newly implicated in disease, 9 candidate genes, 22 known genes with newly identified phenotypes, and 9 genes with expected phenotypes; in most of the genes, the variants were classified as either pathogenic or probably pathogenic. Complex phenotypes of patients in five families were explained by coexisting monogenic conditions. We obtained a diagnosis in 28 of 41 probands (68%) who were evaluated. A test of a targeted intervention was performed in 18 patients (44%). CONCLUSIONS Deep phenotyping and whole-exome sequencing in 41 probands with intellectual developmental disorder and unexplained metabolic abnormalities led to a diagnosis in 68%, the identification of 11 candidate genes newly implicated in neurometabolic disease, and a change in treatment beyond genetic counseling in 44%. (Funded by BC Children’s Hospital Foundation and others.) PMID:27276562
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium
2018-01-01
ABSTRACT Pseudomonas oceani DSM 100277T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani, which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. PMID:29650573
Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai
2017-11-23
The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements to play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging mission for scholars and researchers to accurately predicting distal enhancers. In the past years, with the high-throughout ChiP-seq technologies development, several computational techniques emerge to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell-lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
Aftershock occurrence rate decay for individual sequences and catalogs
NASA Astrophysics Data System (ADS)
Nyffenegger, Paul A.
One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precedes major earthquakes. Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
An Anatomy of a Seismic Sequence in a Deep Gold Mine
NASA Astrophysics Data System (ADS)
Gibowicz, S. J.
1997-12-01
An unusual swarm-like seismic sequence occurred in April 1993 at the Western Deep Levels gold mine, South Africa. Altogether 199 events with moment magnitude from -0.5 to 3.1 were recorded and located by the mine seismic network. The sequence lasted 12 days and was composed in fact of four main shock-aftershocks sequences, closely following each other in space and time. The events were confined to a volume of rock extending to 670 m in the N-S, 630 m in the E-W, and 390 m in the vertical directions. The first sequence lasted 179 hours and the second only 13 hours, being interrupted by the third sequence which lasted 31 hours, being in turn interrupted by the fourth sequence. The parameter p, describing the rate of occurrence of aftershocks, ranged from 0.7 to 1. The first sequence is characterized by the lowest value of the fractal correlation dimension D = 1.75 and the second by the highest value of D = 2.4, whereas the third and fourth sequences are characterized by the middle value of D = 1.9.¶The corner frequencies of P and S waves are in close proximity and range from 14 to 220 Hz. A display of source parameters as a function of time shows that the four main shocks are most distinctly marked by their source radius. For 46 events a moment tensor inversion was performed. In most cases the double-couple component is dominant, ranging from 60 to 90 percent of the solution. The double-couple solutions correspond to the same number of normal and reverse faults and oblique-slip focal mechanisms. An analysis of space distribution of P, T and B axes reveals that the distribution of B axes is the most regular.
Matsumura, Emilyn E; Coletta-Filho, Helvecio D; Nouri, Shahideh; Falk, Bryce W; Nerva, Luca; Oliveira, Tiago S; Dorta, Silvia O; Machado, Marcos A
2017-04-24
Citrus sudden death (CSD) has caused the death of approximately four million orange trees in a very important citrus region in Brazil. Although its etiology is still not completely clear, symptoms and distribution of affected plants indicate a viral disease. In a search for viruses associated with CSD, we have performed a comparative high-throughput sequencing analysis of the transcriptome and small RNAs from CSD-symptomatic and -asymptomatic plants using the Illumina platform. The data revealed mixed infections that included Citrus tristeza virus (CTV) as the most predominant virus, followed by the Citrus sudden death-associated virus (CSDaV), Citrus endogenous pararetrovirus (CitPRV) and two putative novel viruses tentatively named Citrus jingmen-like virus (CJLV), and Citrus virga-like virus (CVLV). The deep sequencing analyses were sensitive enough to differentiate two genotypes of both viruses previously associated with CSD-affected plants: CTV and CSDaV. Our data also showed a putative association of the CSD-symptomatic plants with a specific CSDaV genotype and a likely association with CitPRV as well, whereas the two putative novel viruses showed to be more associated with CSD-asymptomatic plants. This is the first high-throughput sequencing-based study of the viral sequences present in CSD-affected citrus plants, and generated valuable information for further CSD studies.
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-11
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
NASA Astrophysics Data System (ADS)
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-01
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
A deep learning pipeline for Indian dance style classification
NASA Astrophysics Data System (ADS)
Dewan, Swati; Agarwal, Shubham; Singh, Navjyoti
2018-04-01
In this paper, we address the problem of dance style classification to classify Indian dance or any dance in general. We propose a 3-step deep learning pipeline. First, we extract 14 essential joint locations of the dancer from each video frame, this helps us to derive any body region location within the frame, we use this in the second step which forms the main part of our pipeline. Here, we divide the dancer into regions of important motion in each video frame. We then extract patches centered at these regions. Main discriminative motion is captured in these patches. We stack the features from all such patches of a frame into a single vector and form our hierarchical dance pose descriptor. Finally, in the third step, we build a high level representation of the dance video using the hierarchical descriptors and train it using a Recurrent Neural Network (RNN) for classification. Our novelty also lies in the way we use multiple representations for a single video. This helps us to: (1) Overcome the RNN limitation of learning small sequences over big sequences such as dance; (2) Extract more data from the available dataset for effective deep learning by training multiple representations. Our contributions in this paper are three-folds: (1) We provide a deep learning pipeline for classification of any form of dance; (2) We prove that a segmented representation of a dance video works well with sequence learning techniques for recognition purposes; (3) We extend and refine the ICD dataset and provide a new dataset for evaluation of dance. Our model performs comparable or better in some cases than the state-of-the-art on action recognition benchmarks.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.
Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W
2018-05-31
In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Predicting human protein function with multi-task deep neural networks.
Fa, Rui; Cozzetto, Domenico; Wan, Cen; Jones, David T
2018-01-01
Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN, but medium size models provide more improvement in our case. One of advantages of MTDNN is that given a set of features, there is no requirement for MTDNN to have a bootstrap feature selection procedure as what traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still large room for deep learning techniques to further enhance prediction ability.
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui
2012-03-29
MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress.
Transcriptome and Small RNA Deep Sequencing Reveals Deregulation of miRNA Biogenesis in Human Glioma
Moore, Lynette M.; Kivinen, Virpi; Liu, Yuexin; Annala, Matti; Cogdell, David; Liu, Xiuping; Liu, Chang-Gong; Sawaya, Raymond; Yli-Harja, Olli; Shmulevich, Ilya; Fuller, Gregory N.; Zhang, Wei; Nykter, Matti
2013-01-01
Altered expression of oncogenic and tumor-suppressing microRNAs (miRNAs) is widely associated with tumorigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumors. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and interrogated expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression. PMID:23007860
Environmental surveillance of viruses by tangential flow filtration and metagenomic reconstruction.
Furtak, Vyacheslav; Roivainen, Merja; Mirochnichenko, Olga; Zagorodnyaya, Tatiana; Laassri, Majid; Zaidi, Sohail Z; Rehman, Lubna; Alam, Muhammad M; Chizhikov, Vladimir; Chumakov, Konstantin
2016-04-14
An approach is proposed for environmental surveillance of poliovirus by concentrating sewage samples with tangential flow filtration (TFF) followed by deep sequencing of viral RNA. Subsequent to testing the method with samples from Finland, samples from Pakistan, a country endemic for poliovirus, were investigated. Genomic sequencing was either performed directly, for unbiased identification of viruses regardless of their ability to grow in cell cultures, or after virus enrichment by cell culture or immunoprecipitation. Bioinformatics enabled separation and determination of individual consensus sequences. Overall, deep sequencing of the entire viral population identified polioviruses, non-polio enteroviruses, and other viruses. In Pakistani sewage samples, adeno-associated virus, unable to replicate autonomously in cell cultures, was the most abundant human virus. The presence of recombinants of wild polioviruses of serotype 1 (WPV1) was also inferred, whereby currently circulating WPV1 of south-Asian (SOAS) lineage comprised two sub-lineages depending on their non-capsid region origin. Complete genome analyses additionally identified point mutants and intertypic recombinants between attenuated Sabin strains in the Pakistani samples, and in one Finnish sample. The approach could allow rapid environmental surveillance of viruses causing human infections. It creates a permanent digital repository of the entire virome potentially useful for retrospective screening of future discovered viruses.
Integrated design, execution, and analysis of arrayed and pooled CRISPR genome-editing experiments.
Canver, Matthew C; Haeussler, Maximilian; Bauer, Daniel E; Orkin, Stuart H; Sanjana, Neville E; Shalem, Ophir; Yuan, Guo-Cheng; Zhang, Feng; Concordet, Jean-Paul; Pinello, Luca
2018-05-01
CRISPR (clustered regularly interspaced short palindromic repeats) genome-editing experiments offer enormous potential for the evaluation of genomic loci using arrayed single guide RNAs (sgRNAs) or pooled sgRNA libraries. Numerous computational tools are available to help design sgRNAs with optimal on-target efficiency and minimal off-target potential. In addition, computational tools have been developed to analyze deep-sequencing data resulting from genome-editing experiments. However, these tools are typically developed in isolation and oftentimes are not readily translatable into laboratory-based experiments. Here, we present a protocol that describes in detail both the computational and benchtop implementation of an arrayed and/or pooled CRISPR genome-editing experiment. This protocol provides instructions for sgRNA design with CRISPOR (computational tool for the design, evaluation, and cloning of sgRNA sequences), experimental implementation, and analysis of the resulting high-throughput sequencing data with CRISPResso (computational tool for analysis of genome-editing outcomes from deep-sequencing data). This protocol allows for design and execution of arrayed and pooled CRISPR experiments in 4-5 weeks by non-experts, as well as computational data analysis that can be performed in 1-2 d by both computational and noncomputational biologists alike using web-based and/or command-line versions.
Improving Protein Fold Recognition by Deep Learning Networks
NASA Astrophysics Data System (ADS)
Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin
2015-12-01
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl’s benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed the comparable results at the level of family and superfamily, compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to real-value regression task, which also show a promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
Andersson, P; Klein, M; Lilliebridge, R A; Giffard, P M
2013-09-01
Ultra-deep Illumina sequencing was performed on whole genome amplified DNA derived from a Chlamydia trachomatis-positive vaginal swab. Alignment of reads with reference genomes allowed robust SNP identification from the C. trachomatis chromosome and plasmid. This revealed that the C. trachomatis in the specimen was very closely related to the sequenced urogenital, serovar F, clade T1 isolate F-SW4. In addition, high genome-wide coverage was obtained for Prevotella melaninogenica, Gardnerella vaginalis, Clostridiales genomosp. BVAB3 and Mycoplasma hominis. This illustrates the potential of metagenome data to provide high resolution bacterial typing data from multiple taxa in a diagnostic specimen. ©2013 The Authors Clinical Microbiology and Infection ©2013 European Society of Clinical Microbiology and Infectious Diseases.
NASA Technical Reports Server (NTRS)
Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina
2006-01-01
The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.
Transcriptome Sequences Resolve Deep Relationships of the Grape Family
Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong
2013-01-01
Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307
Deep Learning and Its Applications in Biomedicine.
Cao, Chensi; Liu, Feng; Tan, Hai; Song, Deshou; Shu, Wenjie; Li, Weizhong; Zhou, Yiming; Bo, Xiaochen; Xie, Zhi
2018-02-01
Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning. Copyright © 2018. Production and hosting by Elsevier B.V.
Tang, Kai; Lin, Dan; Zheng, Qiang; Liu, Keshao; Yang, Yujie; Han, Yu; Jiao, Nianzhi
2017-06-27
Marine phages are spectacularly diverse in nature. Dozens of roseophages infecting members of Roseobacter clade bacteria were isolated and characterized, exhibiting a very high degree of genetic diversity. In the present study, the induction of two temperate bacteriophages, namely, vB_ThpS-P1 and vB_PeaS-P1, was performed in Roseobacter clade bacteria isolated from the deep-sea water, Thiobacimonas profunda JLT2016 and Pelagibaca abyssi JLT2014, respectively. Two novel phages in morphological, genomic and proteomic features were presented, and their phylogeny and evolutionary relationships were explored by bioinformatic analysis. Electron microscopy showed that the morphology of the two phages were similar to that of siphoviruses. Genome sequencing indicated that the two phages were similar in size, organization, and content, thereby suggesting that these shared a common ancestor. Despite the presence of Mu-like phage head genes, the phages are more closely related to Rhodobacter phage RC1 than Mu phages in terms of gene content and sequence similarity. Based on comparative genomic and phylogenetic analysis, we propose a Mu-like head phage group to allow for the inclusion of Mu-like phages and two newly phages. The sequences of the Mu-like head phage group were widespread, occurring in each investigated metagenomes. Furthermore, the horizontal exchange of genetic material within the Mu-like head phage group might have involved a gene that was associated with phage phenotypic characteristics. This study is the first report on the complete genome sequences of temperate phages that infect deep-sea roseobacters, belonging to the Mu-like head phage group. The Mu-like head phage group might represent a small but ubiquitous fraction of marine viral diversity.
VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs
USDA-ARS?s Scientific Manuscript database
Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean
2013-03-01
Sean Gordon of the USDA on Natural variation in Brachypodium disctachyon: Deep Sequencing of Highly Diverse Natural Accessions at the 8th Annual Genomics of Energy Environment Meeting on March 27, 2013 in Walnut Creek, CA.
Microbial Diversity in Deep-sea Methane Seep Sediments Presented by SSU rRNA Gene Tag Sequencing
Nunoura, Takuro; Takaki, Yoshihiro; Kazama, Hiromi; Hirai, Miho; Ashi, Juichiro; Imachi, Hiroyuki; Takai, Ken
2012-01-01
Microbial community structures in methane seep sediments in the Nankai Trough were analyzed by tag-sequencing analysis for the small subunit (SSU) rRNA gene using a newly developed primer set. The dominant members of Archaea were Deep-sea Hydrothermal Vent Euryarchaeotic Group 6 (DHVEG 6), Marine Group I (MGI) and Deep Sea Archaeal Group (DSAG), and those in Bacteria were Alpha-, Gamma-, Delta- and Epsilonproteobacteria, Chloroflexi, Bacteroidetes, Planctomycetes and Acidobacteria. Diversity and richness were examined by 8,709 and 7,690 tag-sequences from sediments at 5 and 25 cm below the seafloor (cmbsf), respectively. The estimated diversity and richness in the methane seep sediment are as high as those in soil and deep-sea hydrothermal environments, although the tag-sequences obtained in this study were not sufficient to show whole microbial diversity in this analysis. We also compared the diversity and richness of each taxon/division between the sediments from the two depths, and found that the diversity and richness of some taxa/divisions varied significantly along with the depth. PMID:22510646
Chung, Jongsuk; Son, Dae-Soon; Jeon, Hyo-Jeong; Kim, Kyoung-Mee; Park, Gahee; Ryu, Gyu Ha; Park, Woong-Yang; Park, Donghyun
2016-01-01
Targeted capture massively parallel sequencing is increasingly being used in clinical settings, and as costs continue to decline, use of this technology may become routine in health care. However, a limited amount of tissue has often been a challenge in meeting quality requirements. To offer a practical guideline for the minimum amount of input DNA for targeted sequencing, we optimized and evaluated the performance of targeted sequencing depending on the input DNA amount. First, using various amounts of input DNA, we compared commercially available library construction kits and selected Agilent’s SureSelect-XT and KAPA Biosystems’ Hyper Prep kits as the kits most compatible with targeted deep sequencing using Agilent’s SureSelect custom capture. Then, we optimized the adapter ligation conditions of the Hyper Prep kit to improve library construction efficiency and adapted multiplexed hybrid selection to reduce the cost of sequencing. In this study, we systematically evaluated the performance of the optimized protocol depending on the amount of input DNA, ranging from 6.25 to 200 ng, suggesting the minimal input DNA amounts based on coverage depths required for specific applications. PMID:27220682
Draft Genome Sequence of Pseudomonas oceani DSM 100277T, a Deep-Sea Bacterium.
García-Valdés, Elena; Gomila, Margarita; Mulet, Magdalena; Lalucat, Jorge
2018-04-12
Pseudomonas oceani DSM 100277 T was isolated from deep seawater in the Okinawa Trough at 1390 m. P. oceani belongs to the Pseudomonas pertucinogena group. Here, we report the draft genome sequence of P. oceani , which has an estimated size of 4.1 Mb and exhibits 3,790 coding sequences, with a G+C content of 59.94 mol%. Copyright © 2018 García-Valdés et al.
Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari
2013-12-01
Prescribed burning is a common management tool to control fuel loads, ground vegetation, and facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000 + reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep cost effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.
Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario
2011-04-01
Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.
Estudio de la población estelar de varios cúmulos en Carina
NASA Astrophysics Data System (ADS)
Molina-Lera, J. A.; Baume, G. L.; Carraro, G.; Costa, E.
2015-08-01
Based on deep photometric data in the bands, complemented with infrared 2MASS data, we conducted an analysis of the fundamental parameters of six open clusters located in the Carina region. To perform a systematic study we developed a specialized code. In particular, we investigated the behavior of the respective lower main sequences. Our analysis indicated the presence of a significant population of pre-sequence stars in several of the clusters. We therefore obtained estimated values of contraction ages. Furthermore, we have determined the slopes of the initial mass functions of the studied clusters.
Juan, Li; Tong, Hong-li; Zhang, Pengjun; Guo, Guanghong; Wang, Zi; Wen, Xinyu; Dong, Zhennan; Tian, Ya-ping
2014-09-03
Small non-coding microRNAs (miRNAs) are involved in cancer development and progression, and serum profiles of cervical cancer patients may be useful for identifying novel miRNAs. We performed deep sequencing on serum pools of cervical cancer patients and healthy controls with 3 replicates and constructed a small RNA library. We used MIREAP to predict novel miRNAs and identified 2 putative novel miRNAs between serum pools of cervical cancer patients and healthy controls after filtering out pseudo-pre-miRNAs using Triplet-SVM analysis. The 2 putative novel miRNAs were validated by real time PCR and were significantly decreased in cervical cancer patients compared with healthy controls. One novel miRNA had an area under curve (AUC) of 0.921 (95% CI: 0.883, 0.959) with a sensitivity of 85.7% and a specificity of 88.2% when discriminating between cervical cancer patients and healthy controls. Our results suggest that characterizing serum profiles of cervical cancers by Solexa sequencing may be a good method for identifying novel miRNAs and that the validated novel miRNAs described here may be cervical cancer-associated biomarkers.
RaptorX-Property: a web server for protein structure property prediction.
Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo
2016-07-08
RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C. Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B.; Nauck, Markus; Kaminski, Wolfgang E.
2017-01-01
The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its “a” determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the “a” determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of “a” determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated. PMID:28472040
Gencay, Mikael; Hübner, Kirsten; Gohl, Peter; Seffner, Anja; Weizenegger, Michael; Neofytos, Dionysios; Batrla, Richard; Woeste, Andreas; Kim, Hyon-Suk; Westergaard, Gaston; Reinsch, Christine; Brill, Eva; Thu Thuy, Pham Thi; Hoang, Bui Huu; Sonderup, Mark; Spearman, C Wendy; Pabinger, Stephan; Gautier, Jérémie; Brancaccio, Giuseppina; Fasano, Massimo; Santantonio, Teresa; Gaeta, Giovanni B; Nauck, Markus; Kaminski, Wolfgang E
2017-01-01
The diversity of the hepatitis B surface antigen (HBsAg) has a significant impact on the performance of diagnostic screening tests and the clinical outcome of hepatitis B infection. Neutralizing or diagnostic antibodies against the HBsAg are directed towards its highly conserved major hydrophilic region (MHR), in particular towards its "a" determinant subdomain. Here, we explored, on a global scale, the genetic diversity of the HBsAg MHR in a large, multi-ethnic cohort of randomly selected subjects with HBV infection from four continents. A total of 1553 HBsAg positive blood samples of subjects originating from 20 different countries across Africa, America, Asia and central Europe were characterized for amino acid variation in the MHR. Using highly sensitive ultra-deep sequencing, we found 72.8% of the successfully sequenced subjects (n = 1391) demonstrated amino acid sequence variation in the HBsAg MHR. This indicates that the global variation frequency in the HBsAg MHR is threefold higher than previously reported. The majority of the amino acid mutations were found in the HBV genotypes B (28.9%) and C (25.4%). Collectively, we identified 345 distinct amino acid mutations in the MHR. Among these, we report 62 previously unknown mutations, which extends the worldwide pool of currently known HBsAg MHR mutations by 22%. Importantly, topological analysis identified the "a" determinant upstream flanking region as the structurally most diverse subdomain of the HBsAg MHR. The highest prevalence of "a" determinant region mutations was observed in subjects from Asia, followed by the African, American and European cohorts, respectively. Finally, we found that more than half (59.3%) of all HBV subjects investigated carried multiple MHR mutations. Together, this worldwide ultra-deep sequencing based genotyping study reveals that the global prevalence and structural complexity of variation in the hepatitis B surface antigen have, to date, been significantly underappreciated.
Lee, Hyung-Chul; Ryu, Ho-Geol; Chung, Eun-Jin; Jung, Chul-Woo
2018-03-01
The discrepancy between predicted effect-site concentration and measured bispectral index is problematic during intravenous anesthesia with target-controlled infusion of propofol and remifentanil. We hypothesized that bispectral index during total intravenous anesthesia would be more accurately predicted by a deep learning approach. Long short-term memory and the feed-forward neural network were sequenced to simulate the pharmacokinetic and pharmacodynamic parts of an empirical model, respectively, to predict intraoperative bispectral index during combined use of propofol and remifentanil. Inputs of long short-term memory were infusion histories of propofol and remifentanil, which were retrieved from target-controlled infusion pumps for 1,800 s at 10-s intervals. Inputs of the feed-forward network were the outputs of long short-term memory and demographic data such as age, sex, weight, and height. The final output of the feed-forward network was the bispectral index. The performance of bispectral index prediction was compared between the deep learning model and previously reported response surface model. The model hyperparameters comprised 8 memory cells in the long short-term memory layer and 16 nodes in the hidden layer of the feed-forward network. The model training and testing were performed with separate data sets of 131 and 100 cases. The concordance correlation coefficient (95% CI) were 0.561 (0.560 to 0.562) in the deep learning model, which was significantly larger than that in the response surface model (0.265 [0.263 to 0.266], P < 0.001). The deep learning model-predicted bispectral index during target-controlled infusion of propofol and remifentanil more accurately compared to the traditional model. The deep learning approach in anesthetic pharmacology seems promising because of its excellent performance and extensibility.
Daikoku, Tohru; Oyama, Yukari; Yajima, Misako; Sekizuka, Tsuyoshi; Kuroda, Makoto; Shimada, Yuka; Takehara, Kazuhiko; Miwa, Naoko; Okuda, Tomoko; Sata, Tetsutaro; Shiraki, Kimiyasu
2015-06-01
Herpes simplex virus 2 caused a genital ulcer, and a secondary herpetic whitlow appeared during acyclovir therapy. The secondary and recurrent whitlow isolates were acyclovir-resistant and temperature-sensitive in contrast to a genital isolate. We identified the ribonucleotide reductase mutation responsible for temperature-sensitivity by deep-sequencing analysis.
Recurrent and functional regulatory mutations in breast cancer.
Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad
2017-07-06
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Genomic region operation kit for flexible processing of deep sequencing data.
Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa
2013-01-01
Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from >http://csbi.ltdk.helsinki.fi/grok/.
Gu, Yifeng; Zhang, Lei; Chen, Xiaowu
2014-08-01
MicroRNAs (miRNAs) play an important role in gonadal development and differentiation in fish. However, understanding of the mechanism of this process is hindered by our poor knowledge of miRNA expression patterns in fish gonads. In this study, miRNA libraries derived from adult gonads of Paralichthys olivaceus were generated by using next-generation sequencing (NGS) technology. Bioinformatics analysis was performed to distinguish mature miRNA sequences from two classes of small RNAs represented in the sequencing data. A total of 141 mature miRNAs were identified, in which 21 miRNAs were found in P. olivaceus for the first time. Variance and preference of miRNAs expression were concluded from the deep sequencing reads. Some miRNAs, such as pol-miR-143, pol-miR-26a and pol-let-7a were found with quite high expression levels in both gonads, while some exhibited a clear sex-biased expression in different gonad. Approximate 20.0% and 13.1% of the isolated miRNAs were preferentially expressed in the testis (FC<0.5) or ovary (FC>2), respectively. The identification and the preliminary analysis of the sex-biased expression of miRNAs in P. olivaceus gonads in our work by using NGS will provide us a basic catalog of miRNAs to facilitate future improvement and exploitation of sexual regulatory mechanisms in P. olivaceus. Copyright © 2014. Published by Elsevier Inc.
Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.
Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong
2015-06-09
Differentiation of metazoan cells requires execution of different gene expression programs but recent single-cell transcriptome profiling has revealed considerable variation within cells of seeming identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high quality deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than solely molecular noise.
Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs
Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan
2017-01-01
Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401
NASA Astrophysics Data System (ADS)
Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng
2016-01-01
The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng
2016-01-22
The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning.
Teng, Haotian; Cao, Minh Duc; Hall, Michael B; Duarte, Tania; Wang, Sheng; Coin, Lachlan J M
2018-05-01
Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology that offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling and directly translate the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4,000 reads, we show that our model provides state-of-the-art basecalling accuracy, even on previously unseen species. Chiron achieves basecalling speeds of more than 2,000 bases per second using desktop computer graphics processing units.
Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K
2010-01-01
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. yasu@bio.keio.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Evaluation of non-coding variation in GLUT1 deficiency.
Liu, Yu-Chi; Lee, Jia Wei Audrey; Bellows, Susannah T; Damiano, John A; Mullen, Saul A; Berkovic, Samuel F; Bahlo, Melanie; Scheffer, Ingrid E; Hildebrand, Michael S
2016-12-01
Loss-of-function mutations in SLC2A1, encoding glucose transporter-1 (GLUT-1), lead to dysfunction of glucose transport across the blood-brain barrier. Ten percent of cases with hypoglycorrhachia (fasting cerebrospinal fluid [CSF] glucose <2.2mmol/L) do not have mutations. We hypothesized that GLUT1 deficiency could be due to non-coding SLC2A1 variants. We performed whole exome sequencing of one proband with a GLUT1 phenotype and hypoglycorrhachia negative for SLC2A1 sequencing and copy number variants. We studied a further 55 patients with different epilepsies and low CSF glucose who did not have exonic mutations or copy number variants. We sequenced non-coding promoter and intronic regions. We performed mRNA studies for the recurrent intronic variant. The proband had a de novo splice site mutation five base pairs from the intron-exon boundary. Three of 55 patients had deep intronic SLC2A1 variants, including a recurrent variant in two. The recurrent variant produced less SLC2A1 mRNA transcript. Fasting CSF glucose levels show an age-dependent correlation, which makes the definition of hypoglycorrhachia challenging. Low CSF glucose levels may be associated with pathogenic SLC2A1 mutations including deep intronic SLC2A1 variants. Extending genetic screening to non-coding regions will enable diagnosis of more patients with GLUT1 deficiency, allowing implementation of the ketogenic diet to improve outcomes. © 2016 Mac Keith Press.
Chemoresistance Evolution in Triple-Negative Breast Cancer Delineated by Single-Cell Sequencing.
Kim, Charissa; Gao, Ruli; Sei, Emi; Brandt, Rachel; Hartman, Johan; Hatschek, Thomas; Crosetto, Nicola; Foukakis, Theodoros; Navin, Nicholas E
2018-05-03
Triple-negative breast cancer (TNBC) is an aggressive subtype that frequently develops resistance to chemotherapy. An unresolved question is whether resistance is caused by the selection of rare pre-existing clones or alternatively through the acquisition of new genomic aberrations. To investigate this question, we applied single-cell DNA and RNA sequencing in addition to bulk exome sequencing to profile longitudinal samples from 20 TNBC patients during neoadjuvant chemotherapy (NAC). Deep-exome sequencing identified 10 patients in which NAC led to clonal extinction and 10 patients in which clones persisted after treatment. In 8 patients, we performed a more detailed study using single-cell DNA sequencing to analyze 900 cells and single-cell RNA sequencing to analyze 6,862 cells. Our data showed that resistant genotypes were pre-existing and adaptively selected by NAC, while transcriptional profiles were acquired by reprogramming in response to chemotherapy in TNBC patients. Copyright © 2018 Elsevier Inc. All rights reserved.
Avsec, Žiga; Cheng, Jun; Gagneur, Julien
2018-01-01
Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Beltman, Joost B; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C; Naik, Shalin H; Schumacher, Ton N
2016-04-02
Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can both be used to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue, for instance, applies to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells, by supplying these with unique heritable DNA tags. Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets.
Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge
NASA Astrophysics Data System (ADS)
Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua
2018-01-01
Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data of the fungal diversity and biogeography, and a glimpse to the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.
Contamination Tracer Testing With Seabed Rock Drills: IODP Expedition 357
NASA Astrophysics Data System (ADS)
Orcutt, B.; Bergenthal, M.; Freudenthal, T.; Smith, D. J.; Lilley, M. D.; Schneiders, L.; Fruh-Green, G. L.
2016-12-01
IODP Expedition 357 utilized seabed rock drills for the first time in the history of the ocean drilling program, with the aim of collecting intact core of shallow mantle sequences from the Atlantis Massif to examine serpentinization processes and the deep biosphere. This new drilling approach required the development of a new system for delivering synthetic tracers during drilling to assess for possible sample contamination. Here, we describe this new tracer delivery system, assess the performance of the system during the expedition, provide an overview of the quality of the core samples collected for deep biosphere investigations based on tracer concentrations, and make recommendations for future applications of the system.
Contamination tracer testing with seabed drills: IODP Expedition 357
NASA Astrophysics Data System (ADS)
Orcutt, Beth N.; Bergenthal, Markus; Freudenthal, Tim; Smith, David; Lilley, Marvin D.; Schnieders, Luzie; Green, Sophie; Früh-Green, Gretchen L.
2017-11-01
IODP Expedition 357 utilized seabed drills for the first time in the history of the ocean drilling program, with the aim of collecting intact sequences of shallow mantle core from the Atlantis Massif to examine serpentinization processes and the deep biosphere. This novel drilling approach required the development of a new remote seafloor system for delivering synthetic tracers during drilling to assess for possible sample contamination. Here, we describe this new tracer delivery system, assess the performance of the system during the expedition, provide an overview of the quality of the core samples collected for deep biosphere investigations based on tracer concentrations, and make recommendations for future applications of the system.
2012-01-01
Background MicroRNA (miRNA) is a class of functional non-coding small RNA with 19-25 nucleotides in length while Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species, is used as an excellent breeding parent for grapevine, and has elicited growing interest in wine production. To date, there is a relatively large number of grapevine miRNAs (vv-miRNAs) from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr, a wild grapevine species. Results A small RNA library from Amur grape was constructed and Solexa technology used to perform deep sequencing of the library followed by subsequent bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and accumulation of 18 new va-miRNAs in seven tissues of grapevines confirmed by real time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be basically consistent in identity to those from deep sequenced sRNAs libraries of combined corresponding tissues. We also describe the conservation and variation of va-miRNAs using miR-SNPs and miR-LDs during plant evolution based on comparison of orthologous sequences, and further reveal that the number and sites of miR-SNP in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted and they include a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism. Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved miRNAs, indicating that specific miRNAs exist in Amur grape. These results show that a number of regulatory miRNAs exist in Amur grape and play an important role in Amur grape growth, development, and response to abiotic or biotic stress. PMID:22455456
2010-01-01
Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131
Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan
2018-02-15
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Gibson, Richard M.; Meyer, Ashley M.; Winner, Dane; Archer, John; Feyertag, Felix; Ruiz-Mateos, Ezequiel; Leal, Manuel; Robertson, David L.; Schmotzer, Christine L.
2014-01-01
With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3′ end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals. PMID:24468782
Dendrites, deep learning, and sequences in the hippocampus.
Bhalla, Upinder S
2017-10-12
The hippocampus places us both in time and space. It does so over remarkably large spans: milliseconds to years, and centimeters to kilometers. This works for sensory representations, for memory, and for behavioral context. How does it fit in such wide ranges of time and space scales, and keep order among the many dimensions of stimulus context? A key organizing principle for a wide sweep of scales and stimulus dimensions is that of order in time, or sequences. Sequences of neuronal activity are ubiquitous in sensory processing, in motor control, in planning actions, and in memory. Against this strong evidence for the phenomenon, there are currently more models than definite experiments about how the brain generates ordered activity. The flip side of sequence generation is discrimination. Discrimination of sequences has been extensively studied at the behavioral, systems, and modeling level, but again physiological mechanisms are fewer. It is against this backdrop that I discuss two recent developments in neural sequence computation, that at face value share little beyond the label "neural." These are dendritic sequence discrimination, and deep learning. One derives from channel physiology and molecular signaling, the other from applied neural network theory - apparently extreme ends of the spectrum of neural circuit detail. I suggest that each of these topics has deep lessons about the possible mechanisms, scales, and capabilities of hippocampal sequence computation. © 2017 Wiley Periodicals, Inc.
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses
USDA-ARS?s Scientific Manuscript database
Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
De novo transcriptome assembly and positive selection analysis of an individual deep-sea fish.
Lan, Yi; Sun, Jin; Xu, Ting; Chen, Chong; Tian, Renmao; Qiu, Jian-Wen; Qian, Pei-Yuan
2018-05-24
High hydrostatic pressure and low temperatures make the deep sea a harsh environment for life forms. Actin organization and microtubules assembly, which are essential for intracellular transport and cell motility, can be disrupted by high hydrostatic pressure. High hydrostatic pressure can also damage DNA. Nucleic acids exposed to low temperatures can form secondary structures that hinder genetic information processing. To study how deep-sea creatures adapt to such a hostile environment, one of the most straightforward ways is to sequence and compare their genes with those of their shallow-water relatives. We captured an individual of the fish species Aldrovandia affinis, which is a typical deep-sea inhabitant, from the Okinawa Trough at a depth of 1550 m using a remotely operated vehicle (ROV). We sequenced its transcriptome and analyzed its molecular adaptation. We obtained 27,633 protein coding sequences using an Illumina platform and compared them with those of several shallow-water fish species. Analysis of 4918 single-copy orthologs identified 138 positively selected genes in A. affinis, including genes involved in microtubule regulation. Particularly, functional domains related to cold shock as well as DNA repair are exposed to positive selection pressure in both deep-sea fish and hadal amphipod. Overall, we have identified a set of positively selected genes related to cytoskeleton structures, DNA repair and genetic information processing, which shed light on molecular adaptation to the deep sea. These results suggest that amino acid substitutions of these positively selected genes may contribute crucially to the adaptation of deep-sea animals. Additionally, we provide a high-quality transcriptome of a deep-sea fish for future deep-sea studies.
Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C
2016-01-01
Background. Metagenomics is an approach for characterizing environmental microbial communities in situ, it allows their functional and taxonomic characterization and to recover sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows to analyze Gb-sized metagenomes with inexpensive hardware, and to recover species or genera-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
Sohlberg, Elina; Bomberg, Malin; Miettinen, Hanna; Nyyssönen, Mari; Salavirta, Heikki; Vikman, Minna; Itävaara, Merja
2015-01-01
The diversity and functional role of fungi, one of the ecologically most important groups of eukaryotic microorganisms, remains largely unknown in deep biosphere environments. In this study we investigated fungal communities in packer-isolated bedrock fractures in Olkiluoto, Finland at depths ranging from 296 to 798 m below surface level. DNA- and cDNA-based high-throughput amplicon sequencing analysis of the fungal internal transcribed spacer (ITS) gene markers was used to examine the total fungal diversity and to identify the active members in deep fracture zones at different depths. Results showed that fungi were present in fracture zones at all depths and fungal diversity was higher than expected. Most of the observed fungal sequences belonged to the phylum Ascomycota. Phyla Basidiomycota and Chytridiomycota were only represented as a minor part of the fungal community. Dominating fungal classes in the deep bedrock aquifers were Sordariomycetes, Eurotiomycetes, and Dothideomycetes from the Ascomycota phylum and classes Microbotryomycetes and Tremellomycetes from the Basidiomycota phylum, which are the most frequently detected fungal taxa reported also from deep sea environments. In addition some fungal sequences represented potentially novel fungal species. Active fungi were detected in most of the fracture zones, which proves that fungi are able to maintain cellular activity in these oligotrophic conditions. Possible roles of fungi and their origin in deep bedrock groundwater can only be speculated in the light of current knowledge but some species may be specifically adapted to deep subsurface environment and may play important roles in the utilization and recycling of nutrients and thus sustaining the deep subsurface microbial community.
Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.
Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E
2013-02-12
High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
NASA Astrophysics Data System (ADS)
Chen, Xinyuan; Song, Li; Yang, Xiaokang
2016-09-01
Video denoising can be described as the problem of mapping from a specific length of noisy frames to clean one. We propose a deep architecture based on Recurrent Neural Network (RNN) for video denoising. The model learns a patch-based end-to-end mapping between the clean and noisy video sequences. It takes the corrupted video sequences as the input and outputs the clean one. Our deep network, which we refer to as deep Recurrent Neural Networks (deep RNNs or DRNNs), stacks RNN layers where each layer receives the hidden state of the previous layer as input. Experiment shows (i) the recurrent architecture through temporal domain extracts motion information and does favor to video denoising, and (ii) deep architecture have large enough capacity for expressing mapping relation between corrupted videos as input and clean videos as output, furthermore, (iii) the model has generality to learned different mappings from videos corrupted by different types of noise (e.g., Poisson-Gaussian noise). By training on large video databases, we are able to compete with some existing video denoising methods.
Identification of Androgen Receptor-Specific Enhancer RNAs
2016-06-01
Release; Distribution Unlimited The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as...There are two Tasks in this application. First, we will perform global run-on assay and deep sequencing to identify AR -specific enhancer RNAs. Second...1 Introduction The androgen receptor ( AR ) is a nuclear receptor transcription factor required for normal prostate development and prostate
Deng, Lei; Fan, Chao; Zeng, Zhiwen
2017-12-28
Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restrains in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with stacked sparse autoencoder and a dropout approach.
Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R
2011-01-01
Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.
Zeng, Cong; Thomas, Leighton J; Kelly, Michelle; Gardner, Jonathan P A
2016-05-01
The complete mitochondrial genome of a New Zealand specimen of the deep-sea sponge Poecillastra laminaris (Sollas, 1886) (Astrophorida, Vulcanellidae), from the Colville Ridge, New Zealand, was sequenced using the 454 Life Science pyrosequencing system. To identify homologous mitochondrial sequences, the 454 reads were mapped to the complete mitochondrial genome sequence of Geodia neptuni (GeneBank No. NC_006990). The P. laminaris genome is 18,413 bp in length and includes 14 protein-coding genes, 24 transfer RNA genes and 2 ribosomal RNA genes. Gene order resembled that of other demosponges. The base composition of the genome is A (29.1%), T (35.2%), C (14.0%) and G (21.7%). This is the second published mitogenome for a sponge of the order Astrophorida and will be useful in future phylogenetic analysis of deep-sea sponges.
Wang, Duolin; Zeng, Shuai; Xu, Chunhui; Qiu, Wangren; Liang, Yanchun; Joshi, Trupti; Xu, Dong
2017-12-15
Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning as the cutting-edge machine learning method has the ability to automatically discover complex representations of phosphorylation patterns from the raw sequences, and hence it provides a powerful tool for improvement of phosphorylation site prediction. We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data. MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep. xudong@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Fungal diversity in deep-sea sediments associated with asphalt seeps at the Sao Paulo Plateau
NASA Astrophysics Data System (ADS)
Nagano, Yuriko; Miura, Toshiko; Nishi, Shinro; Lima, Andre O.; Nakayama, Cristina; Pellizari, Vivian H.; Fujikura, Katsunori
2017-12-01
We investigated the fungal diversity in a total of 20 deep-sea sediment samples (of which 14 samples were associated with natural asphalt seeps and 6 samples were not associated) collected from two different sites at the Sao Paulo Plateau off Brazil by Ion Torrent PGM targeting ITS region of ribosomal RNA. Our results suggest that diverse fungi (113 operational taxonomic units (OTUs) based on clustering at 97% sequence similarity assigned into 9 classes and 31 genus) are present in deep-sea sediment samples collected at the Sao Paulo Plateau, dominated by Ascomycota (74.3%), followed by Basidiomycota (11.5%), unidentified fungi (7.1%), and sequences with no affiliation to any organisms in the public database (7.1%). However, it was revealed that only three species, namely Penicillium sp., Cadophora malorum and Rhodosporidium diobovatum, were dominant, with the majority of OTUs remaining a minor community. Unexpectedly, there was no significant difference in major fungal community structure between the asphalt seep and non-asphalt seep sites, despite the presence of mass hydrocarbon deposits and the high amount of macro organisms surrounding the asphalt seeps. However, there were some differences in the minor fungal communities, with possible asphalt degrading fungi present specifically in the asphalt seep sites. In contrast, some differences were found between the two different sampling sites. Classification of OTUs revealed that only 47 (41.6%) fungal OTUs exhibited >97% sequence similarity, in comparison with pre-existing ITS sequences in public databases, indicating that a majority of deep-sea inhabiting fungal taxa still remain undescribed. Although our knowledge on fungi and their role in deep-sea environments is still limited and scarce, this study increases our understanding of fungal diversity and community structure in deep-sea environments.
Identifying MicroRNAs and Transcript Targets in Jatropha Seeds
Galli, Vanessa; Guzman, Frank; de Oliveira, Luiz F. V.; Loss-Morais, Guilherme; Körbes, Ana P.; Silva, Sérgio D. A.; Margis-Pinheiro, Márcia M. A. N.; Margis, Rogério
2014-01-01
MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play a key role in diverse plant biological processes. Jatropha curcas L. has received significant attention as a potential oilseed crop for the production of renewable oil. Here, a sRNA library of mature seeds and three mRNA libraries from three different seed development stages were generated by deep sequencing to identify and characterize the miRNAs and pre-miRNAs of J. curcas. Computational analysis was used for the identification of 180 conserved miRNAs and 41 precursors (pre-miRNAs) as well as 16 novel pre-miRNAs. The predicted miRNA target genes are involved in a broad range of physiological functions, including cellular structure, nuclear function, translation, transport, hormone synthesis, defense, and lipid metabolism. Some pre-miRNA and miRNA targets vary in abundance between the three stages of seed development. A search for sequences that produce siRNA was performed, and the results indicated that J. curcas siRNAs play a role in nuclear functions, transport, catalytic processes and disease resistance. This study presents the first large scale identification of J. curcas miRNAs and their targets in mature seeds based on deep sequencing, and it contributes to a functional understanding of these miRNAs. PMID:24551031
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Kato, Mamoru; Shibata, Tatsuhiro; Yachida, Shinichi
2016-01-01
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect molecular characteristics of tumors, supporting the concept of "liquid biopsy".We determined the mutational status of KRAS in plasma cfDNA using multiplex droplet digital PCR in 259 patients with PDAC, retrospectively. Furthermore, we constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA in 48 patients who had ≥1 % mutant allele frequencies of KRAS in plasma cfDNA.Droplet digital PCR detected KRAS mutations in plasma cfDNA in 63 of 107 (58.9 %) patients with inoperable tumors. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2 %) examined by cfDNA sequencing.Our two-step approach with plasma cfDNA, combining droplet digital PCR and targeted deep sequencing, is a feasible clinical approach. Assessment of mutations in plasma cfDNA may provide a new diagnostic tool, assisting decisions for optimal therapeutic strategies for PDAC patients.
Clinical utility of circulating tumor DNA for molecular assessment in pancreatic cancer.
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Morizane, Chigusa; Nara, Satoshi; Hama, Natsuko; Suzuki, Masami; Furukawa, Eisaku; Kato, Mamoru; Hayashi, Hideyuki; Kohno, Takashi; Ueno, Hideki; Shimada, Kazuaki; Okusaka, Takuji; Nakagama, Hitoshi; Shibata, Tatsuhiro; Yachida, Shinichi
2015-12-16
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect and monitor molecular characteristics of tumors. In the present study, we determined the mutational status of KRAS in plasma cfDNA using multiplex picoliter-droplet digital PCR in 259 patients with PDAC. We constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA and matched germline DNA samples in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by targeted deep sequencing of cfDNA. We also analyzed somatic copy number alterations based on the targeted sequencing data using our in-house algorithm, and potentially targetable amplifications were detected. Assessment of mutations and copy number alterations in plasma cfDNA may provide a prognostic and diagnostic tool to assist decisions regarding optimal therapeutic strategies for PDAC patients.
LookSeq: a browser-based viewer for deep sequencing data.
Manske, Heinrich Magnus; Kwiatkowski, Dominic P
2009-11-01
Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.
QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.
Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen
2015-11-10
Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80(th) percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
2015-01-01
Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity. PMID:26701112
Dell'Anno, Antonio; Carugati, Laura; Corinaldesi, Cinzia; Riccioni, Giulia; Danovaro, Roberto
2015-01-01
Nematodes inhabiting benthic deep-sea ecosystems account for >90% of the total metazoan abundances and they have been hypothesised to be hyper-diverse, but their biodiversity is still largely unknown. Metabarcoding could facilitate the census of biodiversity, especially for those tiny metazoans for which morphological identification is difficult. We compared, for the first time, different DNA extraction procedures based on the use of two commercial kits and a previously published laboratory protocol and tested their suitability for sequencing analyses of 18S rDNA of marine nematodes. We also investigated the reliability of Roche 454 sequencing analyses for assessing the biodiversity of deep-sea nematode assemblages previously morphologically identified. Finally, intra-genomic variation in 18S rRNA gene repeats was investigated by Illumina MiSeq in different deep-sea nematode morphospecies to assess the influence of polymorphisms on nematode biodiversity estimates. Our results indicate that the two commercial kits should be preferred for the molecular analysis of biodiversity of deep-sea nematodes since they consistently provide amplifiable DNA suitable for sequencing. We report that the morphological identification of deep-sea nematodes matches the results obtained by metabarcoding analysis only at the order-family level and that a large portion of Operational Clustered Taxonomic Units (OCTUs) was not assigned. We also show that independently from the cut-off criteria and bioinformatic pipelines used, the number of OCTUs largely exceeds the number of individuals and that 18S rRNA gene of different morpho-species of nematodes displayed intra-genomic polymorphisms. Our results indicate that metabarcoding is an important tool to explore the diversity of deep-sea nematodes, but still fails in identifying most of the species due to limited number of sequences deposited in the public databases, and in providing quantitative data on the species encountered. These aspects should be carefully taken into account before using metabarcoding in quantitative ecological research and monitoring programmes of marine biodiversity.
Takai, Ken; Horikoshi, Koki
1999-01-01
Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021
Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.
Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed
2017-11-01
Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.
Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.
Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney
2015-01-01
We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.
Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy
2014-01-01
The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
NASA Astrophysics Data System (ADS)
Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata
2009-05-01
The autotrophic and ammonia-oxidizing crenarchaeal assemblage at offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan ® methodolgy. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for residual 36% of clones, except of those recovered from Eastern Mediterranean, was found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into Deep Marine Group I cluster that contain the vast majority of known sequences recovered from global deep-sea environment. Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to the sequences retrieved from the less deep compartments of the world's ocean, most likely reflecting the higher temperature at the depth of the Mediterranean Sea. In order to verify whether these phylotypes might represent important Crenarchaeota in the functioning of the Mediterranean bathypelagic ecosystem, expression of crenarchaeal amoA gene was monitored by direct RNA retrieval and following analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in DNA-based clone library, obviously, due to an overwhelming dominance of the Deep Marine Group I. The failure to recover the amoA transcripts, related to Deep Marine Group I of Crenarchaeota, was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. As far as all seawater samples were treated on-board under atmospheric pressure conditions and sunlight, the decompression and/or photoinhibition likely affected their metabolic activity, followed by the strong decay of gene expression.
Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar
2014-03-04
Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggest that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/9, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins; including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
Wang, Ruijia; Nambiar, Ram; Zheng, Dinghai
2018-01-01
Abstract PolyA_DB is a database cataloging cleavage and polyadenylation sites (PASs) in several genomes. Previous versions were based mainly on expressed sequence tags (ESTs), which had a limited amount and could lead to inaccurate PAS identification due to the presence of internal A-rich sequences in transcripts. Here, we present an updated version of the database based solely on deep sequencing data. First, PASs are mapped by the 3′ region extraction and deep sequencing (3′READS) method, ensuring unequivocal PAS identification. Second, a large volume of data based on diverse biological samples increases PAS coverage by 3.5-fold over the EST-based version and provides PAS usage information. Third, strand-specific RNA-seq data are used to extend annotated 3′ ends of genes to obtain more thorough annotations of alternative polyadenylation (APA) sites. Fourth, conservation information of PAS across mammals sheds light on significance of APA sites. The database (URL: http://www.polya-db.org/v3) currently holds PASs in human, mouse, rat and chicken, and has links to the UCSC genome browser for further visualization and for integration with other genomic data. PMID:29069441
Optimization of conditions to sequence long cDNAs from viruses
USDA-ARS?s Scientific Manuscript database
Fourth generation sequencing with the Minion nanopore sequencer provides opportunity to obtain deep coverage and long read for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers
USDA-ARS?s Scientific Manuscript database
The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Betz-Stablein, B. D.; Töpfer, A.; Littlejohn, M.; Yuen, L.; Colledge, D.; Sozzi, V.; Angus, P.; Thompson, A.; Revill, P.; Beerenwinkel, N.; Warner, N.
2016-01-01
ABSTRACT Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error prone, leading to the rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome generated via splicing (spHBV variants). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. spHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from samples, including the liver explant, longitudinally collected from a subject with CHB over a 15-year period after liver transplantation. By employing novel bioinformatics methods, this analysis showed that the dynamics of the viral population across a period of changing treatment regimens was complex. The spHBV variants detected in the liver explant remained present posttransplantation, and a highly diverse novel spHBV population as well as variants with multiple deletions in the pre-S genes emerged. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occurred with known drug resistance-associated mutations highlights the relevance of using full-genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full length of the HBV genome. IMPORTANCE Single-molecule sequencing allowed the characterization, in unprecedented detail, of the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimens. This analysis also showed the rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open-source bioinformatics tools with the capacity to easily identify potential spliced variants from deep sequencing data are freely available. PMID:27252524
Deep Sequencing Analysis of Apple Infecting Viruses in Korea
Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun
2016-01-01
Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV which were withal found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha
2013-01-01
The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology that allows to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 102-104 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues concerned by complex diseases like cancer. PMID:23803588
Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu
2018-01-01
Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics.
James, Katherine; Cockell, Simon J; Zenkin, Nikolay
2017-05-01
The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses. Copyright © 2017 Elsevier Inc. All rights reserved.
Insertion sequences enrichment in extreme Red sea brine pool vent.
Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania
2017-03-01
Mobile genetic elements are major agents of genome diversification and evolution. Limited studies addressed their characteristics, including abundance, and role in extreme habitats. One of the rare natural habitats exposed to multiple-extreme conditions, including high temperature, salinity and concentration of heavy metals, are the Red Sea brine pools. We assessed the abundance and distribution of different mobile genetic elements in four Red Sea brine pools including the world's largest known multiple-extreme deep-sea environment, the Red Sea Atlantis II Deep. We report a gradient in the abundance of mobile genetic elements, dramatically increasing in the harshest environment of the pool. Additionally, we identified a strong association between the abundance of insertion sequences and extreme conditions, being highest in the harshest and deepest layer of the Red Sea Atlantis II Deep. Our comparative analyses of mobile genetic elements in secluded, extreme and relatively non-extreme environments, suggest that insertion sequences predominantly contribute to polyextremophiles genome plasticity.
Ono, Hiroyuki; Saitsu, Hirotomo; Horikawa, Reiko; Nakashima, Shinichi; Ohkubo, Yumiko; Yanagi, Kumiko; Nakabayashi, Kazuhiko; Fukami, Maki; Fujisawa, Yasuko; Ogata, Tsutomu
2018-02-02
Although partial androgen insensitivity syndrome (PAIS) is caused by attenuated responsiveness to androgens, androgen receptor gene (AR) mutations on the coding regions and their splice sites have been identified only in <25% of patients with a diagnosis of PAIS. We performed extensive molecular studies including whole exome sequencing in a Japanese family with PAIS, identifying a deep intronic variant beyond the branch site at intron 6 of AR (NM_000044.4:c.2450-42 G > A). This variant created the splice acceptor motif that was accompanied by pyrimidine-rich sequence and two candidate branch sites. Consistent with this, reverse transcriptase (RT)-PCR experiments for cycloheximide-treated lymphoblastoid cell lines revealed a relatively large amount of aberrant mRNA produced by the newly created splice acceptor site and a relatively small amount of wildtype mRNA produced by the normal splice acceptor site. Furthermore, most of the aberrant mRNA was shown to undergo nonsense mediated decay (NMD) and, if a small amount of aberrant mRNA may have escaped NMD, such mRNA was predicted to generate a truncated AR protein missing some functional domains. These findings imply that the deep intronic mutation creating an alternative splice acceptor site resulted in the production of a relatively small amount of wildtype AR mRNA, leading to PAIS.
Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A.
2015-01-01
Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study, next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina HiSeq 2500 instrument. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs. PMID:26656830
Zhang, Ning; Wen, Jun; Zimmer, Elizabeth A
2015-01-01
Vitaceae is well-known for having one of the most economically important fruits, i.e., the grape (Vitis vinifera). The deep phylogeny of the grape family was not resolved until a recent phylogenomic analysis of 417 nuclear genes from transcriptome data. However, it has been reported extensively that topologies based on nuclear and organellar genes may be incongruent due to differences in their evolutionary histories. Therefore, it is important to reconstruct a backbone phylogeny of the grape family using plastomes and mitochondrial genes. In this study,next-generation sequencing data sets of 27 species were obtained using genome skimming with total DNAs from silica-gel preserved tissue samples on an Illumina NextSeq 500 instrument [corrected]. Plastomes were assembled using the combination of de novo and reference genome (of V. vinifera) methods. Sixteen mitochondrial genes were also obtained via genome skimming using the reference genome of V. vinifera. Extensive phylogenetic analyses were performed using maximum likelihood and Bayesian methods. The topology based on either plastome data or mitochondrial genes is congruent with the one using hundreds of nuclear genes, indicating that the grape family did not exhibit significant reticulation at the deep level. The results showcase the power of genome skimming in capturing extensive phylogenetic data: especially from chloroplast and mitochondrial DNAs.
Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao
2018-06-15
Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations into the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions. IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.
Ravì, Daniele; Szczotka, Agnieszka Barbara; Shakir, Dzhoshkun Ismail; Pereira, Stephen P; Vercauteren, Tom
2018-06-01
Probe-based confocal laser endomicroscopy (pCLE) is a recent imaging modality that allows performing in vivo optical biopsies. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality with a few tens of thousands fibres, each acting as the equivalent of a single-pixel detector, assembled into a single fibre bundle. Video registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by the models trained on pairs of estimated HR images (generated by the video registration algorithm) and realistic synthetic LR images. Performance of three different state-of-the-art DNNs techniques were analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive image quality assessment that takes into account different quality scores, including a Mean Opinion Score (MOS). Results indicate that the proposed solution produces an effective improvement in the quality of the obtained reconstructed image. The proposed training strategy and associated DNNs allows us to perform convincing super-resolution of pCLE images.
2011-01-01
Background Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines. Conclusions Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer. PMID:21261984
Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.
Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu
2014-08-01
Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions of aa80, aa155, aa156, and aa168 positions in HCV NS3 region) over time. In Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b, received triple therapy of simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). Sustained virological response rate was 67%, and that was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients, who did not achieve sustained virological response, were tested for resistant variants by ultra-deep sequencing, at the baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants, detected at re-elevation of viral load, were de novo resistant variants. Ten of 12 de novo resistant variants become undetectable over time, and that five of seven resistant variants, detected at baseline, persisted over time. In one patient, variants of Q80R at baseline (0.3%) increased at 96-week after the cessation of treatment (10.2%), and de novo resistant variants of D168E (0.3%) also increased at 96-week after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility in detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.
Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A
2017-02-01
Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Radovich, B.J.; Hoffman, M.W.; Perlmutter, M.A.
1995-12-31
Several large, TCF-size gas fields have been discovered in the Surma Basin, Bangladesh. Detailed sequence stratigraphy was performed on log and seismic data to study these fields and future potential of the area. The prospective section is Upper Miocene sands caught up in a series of younger compressional fault-related folds caused by the Indian Plate colliding with S.E. Asia in the late Tertiary. World-class gas/water contacts are observed on the seismic data over the fields. Sequence stratigraphic techniques reveal an ordered, predictable stratigraphic architecture of sandy highstands and transgressions, and muddy aggraded prograding complexes with deep incisions at each sequencemore » boundary. This serves as a framework to understand the hydrocarbon accumulations in the area. Cyclostratigraphy is used to understand the unusual lithology distributions in the basin.« less
Command system output bit verification
NASA Technical Reports Server (NTRS)
Odd, C. W.; Abbate, S. F.
1981-01-01
An automatic test was developed to test the ability of the deep space station (DSS) command subsystem and exciter to generate and radiate, from the exciter, the correct idle bit sequence for a given flight project or to store and radiate received command data elements and files without alteration. This test, called the command system output bit verification test, is an extension of the command system performance test (SPT) and can be selected as an SPT option. The test compares the bit stream radiated from the DSS exciter with reference sequences generated by the SPT software program. The command subsystem and exciter are verified when the bit stream and reference sequences are identical. It is a key element of the acceptance testing conducted on the command processor assembly (CPA) operational program (DMC-0584-OP-G) prior to its transfer from development to operations.
DeepLoc: prediction of protein subcellular localization using deep learning.
Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole
2017-11-01
The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. jjalma@dtu.dk. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng
2017-09-01
Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from those long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progresses in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfaction. Methodologies for these hard targets still need further improvement. We presented a computational program DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. When compared with previous prediction approaches, our framework employed an effective scheme to identify optimal and important features for contact prediction, and was only trained with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and codes are available at http://166.111.152.91/Downloads.html . hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute
2017-09-01
Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Musculoskeletal MRI findings of juvenile localized scleroderma.
Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W
2017-04-01
Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.
Dissecting enzyme function with microfluidic-based deep mutational scanning.
Romero, Philip A; Tran, Tuan M; Abate, Adam R
2015-06-09
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
USDA-ARS?s Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...
USDA-ARS?s Scientific Manuscript database
Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
Risk of Breast Cancer with CXCR4-using HIV Defined by V3-Loop Sequencing
Goedert, James J.; Swenson, Luke C.; Napolitano, Laura A.; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C.; Aouizerat, Bradley; Rabkin, Charles S.; Harrigan, P. Richard; Hessol, Nancy A.
2014-01-01
Objective Evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures. Methods A breast cancer case-control study, with pairwise comparisons of tropism determination methods, was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from 25 HIV-infected cases near the breast cancer diagnosis date and 75 HIV-infected control women matched for age and calendar date. HIVgp120-V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS [False-Positive-Rate (FPR 3.5) and 2% X4 cutoff], and lower stringency DS (FPR 5.75, 15% X4 cut-off). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P-values and conditional logistic regression. Results In 74 women (19 cases, 55 controls) with complete results, prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P=0.06, odds ratio 0.14, confidence interval 0.003-1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P=0.32), lower-stringency DS (16% vs 35%, P=0.18), and phenotyping (11% vs 31%, P=0.10). HIV-X4-tropism concordance was best between PS and lower-stringency DS (93%, κ=0.83). Other pairwise concordances were 82%-92% (κ=0.56-0.81). Concordance was similar among cases and controls. Conclusions HIV-X4 defined by population sequencing (PS) had good agreement with lower stringency deep sequencing and was significantly associated with lower odds of breast cancer. PMID:25321183
Software for Automation of Real-Time Agents, Version 2
NASA Technical Reports Server (NTRS)
Fisher, Forest; Estlin, Tara; Gaines, Daniel; Schaffer, Steve; Chouinard, Caroline; Engelhardt, Barbara; Wilklow, Colette; Mutz, Darren; Knight, Russell; Rabideau, Gregg;
2005-01-01
Version 2 of Closed Loop Execution and Recovery (CLEaR) has been developed. CLEaR is an artificial intelligence computer program for use in planning and execution of actions of autonomous agents, including, for example, Deep Space Network (DSN) antenna ground stations, robotic exploratory ground vehicles (rovers), robotic aircraft (UAVs), and robotic spacecraft. CLEaR automates the generation and execution of command sequences, monitoring the sequence execution, and modifying the command sequence in response to execution deviations and failures as well as new goals for the agent to achieve. The development of CLEaR has focused on the unification of planning and execution to increase the ability of the autonomous agent to perform under tight resource and time constraints coupled with uncertainty in how much of resources and time will be required to perform a task. This unification is realized by extending the traditional three-tier robotic control architecture by increasing the interaction between the software components that perform deliberation and reactive functions. The increase in interaction reduces the need to replan, enables earlier detection of the need to replan, and enables replanning to occur before an agent enters a state of failure.
USDA-ARS?s Scientific Manuscript database
Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lower the cost necessary for large genome sequencing projects. The one of the advantage of NGS platforms is the possibility to sequence the samples with...
Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón
2014-01-01
We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10−3 27 months, MRD 10−3 to 10−5 48 months, and MRD <10−5 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. sequence-based approach and structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression is employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison on the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give the QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, though the QSSR modeling using sequence-based approach is not needed for the preparation of the minimization structures of peptides before the modeling, it would be considerably efficient as compared to that using structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
Giudice, Valentina; Feng, Xingmin; Lin, Zenghua; Hu, Wei; Zhang, Fanmao; Qiao, Wangmin; Ibanez, Maria Del Pilar Fernandez; Rios, Olga; Young, Neal S
2018-05-01
Oligoclonal expansion of CD8 + CD28 - lymphocytes has been considered indirect evidence for a pathogenic immune response in acquired aplastic anemia. A subset of CD8 + CD28 - cells with CD57 expression, termed effector memory cells, is expanded in several immune-mediated diseases and may have a role in immune surveillance. We hypothesized that effector memory CD8 + CD28 - CD57 + cells may drive aberrant oligoclonal expansion in aplastic anemia. We found CD8 + CD57 + cells frequently expanded in the blood of aplastic anemia patients, with oligoclonal characteristics by flow cytometric Vβ usage analysis: skewing in 1-5 Vβ families and frequencies of immunodominant clones ranging from 1.98% to 66.5%. Oligoclonal characteristics were also observed in total CD8 + cells from aplastic anemia patients with CD8 + CD57 + cell expansion by T-cell receptor deep sequencing, as well as the presence of 1-3 immunodominant clones. Oligoclonality was confirmed by T-cell receptor repertoire deep sequencing of enriched CD8 + CD57 + cells, which also showed decreased diversity compared to total CD4 + and CD8 + cell pools. From analysis of complementarity-determining region 3 sequences in the CD8 + cell pool, a total of 29 sequences were shared between patients and controls, but these sequences were highly expressed in aplastic anemia subjects and also present in their immunodominant clones. In summary, expansion of effector memory CD8 + T cells is frequent in aplastic anemia and mirrors Vβ oligoclonal expansion. Flow cytometric Vβ usage analysis combined with deep sequencing technologies allows high resolution characterization of the T-cell receptor repertoire, and might represent a useful tool in the diagnosis and periodic evaluation of aplastic anemia patients. (Registered at clinicaltrials.gov identifiers: 00001620, 01623167, 00001397, 00071045, 00081523, 00961064 ). Copyright © 2018 Ferrata Storti Foundation.
Er, Tze-Kiong; Wang, Yen-Yun; Chen, Chih-Chieh; Herreros-Villanueva, Marta; Liu, Ta-Chih; Yuan, Shyng-Shiou F
2015-10-01
Many genetic factors play an important role in the development of oral squamous cell carcinoma. The aim of this study was to assess the mutational profile in oral squamous cell carcinoma using formalin-fixed, paraffin-embedded tumors from a Taiwanese population by performing targeted sequencing of 26 cancer-associated genes that are frequently mutated in solid tumors. Next-generation sequencing was performed in 50 formalin-fixed, paraffin-embedded tumor specimens obtained from patients with oral squamous cell carcinoma. Genetic alterations in the 26 cancer-associated genes were detected using a deep sequencing (>1000X) approach. TP53, PIK3CA, MET, APC, CDH1, and FBXW7 were most frequently mutated genes. Most remarkably, TP53 mutations and PIK3CA mutations, which accounted for 68% and 18% of tumors, respectively, were more prevalent in a Taiwanese population. Other genes including MET (4%), APC (4%), CDH1 (2%), and FBXW7 (2%) were identified in our population. In summary, our study shows the feasibility of performing targeted sequencing using formalin-fixed, paraffin-embedded samples. Additionally, this study also reports the mutational landscape of oral squamous cell carcinoma in the Taiwanese population. We believe that this study will shed new light on fundamental aspects in understanding the molecular pathogenesis of oral squamous cell carcinoma and may aid in the development of new targeted therapies. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS?s Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental nicotianan benthamiana plants in up-state New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
NASA Astrophysics Data System (ADS)
Zhao, Lei; Wang, Zengcai; Wang, Xiaojin; Qi, Yazhou; Liu, Qing; Zhang, Guoxin
2016-09-01
Human fatigue is an important cause of traffic accidents. To improve the safety of transportation, we propose, in this paper, a framework for fatigue expression recognition using image-based facial dynamic multi-information and a bimodal deep neural network. First, the landmark of face region and the texture of eye region, which complement each other in fatigue expression recognition, are extracted from facial image sequences captured by a single camera. Then, two stacked autoencoder neural networks are trained for landmark and texture, respectively. Finally, the two trained neural networks are combined by learning a joint layer on top of them to construct a bimodal deep neural network. The model can be used to extract a unified representation that fuses landmark and texture modalities together and classify fatigue expressions accurately. The proposed system is tested on a human fatigue dataset obtained from an actual driving environment. The experimental results demonstrate that the proposed method performs stably and robustly, and that the average accuracy achieves 96.2%.
High-speed railway real-time localization auxiliary method based on deep neural network
NASA Astrophysics Data System (ADS)
Chen, Dongjie; Zhang, Wensheng; Yang, Yang
2017-11-01
High-speed railway intelligent monitoring and management system is composed of schedule integration, geographic information, location services, and data mining technology for integration of time and space data. Assistant localization is a significant submodule of the intelligent monitoring system. In practical application, the general access is to capture the image sequences of the components by using a high-definition camera, digital image processing technique and target detection, tracking and even behavior analysis method. In this paper, we present an end-to-end character recognition method based on a deep CNN network called YOLO-toc for high-speed railway pillar plate number. Different from other deep CNNs, YOLO-toc is an end-to-end multi-target detection framework, furthermore, it exhibits a state-of-art performance on real-time detection with a nearly 50fps achieved on GPU (GTX960). Finally, we realize a real-time but high-accuracy pillar plate number recognition system and integrate natural scene OCR into a dedicated classification YOLO-toc model.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
Bidlingmaier, Scott; Ha, Kevin; Lee, Nam-Kyung; Su, Yang; Liu, Bin
2016-04-01
Although the bioactive sphingolipid ceramide is an important cell signaling molecule, relatively few direct ceramide-interacting proteins are known. We used an approach combining yeast surface cDNA display and deep sequencing technology to identify novel proteins binding directly to ceramide. We identified 234 candidate ceramide-binding protein fragments and validated binding for 20. Most (17) bound selectively to ceramide, although a few (3) bound to other lipids as well. Several novel ceramide-binding domains were discovered, including the EF-hand calcium-binding motif, the heat shock chaperonin-binding motif STI1, the SCP2 sterol-binding domain, and the tetratricopeptide repeat region motif. Interestingly, four of the verified ceramide-binding proteins (HPCA, HPCAL1, NCS1, and VSNL1) and an additional three candidate ceramide-binding proteins (NCALD, HPCAL4, and KCNIP3) belong to the neuronal calcium sensor family of EF hand-containing proteins. We used mutagenesis to map the ceramide-binding site in HPCA and to create a mutant HPCA that does not bind to ceramide. We demonstrated selective binding to ceramide by mammalian cell-produced wild type but not mutant HPCA. Intriguingly, we also identified a fragment from prostaglandin D2synthase that binds preferentially to ceramide 1-phosphate. The wide variety of proteins and domains capable of binding to ceramide suggests that many of the signaling functions of ceramide may be regulated by direct binding to these proteins. Based on the deep sequencing data, we estimate that our yeast surface cDNA display library covers ∼60% of the human proteome and our selection/deep sequencing protocol can identify target-interacting protein fragments that are present at extremely low frequency in the starting library. Thus, the yeast surface cDNA display/deep sequencing approach is a rapid, comprehensive, and flexible method for the analysis of protein-ligand interactions, particularly for the study of non-protein ligands. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Sheik, Cody S.; Reese, Brandi Kiel; Twing, Katrina I.; Sylvan, Jason B.; Grim, Sharon L.; Schrenk, Matthew O.; Sogin, Mitchell L.; Colwell, Frederick S.
2018-01-01
Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter. While the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset. PMID:29780369
Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J
2009-06-29
It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.
Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.
2009-01-01
Background It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% subjects and correlated with historical antiretroviral use in 79% subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031
Sheik, Cody S; Reese, Brandi Kiel; Twing, Katrina I; Sylvan, Jason B; Grim, Sharon L; Schrenk, Matthew O; Sogin, Mitchell L; Colwell, Frederick S
2018-01-01
Earth's subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium , Aquabacterium , Ralstonia , and Acinetobacter . While the top five most frequently observed genera were Pseudomonas , Propionibacterium , Acinetobacter , Ralstonia , and Sphingomonas . The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process. Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low biomass environments such as, but not limited to, portions of Earth's deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination derived sequences must be taken prior to using this dataset.
Kempton, Colton E.; Heninger, Justin R.; Johnson, Steven M.
2014-01-01
Nucleosomes and their positions in the eukaryotic genome play an important role in regulating gene expression by influencing accessibility to DNA. Many factors influence a nucleosome's final position in the chromatin landscape including the underlying genomic sequence. One of the primary reasons for performing in vitro nucleosome reconstitution experiments is to identify how the underlying DNA sequence will influence a nucleosome's position in the absence of other compounding cellular factors. However, concerns have been raised about the reproducibility of data generated from these kinds of experiments. Here we present data for in vitro nucleosome reconstitution experiments performed on linear plasmid DNA that demonstrate that, when coverage is deep enough, these reconstitution experiments are exquisitely reproducible and highly consistent. Our data also suggests that a coverage depth of 35X be maintained for maximal confidence when assaying nucleosome positions, but lower coverage levels may be generally sufficient. These coverage depth recommendations are sufficient in the experimental system and conditions used in this study, but may vary depending on the exact parameters used in other systems. PMID:25093869
HSA: a heuristic splice alignment tool.
Bu, Jingde; Chi, Xuebin; Jin, Zhong
2013-01-01
RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
Powell, J. Elijah; Ratnayeke, Nalin; Moran, Nancy A.
2017-01-01
High throughput rRNA amplicon surveys of bacterial communities provide a rapid snapshot of taxonomic composition. But strains with nearly identical rRNA sequences often differ in gene repertoires and metabolic capabilities. To assess strain-level variation within Snodgrassella alvi, a gut symbiont of corbiculate bees, we performed deep sequencing on amplicons of a single copy coding gene (minD) as well as the 16S rDNA V4 region. We surveyed honey bees (Apis mellifera) sampled globally and 12 bumble bee species (Bombus) sampled from two regions of the USA. The minD analyses reveal that S. alvi contains far more strain diversity than is evident from 16S rDNA analysis. Many taxa inferred on the basis of 16S rDNA are shared between A. mellifera and Bombus species, but taxa inferred on the basis of minD are never shared and often are restricted to particular Bombus species. Clustering based on minD revealed that gut communities often reflect host species and geographic location. Both minD and 16S rDNA analyses indicate that strain diversity is higher in A. mellifera than in Bombus species. The minD locus flanks a 16S gene, enabling development of strain-specific 16S fluorescent probes to illuminate the spatial relationship of strains within the bee gut. PMID:27482856
Gutierrez, Tony; Biddle, Jennifer F; Teske, Andreas; Aitken, Michael D
2015-01-01
Marine hydrocarbon-degrading bacteria perform a fundamental role in the biodegradation of crude oil and its petrochemical derivatives in coastal and open ocean environments. However, there is a paucity of knowledge on the diversity and function of these organisms in deep-sea sediment. Here we used stable-isotope probing (SIP), a valuable tool to link the phylogeny and function of targeted microbial groups, to investigate polycyclic aromatic hydrocarbon (PAH)-degrading bacteria under aerobic conditions in sediments from Guaymas Basin with uniformly labeled [(13)C]-phenanthrene (PHE). The dominant sequences in clone libraries constructed from (13)C-enriched bacterial DNA (from PHE enrichments) were identified to belong to the genus Cycloclasticus. We used quantitative PCR primers targeting the 16S rRNA gene of the SIP-identified Cycloclasticus to determine their abundance in sediment incubations amended with unlabeled PHE and showed substantial increases in gene abundance during the experiments. We also isolated a strain, BG-2, representing the SIP-identified Cycloclasticus sequence (99.9% 16S rRNA gene sequence identity), and used this strain to provide direct evidence of PHE degradation and mineralization. In addition, we isolated Halomonas, Thalassospira, and Lutibacterium sp. with demonstrable PHE-degrading capacity from Guaymas Basin sediment. This study demonstrates the value of coupling SIP with cultivation methods to identify and expand on the known diversity of PAH-degrading bacteria in the deep-sea.
Gutierrez, Tony; Biddle, Jennifer F.; Teske, Andreas; Aitken, Michael D.
2015-01-01
Marine hydrocarbon-degrading bacteria perform a fundamental role in the biodegradation of crude oil and its petrochemical derivatives in coastal and open ocean environments. However, there is a paucity of knowledge on the diversity and function of these organisms in deep-sea sediment. Here we used stable-isotope probing (SIP), a valuable tool to link the phylogeny and function of targeted microbial groups, to investigate polycyclic aromatic hydrocarbon (PAH)-degrading bacteria under aerobic conditions in sediments from Guaymas Basin with uniformly labeled [13C]-phenanthrene (PHE). The dominant sequences in clone libraries constructed from 13C-enriched bacterial DNA (from PHE enrichments) were identified to belong to the genus Cycloclasticus. We used quantitative PCR primers targeting the 16S rRNA gene of the SIP-identified Cycloclasticus to determine their abundance in sediment incubations amended with unlabeled PHE and showed substantial increases in gene abundance during the experiments. We also isolated a strain, BG-2, representing the SIP-identified Cycloclasticus sequence (99.9% 16S rRNA gene sequence identity), and used this strain to provide direct evidence of PHE degradation and mineralization. In addition, we isolated Halomonas, Thalassospira, and Lutibacterium sp. with demonstrable PHE-degrading capacity from Guaymas Basin sediment. This study demonstrates the value of coupling SIP with cultivation methods to identify and expand on the known diversity of PAH-degrading bacteria in the deep-sea. PMID:26217326
Azzouzi, Imane; Moest, Hansjoerg; Wollscheid, Bernd; Schmugge, Markus; Eekels, Julia J M; Speer, Oliver
2015-05-01
During maturation, erythropoietic cells extrude their nuclei but retain their ability to respond to oxidant stress by tightly regulating protein translation. Several studies have reported microRNA-mediated regulation of translation during terminal stages of erythropoiesis, even after enucleation. In the present study, we performed a detailed examination of the endogenous microRNA machinery in human red blood cells using a combination of deep sequencing analysis of microRNAs and proteomic analysis of the microRNA-induced silencing complex. Among the 197 different microRNAs detected, miR-451a was the most abundant, representing more than 60% of all read sequences. In addition, miR-451a and its known target, 14-3-3ζ mRNA, were bound to the microRNA-induced silencing complex, implying their direct interaction in red blood cells. The proteomic characterization of endogenous Argonaute 2-associated microRNA-induced silencing complex revealed 26 cofactor candidates. Among these cofactors, we identified several RNA-binding proteins, as well as motor proteins and vesicular trafficking proteins. Our results demonstrate that red blood cells contain complex microRNA machinery, which might enable immature red blood cells to control protein translation independent of de novo nuclei information. Copyright © 2015 ISEH - International Society for Experimental Hematology. Published by Elsevier Inc. All rights reserved.
Wanke, Stefan; Granados Mendoza, Carolina; Müller, Sebastian; Paizanni Guillén, Anna; Neinhuis, Christoph; Lemmon, Alan R; Lemmon, Emily Moriarty; Samain, Marie-Stéphanie
2017-12-01
Recalcitrant relationships are characterized by very short internodes that can be found among shallow and deep phylogenetic scales all over the tree of life. Adding large amounts of presumably informative sequences, while decreasing systematic error, has been suggested as a possible approach to increase phylogenetic resolution. The development of enrichment strategies, coupled with next generation sequencing, resulted in a cost-effective way to facilitate the reconstruction of recalcitrant relationships. By applying the anchored hybrid enrichment (AHE) genome partitioning strategy to Aristolochia using an universal angiosperm probe set, we obtained 231-233 out of 517 single or low copy nuclear loci originally contained in the enrichment kit, resulting in a total alignment length of 154,756bp to 160,150bp. Since Aristolochia (Piperales; magnoliids) is distantly related to any angiosperm species whose genome has been used for the plant AHE probe design (Amborella trichopoda being the closest), it serves as a proof of universality for this probe set. Aristolochia comprises approximately 500 species grouped in several clades (OTUs), whose relationships to each other are partially unknown. Previous phylogenetic studies have shown that these lineages branched deep in time and in quick succession, seen as short-deep internodes. Short-shallow internodes are also characteristic of some Aristolochia lineages such as Aristolochia subsection Pentandrae, a clade of presumably recent diversification. This subsection is here included to test the performance of AHE at species level. Filtering and subsampling loci using the phylogenetic informativeness method resolves several recalcitrant phylogenetic relationships within Aristolochia. By assuming different ploidy levels during bioinformatics processing of raw data, first hints are obtained that polyploidization contributed to the evolution of Aristolochia. Phylogenetic results are discussed in the light of current systematics and morphology. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Graphical classification of DNA sequences of HLA alleles by deep learning.
Miyake, Jun; Kaneshita, Yuhei; Asatani, Satoshi; Tagawa, Seiichi; Niioka, Hirohiko; Hirano, Takashi
2018-04-01
Alleles of human leukocyte antigen (HLA)-A DNAs are classified and expressed graphically by using artificial intelligence "Deep Learning (Stacked autoencoder)". Nucleotide sequence data corresponding to the length of 822 bp, collected from the Immuno Polymorphism Database, were compressed to 2-dimensional representation and were plotted. Profiles of the two-dimensional plots indicate that the alleles can be classified as clusters are formed. The two-dimensional plot of HLA-A DNAs gives a clear outlook for characterizing the various alleles.
The Deepwater Horizon Oil Spill: Ecogenomics of the Deep-Sea Plume
NASA Astrophysics Data System (ADS)
Hazen, T. C.
2012-12-01
The explosion on April 20, 2010 at the BP-leased Deepwater Horizon drilling rig in the Gulf of Mexico off the coast of Louisiana, resulted in oil and gas rising to the surface and the oil coming ashore in many parts of the Gulf, it also resulted in the dispersment of an immense oil plume 4,000 feet below the surface of the water. Despite spanning more than 600 feet in the water column and extending more than 10 miles from the wellhead, the dispersed oil plume was gone within weeks after the wellhead was capped - degraded and diluted to undetectable levels. Furthermore, this degradation took place without significant oxygen depletion. Ecogenomics enabled discovery of new and unclassified species of oil-eating bacteria that apparently lives in the deep Gulf where oil seeps are common. Using 16s microarrays, functional gene arrays, clone libraries, lipid analysis and a variety of hydrocarbon and micronutrient analyses we were able to characterize the oil degraders. Metagenomic sequence data was obtained for the deep-water samples using the Illumina platform. In addition, single cells were sorted and sequenced for the some of the most dominant bacteria that were represented in the oil plume; namely uncultivated representatives of Colwellia and Oceanospirillum. In addition, we performed laboratory microcosm experiments using uncontaminated water collected from The Gulf at the depth of the oil plume to which we added oil and COREXIT. These samples were characterized by 454 pyrotag. The results provide information about the key players and processes involved in degradation of oil, with and without COREXIT, in different impacted environments in The Gulf of Mexico. We are also extending these studies to explore dozens of deep sediment samples that were also collected after the oil spill around the wellhead. This data suggests that a great potential for intrinsic bioremediation of oil plumes exists in the deep-sea and other environs in the Gulf of Mexico.
Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L
2014-01-05
Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.
3' terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing.
Goldfarb, Katherine C; Cech, Thomas R
2013-09-21
Post-transcriptional 3' end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3' RACE coupled with high-throughput sequencing to characterize the 3' terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. The 3' terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3' terminus of an in vitro transcribed MRP RNA control and the differing 3' terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). 3' RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3' terminal sequences of noncoding RNAs.
Reactive Sequencing for Autonomous Navigation Evolving from Phoenix Entry, Descent, and Landing
NASA Technical Reports Server (NTRS)
Grasso, Christopher A.; Riedel, Joseph E.; Vaughan, Andrew T.
2010-01-01
Virtual Machine Language (VML) is an award-winning advanced procedural sequencing language in use on NASA deep-space missions since 1997, and was used for the successful entry, descent, and landing (EDL) of the Phoenix spacecraft onto the surface of Mars. Phoenix EDL utilized a state-oriented operations architecture which executed within the constraints of the existing VML 2.0 flight capability, compatible with the linear "land or die" nature of the mission. The intricacies of Phoenix EDL included the planned discarding of portions of the vehicle, the complex communications management for relay through on-orbit assets, the presence of temporally indeterminate physical events, and the need to rapidly catch up four days of sequencing should a reboot of the spacecraft flight computer occur shortly before atmospheric entry. These formidable operational challenges led to new techniques for packaging and coordinating reusable sequences called blocks using one-way synchronization via VML sequencing global variable events. The coordinated blocks acted as an ensemble to land the spacecraft, while individually managing various elements in as simple a fashion as possible. This paper outlines prototype VML 2.1 flight capabilities that have evolved from the one-way synchronization techniques in order to implement even more ambitious autonomous mission capabilities. Target missions for these new capabilities include autonomous touch-and-go sampling of cometary and asteroidal bodies, lunar landing of robotic missions, and ultimately landing of crewed lunar vehicles. Close proximity guidance, navigation, and control operations, on-orbit rendezvous, and descent and landing events featured in these missions require elaborate abort capability, manifesting highly non-linear scenarios that are so complex as to overtax traditional sequencing, or even the sort of one-way coordinated sequencing used during EDL. Foreseeing advanced command and control needs for small body and lunar landing guidance, navigation and control scenarios, work began three years ago on substantial upgrades to VML that are now being exercised in scenarios for lunar landing and comet/asteroid rendezvous. The advanced state-based approach includes coordinated state transition machines with distributed decision-making logic. These state machines are not merely sequences - they are reactive logic constructs capable of autonomous decision making within a well-defined domain. Combined with the JPL's AutoNav software used on Deep Space 1 and Deep Impact, the system allows spacecraft to autonomously navigate to an unmapped surface, soft-contact, and either land or ascend. The state machine architecture enabled by VML 2.1 has successfully performed sampling missions and lunar descent missions in a simulated environment, and is progressing toward flight capability. The authors are also investigating using the VML 2.1 flight director architecture to perform autonomous activities like rendezvous with a passive hypothetical Mars sample return capsule. The approach being pursued is similar to the touch-and-go sampling state machines, with the added complications associated with the search for, physical capture of, and securing of a separate spacecraft. Complications include optically finding and tracking the Orbiting Sample Capsule (OSC), keeping the OSC illuminated, making orbital adjustments, and physically capturing the OSC. Other applications could include autonomous science collection and fault compensation.
Zhong, Daibin; Lo, Eugenia; Wang, Xiaoming; Yewhalaw, Delenasaw; Zhou, Guofa; Atieli, Harrysone E; Githeko, Andrew; Hemming-Schroeder, Elizabeth; Lee, Ming-Chieh; Afrane, Yaw; Yan, Guiyun
2018-05-02
Parasite genetic diversity and multiplicity of infection (MOI) affect clinical outcomes, response to drug treatment and naturally-acquired or vaccine-induced immunity. Traditional methods often underestimate the frequency and diversity of multiclonal infections due to technical sensitivity and specificity. Next-generation sequencing techniques provide a novel opportunity to study complexity of parasite populations and molecular epidemiology. Symptomatic and asymptomatic Plasmodium vivax samples were collected from health centres/hospitals and schools, respectively, from 2011 to 2015 in Ethiopia. Similarly, both symptomatic and asymptomatic Plasmodium falciparum samples were collected, respectively, from hospitals and schools in 2005 and 2015 in Kenya. Finger-pricked blood samples were collected and dried on filter paper. Long amplicon (> 400 bp) deep sequencing of merozoite surface protein 1 (msp1) gene was conducted to determine multiplicity and molecular epidemiology of P. vivax and P. falciparum infections. The results were compared with those based on short amplicon (117 bp) deep sequencing. A total of 139 P. vivax and 222 P. falciparum samples were pyro-sequenced for pvmsp1 and pfmsp1, yielding a total of 21 P. vivax and 99 P. falciparum predominant haplotypes. The average MOI for P. vivax and P. falciparum were 2.16 and 2.68, respectively, which were significantly higher than that of microsatellite markers and short amplicon (117 bp) deep sequencing. Multiclonal infections were detected in 62.2% of the samples for P. vivax and 74.8% of the samples for P. falciparum. Four out of the five subjects with recurrent P. vivax malaria were found to be a relapse 44-65 days after clearance of parasites. No difference was observed in MOI among P. vivax patients of different symptoms, ages and genders. Similar patterns were also observed in P. falciparum except for one study site in Kenyan lowland areas with significantly higher MOI. The study used a novel method to evaluate Plasmodium MOI and molecular epidemiological patterns by long amplicon ultra-deep sequencing. The complexity of infections were similar among age groups, symptoms, genders, transmission settings (spatial heterogeneity), as well as over years (pre- vs. post-scale-up interventions). This study demonstrated that long amplicon deep sequencing is a useful tool to investigate multiplicity and molecular epidemiology of Plasmodium parasite infections.
The Role Of Rejuvenation In Shaping The High-Mass End Of The Main Sequence
NASA Astrophysics Data System (ADS)
Mancini, Chiara
2017-06-01
We investigate the nature of star forming galaxies with reduced specific SFRs and high stellar masses, those that seemingly cause the so-called bending of the main sequence. The fact that such objects host large bulges recently lead some to suggest that the internal formation of the bulges, via compaction or disk instabilities, was the late event that induced sSFRs of massive galaxies to drop in a slow downfall and thus the main sequence to bend. We have studied in detail a sample of 16 galaxies at 0.5
Olsson, Linda; Zettermark, Sofia; Biloglav, Andrea; Castor, Anders; Behrendtz, Mikael; Forestier, Erik; Paulsson, Kajsa; Johansson, Bertil
2016-07-01
Cytogenetic analyses of a consecutive series of 67 paediatric (median age 8 years; range 0-17) de novo acute myeloid leukaemia (AML) patients revealed aberrations in 55 (82%) cases. The most common subgroups were KMT2A rearrangement (29%), normal karyotype (15%), RUNX1-RUNX1T1 (10%), deletions of 5q, 7q and/or 17p (9%), myeloid leukaemia associated with Down syndrome (7%), PML-RARA (7%) and CBFB-MYH11 (5%). Single nucleotide polymorphism array (SNP-A) analysis and exon sequencing of 100 genes, performed in 52 and 40 cases, respectively (39 overlapping), revealed ≥1 aberration in 89%; when adding cytogenetic data, this frequency increased to 98%. Uniparental isodisomies (UPIDs) were detected in 13% and copy number aberrations (CNAs) in 63% (median 2/case); three UPIDs and 22 CNAs were recurrent. Twenty-two genes were targeted by focal CNAs, including AEBP2 and PHF6 deletions and genes involved in AML-associated gene fusions. Deep sequencing identified mutations in 65% of cases (median 1/case). In total, 60 mutations were found in 30 genes, primarily those encoding signalling proteins (47%), transcription factors (25%), or epigenetic modifiers (13%). Twelve genes (BCOR, CEBPA, FLT3, GATA1, KIT, KRAS, NOTCH1, NPM1, NRAS, PTPN11, SMC3 and TP53) were recurrently mutated. We conclude that SNP-A and deep sequencing analyses complement the cytogenetic diagnosis of paediatric AML. © 2016 John Wiley & Sons Ltd.
Complete genome sequence of a novel genotype of squash mosaic virus
USDA-ARS?s Scientific Manuscript database
Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...
USDA-ARS?s Scientific Manuscript database
The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...
Kitahara, Marcelo V.; Cairns, Stephen D.; Stolarski, Jarosław; Blair, David; Miller, David J.
2010-01-01
Background Classical morphological taxonomy places the approximately 1400 recognized species of Scleractinia (hard corals) into 27 families, but many aspects of coral evolution remain unclear despite the application of molecular phylogenetic methods. In part, this may be a consequence of such studies focusing on the reef-building (shallow water and zooxanthellate) Scleractinia, and largely ignoring the large number of deep-sea species. To better understand broad patterns of coral evolution, we generated molecular data for a broad and representative range of deep sea scleractinians collected off New Caledonia and Australia during the last decade, and conducted the most comprehensive molecular phylogenetic analysis to date of the order Scleractinia. Methodology Partial (595 bp) sequences of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene were determined for 65 deep-sea (azooxanthellate) scleractinians and 11 shallow-water species. These new data were aligned with 158 published sequences, generating a 234 taxon dataset representing 25 of the 27 currently recognized scleractinian families. Principal Findings/Conclusions There was a striking discrepancy between the taxonomic validity of coral families consisting predominantly of deep-sea or shallow-water species. Most families composed predominantly of deep-sea azooxanthellate species were monophyletic in both maximum likelihood and Bayesian analyses but, by contrast (and consistent with previous studies), most families composed predominantly of shallow-water zooxanthellate taxa were polyphyletic, although Acroporidae, Poritidae, Pocilloporidae, and Fungiidae were exceptions to this general pattern. One factor contributing to this inconsistency may be the greater environmental stability of deep-sea environments, effectively removing taxonomic “noise” contributed by phenotypic plasticity. Our phylogenetic analyses imply that the most basal extant scleractinians are azooxanthellate solitary corals from deep-water, their divergence predating that of the robust and complex corals. Deep-sea corals are likely to be critical to understanding anthozoan evolution and the origins of the Scleractinia. PMID:20628613
HIV-1 transmission linkage in an HIV-1 prevention clinical trial
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leitner, Thomas; Campbell, Mary S; Mullins, James I
2009-01-01
HIV-1 sequencing has been used extensively in epidemiologic and forensic studies to investigate patterns of HIV-1 transmission. However, the criteria for establishing genetic linkage between HIV-1 strains in HIV-1 prevention trials have not been formalized. The Partners in Prevention HSV/HIV Transmission Study (ClinicaITrials.gov NCT00194519) enrolled 3408 HIV-1 serodiscordant heterosexual African couples to determine the efficacy of genital herpes suppression with acyclovir in reducing HIV-1 transmission. The trial analysis required laboratory confirmation of HIV-1 linkage between enrolled partners in couples in which seroconversion occurred. Here we describe the process and results from HIV-1 sequencing studies used to perform transmission linkage determinationmore » in this clinical trial. Consensus Sanger sequencing of env (C2-V3-C3) and gag (p17-p24) genes was performed on plasma HIV-1 RNA from both partners within 3 months of seroconversion; env single molecule or pyrosequencing was also performed in some cases. For linkage, we required monophyletic clustering between HIV-1 sequences in the transmitting and seroconverting partners, and developed a Bayesian algorithm using genetic distances to evaluate the posterior probability of linkage of participants sequences. Adjudicators classified transmissions as linked, unlinked, or indeterminate. Among 151 seroconversion events, we found 108 (71.5%) linked, 40 (26.5%) unlinked, and 3 (2.0%) to have indeterminate transmissions. Nine (8.3%) were linked by consensus gag sequencing only and 8 (7.4%) required deep sequencing of env. In this first use of HIV-1 sequencing to establish endpoints in a large clinical trial, more than one-fourth of transmissions were unlinked to the enrolled partner, illustrating the relevance of these methods in the design of future HIV-1 prevention trials in serodiscordant couples. A hierarchy of sequencing techniques, analysis methods, and expert adjudication contributed to the linkage determination process.« less
Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.
Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A
2017-11-01
To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.
Cocos, Anne; Fiks, Alexander G; Masino, Aaron J
2017-07-01
Social media is an important pharmacovigilance data source for adverse drug reaction (ADR) identification. Human review of social media data is infeasible due to data quantity, thus natural language processing techniques are necessary. Social media includes informal vocabulary and irregular grammar, which challenge natural language processing methods. Our objective is to develop a scalable, deep-learning approach that exceeds state-of-the-art ADR detection performance in social media. We developed a recurrent neural network (RNN) model that labels words in an input sequence with ADR membership tags. The only input features are word-embedding vectors, which can be formed through task-independent pretraining or during ADR detection training. Our best-performing RNN model used pretrained word embeddings created from a large, non-domain-specific Twitter dataset. It achieved an approximate match F-measure of 0.755 for ADR identification on the dataset, compared to 0.631 for a baseline lexicon system and 0.65 for the state-of-the-art conditional random field model. Feature analysis indicated that semantic information in pretrained word embeddings boosted sensitivity and, combined with contextual awareness captured in the RNN, precision. Our model required no task-specific feature engineering, suggesting generalizability to additional sequence-labeling tasks. Learning curve analysis showed that our model reached optimal performance with fewer training examples than the other models. ADR detection performance in social media is significantly improved by using a contextually aware model and word embeddings formed from large, unlabeled datasets. The approach reduces manual data-labeling requirements and is scalable to large social media datasets. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi
2014-07-01
When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.
Leda, Ana Rachel; Hunter, James; Oliveira, Ursula Castro; Azevedo, Inacio Junqueira; Sucupira, Maria Cecilia Araripe; Diaz, Ricardo Sobhie
2018-04-19
The presence of minority transmitted drug resistance mutations was assessed using ultra-deep sequencing and correlated with disease progression among recently HIV-1-infected individuals from Brazil. Samples at baseline during recent infection and 1 year after the establishment of the infection were analysed. Viral RNA and proviral DNA from 25 individuals were subjected to ultra-deep sequencing of the reverse transcriptase and protease regions of HIV-1. Viral strains carrying transmitted drug resistance mutations were detected in 9 out of the 25 patients, for all major antiretroviral classes, ranging from one to five mutations per patient. Ultra-deep sequencing detected strains with frequencies as low as 1.6% and only strains with frequencies >20% were detected by population plasma sequencing (three patients). Transmitted drug resistance strains with frequencies <14.8% did not persist upon established infection. The presence of transmitted drug resistance mutations was negatively correlated with the viral load and with CD4+ T cell count decay. Transmitted drug resistance mutations representing small percentages of the viral population do not persist during infection because they are negatively selected in the first year after HIV-1 seroconversion.
GenomeGems: evaluation of genetic variability from deep sequencing data
2012-01-01
Background Detection of disease-causing mutations using Deep Sequencing technologies possesses great challenges. In particular, organizing the great amount of sequences generated so that mutations, which might possibly be biologically relevant, are easily identified is a difficult task. Yet, for this assignment only limited automatic accessible tools exist. Findings We developed GenomeGems to gap this need by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151
Cavalier-Smith, Thomas
2015-04-01
Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.
Chromatin accessibility prediction via a hybrid deep convolutional neural network.
Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui
2018-03-01
A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparison with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Cornelissen, Marion; Gall, Astrid; Vink, Monique; Zorgdrager, Fokla; Binter, Špela; Edwards, Stephanie; Jurriaans, Suzanne; Bakker, Margreet; Ong, Swee Hoe; Gras, Luuk; van Sighem, Ard; Bezemer, Daniela; de Wolf, Frank; Reiss, Peter; Kellam, Paul; Berkhout, Ben; Fraser, Christophe; van der Kuyl, Antoinette C
2017-07-15
The BEEHIVE (Bridging the Evolution and Epidemiology of HIV in Europe) project aims to analyse nearly-complete viral genomes from >3000 HIV-1 infected Europeans using high-throughput deep sequencing techniques to investigate the virus genetic contribution to virulence. Following the development of a computational pipeline, including a new de novo assembler for RNA virus genomes, to generate larger contiguous sequences (contigs) from the abundance of short sequence reads that characterise the data, another area that determines genome sequencing success is the quality and quantity of the input RNA. A pilot experiment with 125 patient plasma samples was performed to investigate the optimal method for isolation of HIV-1 viral RNA for long amplicon genome sequencing. Manual isolation with the QIAamp Viral RNA Mini Kit (Qiagen) was superior over robotically extracted RNA using either the QIAcube robotic system, the mSample Preparation Systems RNA kit with automated extraction by the m2000sp system (Abbott Molecular), or the MagNA Pure 96 System in combination with the MagNA Pure 96 Instrument (Roche Diagnostics). We scored amplification of a set of four HIV-1 amplicons of ∼1.9, 3.6, 3.0 and 3.5kb, and subsequent recovery of near-complete viral genomes. Subsequently, 616 BEEHIVE patient samples were analysed to determine factors that influence successful amplification of the genome in four overlapping amplicons using the QIAamp Viral RNA Kit for viral RNA isolation. Both low plasma viral load and high sample age (stored before 1999) negatively influenced the amplification of viral amplicons >3kb. A plasma viral load of >100,000 copies/ml resulted in successful amplification of all four amplicons for 86% of the samples, this value dropped to only 46% for samples with viral loads of <20,000 copies/ml. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
DSAP: deep-sequencing small RNA analysis pipeline.
Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus
2010-07-01
DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong
2014-01-01
Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372
Unified Deep Learning Architecture for Modeling Biology Sequence.
Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang
2017-10-09
Prediction of the spatial structure or function of biological macromolecules based on their sequence remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, characteristics, such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences, usually lead to different solutions on a case-by-case basis. This study proposed the use of bidirectional recurrent neural networks based on long short-term memory or a gated recurrent unit to capture long-range interactions by designing the optional reshape operator to adapt to the diversity of the output labels and implementing a training algorithm to support the training of sequence models capable of processing variable-length sequences. Additionally, the merge and pooling operators enhanced the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems through the use of a unified framework. We validated our model on one of the most difficult biological sequence-modeling problems currently known, with our results indicating the ability of the model to obtain predictions of protein residue interactions that exceeded the accuracy of current popular approaches by 10% based on multiple benchmarks.
Planetary cubesats - mission architectures
NASA Astrophysics Data System (ADS)
Bousquet, Pierre W.; Ulamec, Stephan; Jaumann, Ralf; Vane, Gregg; Baker, John; Clark, Pamela; Komarek, Tomas; Lebreton, Jean-Pierre; Yano, Hajime
2016-07-01
Miniaturisation of technologies over the last decade has made cubesats a valid solution for deep space missions. For example, a spectacular set 13 cubesats will be delivered in 2018 to a high lunar orbit within the frame of SLS' first flight, referred to as Exploration Mission-1 (EM-1). Each of them will perform autonomously valuable scientific or technological investigations. Other situations are encountered, such as the auxiliary landers / rovers and autonomous camera that will be carried in 2018 to asteroid 1993 JU3 by JAXA's Hayabusas 2 probe, and will provide complementary scientific return to their mothership. In this case, cubesats depend on a larger spacecraft for deployment and other resources, such as telecommunication relay or propulsion. For both situations, we will describe in this paper how cubesats can be used as remote observatories (such as NEO detection missions), as technology demonstrators, and how they can perform or contribute to all steps in the Deep Space exploration sequence: Measurements during Deep Space cruise, Body Fly-bies, Body Orbiters, Atmospheric probes (Jupiter probe, Venus atmospheric probes, ..), Static Landers, Mobile landers (such as balloons, wheeled rovers, small body rovers, drones, penetrators, floating devices, …), Sample Return. We will elaborate on mission architectures for the most promising concepts where cubesat size devices offer an advantage in terms of affordability, feasibility, and increase of scientific return.
Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina
2014-01-01
Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348
Šlapeta, Jan; Saverimuttu, Stefan; Vogelnest, Larry; Sangster, Cheryl; Hulst, Frances; Rose, Karrie; Thompson, Paul; Whittington, Richard
2017-11-01
The short-beaked echidna (Tachyglossus aculeatus) and the platypus (Ornithorhynchus anatinus) are iconic egg-laying monotremes (Mammalia: Monotremata) from Australasia. The aim of this study was to demonstrate the utility of diversity profiles in disease investigations of monotremes. Using small subunit (18S) rDNA amplicon deep-sequencing we demonstrated the presence of apicomplexan parasites and confirmed by direct and cloned amplicon gene sequencing Theileria ornithorhynchi, Theileria tachyglossi, Eimeria echidnae and Cryptosporidium fayeri. Using a combination of samples from healthy and diseased animals, we show a close evolutionary relationship between species of coccidia (Eimeria) and piroplasms (Theileria) from the echidna and platypus. The presence of E. echidnae was demonstrated in faeces and tissues affected by disseminated coccidiosis. Moreover, the presence of E. echidnae DNA in the blood of echidnas was associated with atoxoplasma-like stages in white blood cells, suggesting Hepatozoon tachyglossi blood stages are disseminated E. echidnae stages. These next-generation DNA sequencing technologies are suited to material and organisms that have not been previously characterised and for which the material is scarce. The deep sequencing approach supports traditional diagnostic methods, including microscopy, clinical pathology and histopathology, to better define the status quo. This approach is particularly suitable for wildlife disease investigation. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Pasquale, V.; Chiozzi, P.; Verdoya, M.
2013-05-01
Temperatures recorded in wells as deep as 6 km drilled for hydrocarbon prospecting were used together with geological information to depict the thermal regime of the sedimentary sequence of the eastern sector of the Po Plain. After correction for drilling disturbance, temperature data were analyzed through an inversion technique based on a laterally constant thermal gradient model. The obtained thermal gradient is quite low within the deep carbonate unit (14 mK m- 1), while it is larger (53 mK m- 1) in the overlying impermeable formations. In the uppermost sedimentary layers, the thermal gradient is close to the regional average (21 mK m- 1). We argue that such a vertical change cannot be ascribed to thermal conductivity variation within the sedimentary sequence, but to deep groundwater flow. Since the hydrogeological characteristics (including litho-stratigraphic sequence and structural setting) hardly permit forced convection, we suggest that thermal convection might occur within the deep carbonate aquifer. The potential of this mechanism was evaluated by means of the Rayleigh number analysis. It turned out that permeability required for convection to occur must be larger than 3 10- 15 m2. The average over-heat ratio is 0.45. The lateral variation of hydrothermal regime was tested by using temperature data representing the aquifer thermal conditions. We found that thermal convection might be more developed and variable at the Ferrara High and its surroundings, where widespread fracturing may have increased permeability.
Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K
2016-01-01
Study on transcriptome, the entire pool of transcripts in an organism or single cells at certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulties to compare the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves steps, from the plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous-yet fragmentary-data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task, which can cause confusion when choosing the suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.
Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter
2015-01-01
Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
Nagahama, Hiroshi; Suzuki, Kengo; Shonai, Takaharu; Aratani, Kazuki; Sakurai, Yuuki; Nakamura, Manami; Sakata, Motomichi
2015-01-01
Electrodes are surgically implanted into the subthalamic nucleus (STN) of Parkinson's disease patients to provide deep brain stimulation. For ensuring correct positioning, the anatomic location of the STN must be determined preoperatively. Magnetic resonance imaging has been used for pinpointing the location of the STN. To identify the optimal imaging sequence for identifying the STN, we compared images produced with T2 star-weighted angiography (SWAN), gradient echo T2*-weighted imaging, and fast spin echo T2-weighted imaging in 6 healthy volunteers. Our comparison involved measurement of the contrast-to-noise ratio (CNR) for the STN and substantia nigra and a radiologist's interpretations of the images. Of the sequences examined, the CNR and qualitative scores were significantly higher on SWAN images than on other images (p < 0.01) for STN visualization. Kappa value (0.74) on SWAN images was the highest in three sequences for visualizing the STN. SWAN is the sequence best suited for identifying the STN at the present time.
Deep sequencing methods for protein engineering and design.
Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A
2017-08-01
The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.
Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K
2011-01-20
Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
2011-01-01
Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
Parkes, R John; Sellek, Gerard; Webster, Gordon; Martin, Derek; Anders, Erik; Weightman, Andrew J; Sass, Henrik
2009-01-01
Deep subseafloor sediments may contain depressurization-sensitive, anaerobic, piezophilic prokaryotes. To test this we developed the DeepIsoBUG system, which when coupled with the HYACINTH pressure-retaining drilling and core storage system and the PRESS core cutting and processing system, enables deep sediments to be handled without depressurization (up to 25 MPa) and anaerobic prokaryotic enrichments and isolation to be conducted up to 100 MPa. Here, we describe the system and its first use with subsurface gas hydrate sediments from the Indian Continental Shelf, Cascadia Margin and Gulf of Mexico. Generally, highest cell concentrations in enrichments occurred close to in situ pressures (14 MPa) in a variety of media, although growth continued up to at least 80 MPa. Predominant sequences in enrichments were Carnobacterium, Clostridium, Marinilactibacillus and Pseudomonas, plus Acetobacterium and Bacteroidetes in Indian samples, largely independent of media and pressures. Related 16S rRNA gene sequences for all of these Bacteria have been detected in deep, subsurface environments, although isolated strains were piezotolerant, being able to grow at atmospheric pressure. Only the Clostridium and Acetobacterium were obligate anaerobes. No Archaea were enriched. It may be that these sediment samples were not deep enough (total depth 1126–1527 m) to obtain obligate piezophiles. PMID:19694787
USDA-ARS?s Scientific Manuscript database
Modern day genomics holds the promise of solving the complexities of basic plant sciences, and of catalyzing practical advances in plant breeding. While contiguous, "base perfect" deep sequencing is a key module of any genome project, recent advances in parallel next generation sequencing technologi...
3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing
2013-01-01
Background Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.
2010-01-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Chaillon, Antoine; Nakazawa, Masato; Wertheim, Joel O; Little, Susan J; Smith, Davey M; Mehta, Sanjay R; Gianella, Sara
2017-11-01
During primary HIV infection, the presence of minority drug resistance mutations (DRM) may be a consequence of sexual transmission, de novo mutations, or technical errors in identification. Baseline blood samples were collected from 24 HIV-infected antiretroviral-naive, genetically and epidemiologically linked source and recipient partners shortly after the recipient's estimated date of infection. An additional 32 longitudinal samples were available from 11 recipients. Deep sequencing of HIV reverse transcriptase (RT) was performed (Roche/454), and the sequences were screened for nucleoside and nonnucleoside RT inhibitor DRM. The likelihood of sexual transmission and persistence of DRM was assessed using Bayesian-based statistical modeling. While the majority of DRM (>20%) were consistently transmitted from source to recipient, the probability of detecting a minority DRM in the recipient was not increased when the same minority DRM was detected in the source (Bayes factor [BF] = 6.37). Longitudinal analyses revealed an exponential decay of DRM (BF = 0.05) while genetic diversity increased. Our analysis revealed no substantial evidence for sexual transmission of minority DRM (BF = 0.02). The presence of minority DRM during early infection, followed by a rapid decay, is consistent with the "mutation-selection balance" hypothesis, in which deleterious mutations are more efficiently purged later during HIV infection when the larger effective population size allows more efficient selection. Future studies using more recent sequencing technologies that are less prone to single-base errors should confirm these results by applying a similar Bayesian framework in other clinical settings. IMPORTANCE The advent of sensitive sequencing platforms has led to an increased identification of minority drug resistance mutations (DRM), including among antiretroviral therapy-naive HIV-infected individuals. While transmission of DRM may impact future therapy options for newly infected individuals, the clinical significance of the detection of minority DRM remains controversial. In the present study, we applied deep-sequencing techniques within a Bayesian hierarchical framework to a cohort of 24 transmission pairs to investigate whether minority DRM detected shortly after transmission were the consequence of (i) sexual transmission from the source, (ii) de novo emergence shortly after infection followed by viral selection and evolution, or (iii) technical errors/limitations of deep-sequencing methods. We found no clear evidence to support the sexual transmission of minority resistant variants, and our results suggested that minor resistant variants may emerge de novo shortly after transmission, when the small effective population size limits efficient purge by natural selection. Copyright © 2017 American Society for Microbiology.
Optical Communications Channel Combiner
NASA Technical Reports Server (NTRS)
Quirk, Kevin J.; Quirk, Kevin J.; Nguyen, Danh H.; Nguyen, Huy
2012-01-01
NASA has identified deep-space optical communications links as an integral part of a unified space communication network in order to provide data rates in excess of 100 Mb/s. The distances and limited power inherent in a deep-space optical downlink necessitate the use of photon-counting detectors and a power-efficient modulation such as pulse position modulation (PPM). For the output of each photodetector, whether from a separate telescope or a portion of the detection area, a communication receiver estimates a log-likelihood ratio for each PPM slot. To realize the full effective aperture of these receivers, their outputs must be combined prior to information decoding. A channel combiner was developed to synchronize the log-likelihood ratio (LLR) sequences of multiple receivers, and then combines these into a single LLR sequence for information decoding. The channel combiner synchronizes the LLR sequences of up to three receivers and then combines these into a single LLR sequence for output. The channel combiner has three channel inputs, each of which takes as input a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The cross-correlation between the channels LLR time series are calculated and used to synchronize the sequences prior to combining. The output of the channel combiner is a sequence of four-bit LLRs for each PPM slot in a codeword via a XAUI 10 Gb/s quad optical fiber interface. The unit is controlled through a 1 Gb/s Ethernet UDP/IP interface. A deep-space optical communication link has not yet been demonstrated. This ground-station channel combiner was developed to demonstrate this capability and is unique in its ability to process such a signal.
Synchronisation, acquisition and tracking for telemetry and data reception
NASA Astrophysics Data System (ADS)
Vandoninck, A.
1992-06-01
The important parameters of synchronization, acquisition, and tracking are addressed, and each function is highlighted separately. The following sequence is such as the functions occur in the system in time and for the type of data to be received, with distinction between telemetry and data reception, between direct carrier modulation or the use of a subcarrier, and between deep space and normal reception. For the telemetry reception the acquisition is described taking into account the difference in performances as geostationary or polar orbits, and the dependencies on the different Doppler offsets and rates are distinguished. The related functions and parameters are covered and the specifications of an average receiver are summarized. The synchronization of the valid data is described with a distinction for data directly modulated or via a subcarrier, the type of modulation and bitrate. The relevant functions and parameters of the average receiver/demodulator are summarized. The tracking of the signal in the course of the operational phase is described and relevant parameters of an actual system are presented. The reception of real data is handled and a sequence of acquisition, synchronization, and tracking is applied. Here higher bitrates and direct modulation schemes play an important role. The market equipment with the relevant parameters are discussed. The three functions in cases where deep reception is needed are covered. The high performance receiver/demodulator functions and how the acquisition, synchronization, and tracking is handled in such application, are explained.
The 3-D aftershock distribution of three recent M5~5.5 earthquakes in the Anza region,California
NASA Astrophysics Data System (ADS)
Zhang, Q.; Wdowinski, S.; Lin, G.
2011-12-01
The San Jacinto fault zone (SJFZ) exhibits the highest level of seismicity compared to other regions in southern California. On average, it produces four earthquakes per day, most of them at depth of 10-17 km. Over the past decade, an increasing seismic activity occurred in the Anza region, which included three M5~5.5 events and their aftershock sequences. These events occurred in 2001, 2005, and 2010. In this research we map the 3-D distribution of these three events to evaluate their rupture geometry and better understand the unusual deep seismic pattern along the SJFZ, which was termed "deep creep" (Wdowinski, 2009). We relocated 97,562 events from 1981 to 2011 in Anza region by applying the Source-Specific Station Term (SSST) method (Lin et al., 2006) and used an accurate 1-D velocity model derived from 3-D model of Lin et al (2007) and used In order to separate the aftershock sequence from background seismicity, we characterized each of the three aftershock sequences using Omori's law. Preliminary results show that all three sequences had a similar geometry of deep elongated aftershock distribution. Most aftershocks occurred at depth of 10-17 km and extended over a 70 km long segments of the SJFZ, centered at the mainshock hypocenters. A comparative study of other M5~5.5 mainshocks and their aftershock sequences in southern California reveals very different geometrical pattern, suggesting that the three Anza M5~5.5 events are unique and can be indicative of "deep creep" deformation processes. Reference 1.Lin, G.and Shearer,P.M.,2006, The COMPLOC earthquake location package,Seism. Res. Lett.77, pp.440-444. 2.Lin, G. and Shearer, P.M., Hauksson, E., and Thurber C.H.,2007, A three-dimensional crustal seismic velocity model for southern California from a composite event method,J. Geophys.Res.112, B12306, doi: 10.1029/ 2007JB004977. 3.Wdowinski, S. ,2009, Deep creep as a cause for the excess seismicity along the San Jacinto fault, Nat. Geosci.,doi:10.1038/NGEO684.
Effects of hydrostatic pressure on yeasts isolated from deep-sea hydrothermal vents.
Burgaud, Gaëtan; Hué, Nguyen Thi Minh; Arzur, Danielle; Coton, Monika; Perrier-Cornet, Jean-Marie; Jebbar, Mohamed; Barbier, Georges
2015-11-01
Hydrostatic pressure plays a significant role in the distribution of life in the biosphere. Knowledge of deep-sea piezotolerant and (hyper)piezophilic bacteria and archaea diversity has been well documented, along with their specific adaptations to cope with high hydrostatic pressure (HHP). Recent investigations of deep-sea microbial community compositions have shown unexpected micro-eukaryotic communities, mainly dominated by fungi. Molecular methods such as next-generation sequencing have been used for SSU rRNA gene sequencing to reveal fungal taxa. Currently, a difficult but fascinating challenge for marine mycologists is to create deep-sea marine fungus culture collections and assess their ability to cope with pressure. Indeed, although there is no universal genetic marker for piezoresistance, physiological analyses provide concrete relevant data for estimating their adaptations and understanding the role of fungal communities in the abyss. The present study investigated morphological and physiological responses of fungi to HHP using a collection of deep-sea yeasts as a model. The aim was to determine whether deep-sea yeasts were able to tolerate different HHP and if they were metabolically active. Here we report an unexpected taxonomic-based dichotomic response to pressure with piezosensitve ascomycetes and piezotolerant basidiomycetes, and distinct morphological switches triggered by pressure for certain strains. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Lonardi, Stefano; Mirebrahim, Hamid; Wanamaker, Steve; Alpert, Matthew; Ciardo, Gianfranco; Duma, Denisa; Close, Timothy J
2015-09-15
As the invention of DNA sequencing in the 70s, computational biologists have had to deal with the problem of de novo genome assembly with limited (or insufficient) depth of sequencing. In this work, we investigate the opposite problem, that is, the challenge of dealing with excessive depth of sequencing. We explore the effect of ultra-deep sequencing data in two domains: (i) the problem of decoding reads to bacterial artificial chromosome (BAC) clones (in the context of the combinatorial pooling design we have recently proposed), and (ii) the problem of de novo assembly of BAC clones. Using real ultra-deep sequencing data, we show that when the depth of sequencing increases over a certain threshold, sequencing errors make these two problems harder and harder (instead of easier, as one would expect with error-free data), and as a consequence the quality of the solution degrades with more and more data. For the first problem, we propose an effective solution based on 'divide and conquer': we 'slice' a large dataset into smaller samples of optimal size, decode each slice independently, and then merge the results. Experimental results on over 15 000 barley BACs and over 4000 cowpea BACs demonstrate a significant improvement in the quality of the decoding and the final assembly. For the second problem, we show for the first time that modern de novo assemblers cannot take advantage of ultra-deep sequencing data. Python scripts to process slices and resolve decoding conflicts are available from http://goo.gl/YXgdHT; software Hashfilter can be downloaded from http://goo.gl/MIyZHs stelo@cs.ucr.edu or timothy.close@ucr.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.
Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong
2012-01-01
microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs and the collaborative network of microRNAs and cardiac development related-mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.
Visschedijk, Marijn C; Alberts, Rudi; Mucha, Soren; Deelen, Patrick; de Jong, Dirk J; Pierik, Marieke; Spekhorst, Lieke M; Imhann, Floris; van der Meulen-de Jong, Andrea E; van der Woude, C Janneke; van Bodegraven, Adriaan A; Oldenburg, Bas; Löwenberg, Mark; Dijkstra, Gerard; Ellinghaus, David; Schreiber, Stefan; Wijmenga, Cisca; Rivas, Manuel A; Franke, Andre; van Diemen, Cleo C; Weersma, Rinse K
2016-01-01
Genome-wide association studies have revealed several common genetic risk variants for ulcerative colitis (UC). However, little is known about the contribution of rare, large effect genetic variants to UC susceptibility. In this study, we performed a deep targeted re-sequencing of 122 genes in Dutch UC patients in order to investigate the contribution of rare variants to the genetic susceptibility to UC. The selection of genes consists of 111 established human UC susceptibility genes and 11 genes that lead to spontaneous colitis when knocked-out in mice. In addition, we sequenced the promoter regions of 45 genes where known variants exert cis-eQTL-effects. Targeted pooled re-sequencing was performed on DNA of 790 Dutch UC cases. The Genome of the Netherlands project provided sequence data of 500 healthy controls. After quality control and prioritization based on allele frequency and pathogenicity probability, follow-up genotyping of 171 rare variants was performed on 1021 Dutch UC cases and 1166 Dutch controls. Single-variant association and gene-based analyses identified an association of rare variants in the MUC2 gene with UC. The associated variants in the Dutch population could not be replicated in a German replication cohort (1026 UC cases, 3532 controls). In conclusion, this study has identified a putative role for MUC2 on UC susceptibility in the Dutch population and suggests a population-specific contribution of rare variants to UC.
Unique microbial community in drilling fluids from Chinese continental scientific drilling
Zhang, Gengxin; Dong, Hailiang; Jiang, Hongchen; Xu, Zhiqin; Eberl, Dennis D.
2006-01-01
Circulating drilling fluid is often regarded as a contamination source in investigations of subsurface microbiology. However, it also provides an opportunity to sample geological fluids at depth and to study contained microbial communities. During our study of deep subsurface microbiology of the Chinese Continental Scientific Deep drilling project, we collected 6 drilling fluid samples from a borehole from 2290 to 3350 m below the land surface. Microbial communities in these samples were characterized with cultivation-dependent and -independent techniques. Characterization of 16S rRNA genes indicated that the bacterial clone sequences related to Firmicutes became progressively dominant with increasing depth. Most sequences were related to anaerobic, thermophilic, halophilic or alkaliphilic bacteria. These habitats were consistent with the measured geochemical characteristics of the drilling fluids that have incorporated geological fluids and partly reflected the in-situ conditions. Several clone types were closely related to Thermoanaerobacter ethanolicus, Caldicellulosiruptor lactoaceticus, and Anaerobranca gottschalkii, an anaerobic metal-reducer, an extreme thermophile, and an anaerobic chemoorganotroph, respectively, with an optimal growth temperature of 50–68°C. Seven anaerobic, thermophilic Fe(III)-reducing bacterial isolates were obtained and they were capable of reducing iron oxide and clay minerals to produce siderite, vivianite, and illite. The archaeal diversity was low. Most archaeal sequences were not related to any known cultivated species, but rather to environmental clone sequences recovered from subsurface environments. We infer that the detected microbes were derived from geological fluids at depth and their growth habitats reflected the deep subsurface conditions. These findings have important implications for microbial survival and their ecological functions in the deep subsurface.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.
Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia
2014-01-01
We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
Thurber, Mary I; Ghai, Ria R; Hyeroba, David; Weny, Geoffrey; Tumukunde, Alex; Chapman, Colin A; Wiseman, Roger W; Dinis, Jorge; Steeil, James; Greiner, Ellis C; Friedrich, Thomas C; O'Connor, David H; Goldberg, Tony L
2013-07-01
Hemoparasites of the apicomplexan family Plasmodiidae include the etiological agents of malaria, as well as a suite of non-human primate parasites from which the human malaria agents evolved. Despite the significance of these parasites for global health, little information is available about their ecology in multi-host communities. Primates were investigated in Kibale National Park, Uganda, where ecological relationships among host species are well characterized. Blood samples were examined for parasites of the genera Plasmodium and Hepatocystis using microscopy and PCR targeting the parasite mitochondrial cytochrome b gene, followed by Sanger sequencing. To assess co-infection, "deep sequencing" of a variable region within cytochrome b was performed. Out of nine black-and-white colobus (Colobus guereza), one blue guenon (Cercopithecus mitis), five grey-cheeked mangabeys (Lophocebus albigena), 23 olive baboons (Papio anubis), 52 red colobus (Procolobus rufomitratus) and 12 red-tailed guenons (Cercopithecus ascanius), 79 infections (77.5%) were found, all of which were Hepatocystis spp. Sanger sequencing revealed 25 different parasite haplotypes that sorted phylogenetically into six species-specific but morphologically similar lineages. "Deep sequencing" revealed mixed-lineage co-infections in baboons and red colobus (41.7% and 64.7% of individuals, respectively) but not in other host species. One lineage infecting red colobus also infected baboons, but always as the minor variant, suggesting directional cross-species transmission. Hepatocystis parasites in this primate community are a diverse assemblage of cryptic lineages, some of which co-infect hosts and at least one of which can cross primate species barriers. Copyright © 2013 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.
Burroughs, A Maxwell; Ando, Yoshinari; de Hoon, Michiel J L; Tomaru, Yasuhiro; Nishibu, Takahiro; Ukekawa, Ryo; Funakoshi, Taku; Kurokawa, Tsutomu; Suzuki, Harukazu; Hayashizaki, Yoshihide; Daub, Carsten O
2010-10-01
Animal microRNA sequences are subject to 3' nucleotide addition. Through detailed analysis of deep-sequenced short RNA data sets, we show adenylation and uridylation of miRNA is globally present and conserved across Drosophila and vertebrates. To better understand 3' adenylation function, we deep-sequenced RNA after knockdown of nucleotidyltransferase enzymes. The PAPD4 nucleotidyltransferase adenylates a wide range of miRNA loci, but adenylation does not appear to affect miRNA stability on a genome-wide scale. Adenine addition appears to reduce effectiveness of miRNA targeting of mRNA transcripts while deep-sequencing of RNA bound to immunoprecipitated Argonaute (AGO) subfamily proteins EIF2C1-EIF2C3 revealed substantial reduction of adenine addition in miRNA associated with EIF2C2 and EIF2C3. Our findings show 3' addition events are widespread and conserved across animals, PAPD4 is a primary miRNA adenylating enzyme, and suggest a role for 3' adenine addition in modulating miRNA effectiveness, possibly through interfering with incorporation into the RNA-induced silencing complex (RISC), a regulatory role that would complement the role of miRNA uridylation in blocking DICER1 uptake.
HomozygosityMapper2012--bridging the gap between homozygosity mapping and deep sequencing.
Seelow, Dominik; Schuelke, Markus
2012-07-01
Homozygosity mapping is a common method to map recessive traits in consanguineous families. To facilitate these analyses, we have developed HomozygosityMapper, a web-based approach to homozygosity mapping. HomozygosityMapper allows researchers to directly upload the genotype files produced by the major genotyping platforms as well as deep sequencing data. It detects stretches of homozygosity shared by the affected individuals and displays them graphically. Users can interactively inspect the underlying genotypes, manually refine these regions and eventually submit them to our candidate gene search engine GeneDistiller to identify the most promising candidate genes. Here, we present the new version of HomozygosityMapper. The most striking new feature is the support of Next Generation Sequencing *.vcf files as input. Upon users' requests, we have implemented the analysis of common experimental rodents as well as of important farm animals. Furthermore, we have extended the options for single families and loss of heterozygosity studies. Another new feature is the export of *.bed files for targeted enrichment of the potential disease regions for deep sequencing strategies. HomozygosityMapper also generates files for conventional linkage analyses which are already restricted to the possible disease regions, hence superseding CPU-intensive genome-wide analyses. HomozygosityMapper is freely available at http://www.homozygositymapper.org/.
Maximum entropy methods for extracting the learned features of deep neural networks.
Finnegan, Alex; Song, Jun S
2017-10-01
New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.
Lopez-Fernandez, Margarita; Cherkouk, Andrea; Vilchez-Vargas, Ramiro; Jauregui, Ruy; Pieper, Dietmar; Boon, Nico; Sanchez-Castro, Ivan; Merroun, Mohamed L
2015-11-01
The long-term disposal of radioactive wastes in a deep geological repository is the accepted international solution for the treatment and management of these special residues. The microbial community of the selected host rocks and engineered barriers for the deep geological repository may affect the performance and the safety of the radioactive waste disposal. In this work, the bacterial population of bentonite formations of Almeria (Spain), selected as a reference material for bentonite-engineered barriers in the disposal of radioactive wastes, was studied. 16S ribosomal RNA (rRNA) gene-based approaches were used to study the bacterial community of the bentonite samples by traditional clone libraries and Illumina sequencing. Using both techniques, the bacterial diversity analysis revealed similar results, with phylotypes belonging to 14 different bacterial phyla: Acidobacteria, Actinobacteria, Armatimonadetes, Bacteroidetes, Chloroflexi, Cyanobacteria, Deinococcus-Thermus, Firmicutes, Gemmatimonadetes, Planctomycetes, Proteobacteria, Nitrospirae, Verrucomicrobia and an unknown phylum. The dominant groups of the community were represented by Proteobacteria and Bacteroidetes. A high diversity was found in three of the studied samples. However, two samples were less diverse and dominated by Betaproteobacteria.
Osca, David; Templado, José; Zardoya, Rafael
2014-09-01
The complete nucleotide sequence of the mitochondrial (mt) genome of the deep-sea vent snail Ifremeria nautilei (Gastropoda: Abyssochrysoidea) was determined. The double stranded circular molecule is 15,664 pb in length and encodes for the typical 37 metazoan mitochondrial genes. The gene arrangement of the Ifremeria mt genome is most similar to genome organization of caenogastropods and differs only on the relative position of the trnW gene. The deduced amino acid sequences of the mt protein coding genes of Ifremeria mt genome were aligned with orthologous sequences from representatives of the main lineages of gastropods and phylogenetic relationships were inferred. The reconstructed phylogeny supports that Ifremeria belongs to Caenogastropoda and that it is closely related to hypsogastropod superfamilies. Results were compared with a reconstructed nuclear-based phylogeny. Moreover, a relaxed molecular-clock timetree calibrated with fossils dated the divergence of Abyssochrysoidea in the Late Jurassic-Early Cretaceous indicating a relatively modern colonization of deep-sea environments by these snails. Copyright © 2014 Elsevier B.V. All rights reserved.
Systematic comparison of variant calling pipelines using gold standard personal exome variants
Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.
2015-01-01
The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confident variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners—BWA-MEM, Bowtie2, and Novoalign—and four variant callers—Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric
2010-01-01
It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n, and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method is applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140
Rivière, Jean-Baptiste; Mirzaa, Ghayda M.; O’Roak, Brian J.; Beddaoui, Margaret; Alcantara, Diana; Conway, Robert L.; St-Onge, Judith; Schwartzentruber, Jeremy A.; Gripp, Karen W.; Nikkel, Sarah M.; Worthylake, Thea; Sullivan, Christopher T.; Ward, Thomas R.; Butler, Hailly E.; Kramer, Nancy A.; Albrecht, Beate; Armour, Christine M.; Armstrong, Linlea; Caluseriu, Oana; Cytrynbaum, Cheryl; Drolet, Beth A.; Innes, A. Micheil; Lauzon, Julie L.; Lin, Angela E.; Mancini, Grazia M. S.; Meschino, Wendy S.; Reggin, James D.; Saggar, Anand K.; Lerman-Sagie, Tally; Uyanik, Gökhan; Weksberg, Rosanna; Zirn, Birgit; Beaulieu, Chandree L.; Majewski, Jacek; Bulman, Dennis E.; O’Driscoll, Mark; Shendure, Jay; Graham, John M.; Boycott, Kym M.; Dobyns, William B.
2012-01-01
Megalencephaly-capillary malformation (MCAP) and megalencephaly-polymicrogyria-polydactyly-hydrocephalus (MPPH) syndromes are sporadic overgrowth disorders associated with markedly enlarged brain size and other recognizable features1-5. We performed exome sequencing in three families with MCAP or MPPH and confirmed our initial observations in exomes from 7 MCAP and 174 control individuals, as well as in 40 additional megalencephaly subjects using a combination of Sanger sequencing, restriction-enzyme assays, and targeted deep sequencing. We identified de novo germline or postzygotic mutations in three core components of the phosphatidylinositol-3-kinase (PI3K)/AKT pathway. These include two mutations of AKT3, one recurrent mutation of PIK3R2 in 11 unrelated MPPH families, and 15 mostly postzygotic mutations of PIK3CA in 23 MCAP and one MPPH patients. Our data highlight the central role of PI3K/AKT signaling in vascular, limb and brain development, and emphasize the power of massively parallel sequencing in a challenging context of phenotypic and genetic heterogeneity combined with postzygotic mosaicism. PMID:22729224
Rivière, Jean-Baptiste; Mirzaa, Ghayda M; O'Roak, Brian J; Beddaoui, Margaret; Alcantara, Diana; Conway, Robert L; St-Onge, Judith; Schwartzentruber, Jeremy A; Gripp, Karen W; Nikkel, Sarah M; Worthylake, Thea; Sullivan, Christopher T; Ward, Thomas R; Butler, Hailly E; Kramer, Nancy A; Albrecht, Beate; Armour, Christine M; Armstrong, Linlea; Caluseriu, Oana; Cytrynbaum, Cheryl; Drolet, Beth A; Innes, A Micheil; Lauzon, Julie L; Lin, Angela E; Mancini, Grazia M S; Meschino, Wendy S; Reggin, James D; Saggar, Anand K; Lerman-Sagie, Tally; Uyanik, Gökhan; Weksberg, Rosanna; Zirn, Birgit; Beaulieu, Chandree L; Majewski, Jacek; Bulman, Dennis E; O'Driscoll, Mark; Shendure, Jay; Graham, John M; Boycott, Kym M; Dobyns, William B
2012-06-24
Megalencephaly-capillary malformation (MCAP) and megalencephaly-polymicrogyria-polydactyly-hydrocephalus (MPPH) syndromes are sporadic overgrowth disorders associated with markedly enlarged brain size and other recognizable features. We performed exome sequencing in 3 families with MCAP or MPPH, and our initial observations were confirmed in exomes from 7 individuals with MCAP and 174 control individuals, as well as in 40 additional subjects with megalencephaly, using a combination of Sanger sequencing, restriction enzyme assays and targeted deep sequencing. We identified de novo germline or postzygotic mutations in three core components of the phosphatidylinositol 3-kinase (PI3K)-AKT pathway. These include 2 mutations in AKT3, 1 recurrent mutation in PIK3R2 in 11 unrelated families with MPPH and 15 mostly postzygotic mutations in PIK3CA in 23 individuals with MCAP and 1 with MPPH. Our data highlight the central role of PI3K-AKT signaling in vascular, limb and brain development and emphasize the power of massively parallel sequencing in a challenging context of phenotypic and genetic heterogeneity combined with postzygotic mosaicism.
Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.
Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo
2009-07-06
In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.
Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg
2017-01-01
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
A Statistical Guide to the Design of Deep Mutational Scanning Experiments
Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana-Hermina; Jensen, Jeffrey D.; Bank, Claudia
2016-01-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. PMID:27412710
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan
2012-06-19
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followedmore » by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.« less
Dissecting genetic and environmental mutation signatures with model organisms.
Segovia, Romulo; Tam, Annie S; Stirling, Peter C
2015-08-01
Deep sequencing has impacted on cancer research by enabling routine sequencing of genomes and exomes to identify genetic changes associated with carcinogenesis. Researchers can now use the frequency, type, and context of all mutations in tumor genomes to extract mutation signatures that reflect the driving mutational processes. Identifying mutation signatures, however, may not immediately suggest a mechanism. Consequently, several recent studies have employed deep sequencing of model organisms exposed to discrete genetic or environmental perturbations. These studies exploit the simpler genomes and availability of powerful genetic tools in model organisms to analyze mutation signatures under controlled conditions, forging mechanistic links between mutational processes and signatures. We discuss the power of this approach and suggest that many such studies may be on the horizon. Copyright © 2015 Elsevier Ltd. All rights reserved.
Desjardin, Dennis E; Hemmes, Don E; Perry, Brian A
2014-01-01
Pseudobaeospora wipapatiae is described as new based on material collected in alien wet habitats on the island of Hawaii. Unique features of this beautiful species include deep ruby-colored basidiomes with two-spored basidia, amyloid cheilocystidia and a hymeniderm pileipellis with abundant pileocystidia that is initially deep ruby in KOH then changes to lilac gray. Phylogenetic analysis of nuclear large ribosomal subunit sequence data suggest a close relationship between Pseudobaeospora and Tricholoma. BLAST comparisons of internal transcribed spacer and 5.8S nuclear ribosomal subunit regions sequence data reveal greatest similarity with existing sequences of Pseudobaeospora species. A comprehensive description, color photograph, illustrations of salient micromorphological features and comparisons with phenetically similar taxa are provided. © 2014 by The Mycological Society of America.
Spiliopoulou, Athina; Colombo, Marco; Orchard, Peter; Agakov, Felix; McKeigue, Paul
2017-01-01
We address the task of genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing as inputs. In this setting, the data have a high-level of missingness or uncertainty, and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited for this situation, as they rely on prephasing for computational efficiency, and, without definite genotype calls, the prephasing task becomes computationally expensive. We describe GeneImp, a program for genotype imputation that does not require prephasing and is computationally tractable for whole-genome imputation. GeneImp does not explicitly model recombination, instead it capitalizes on the existence of large reference panels—comprising thousands of reference haplotypes—and assumes that the reference haplotypes can adequately represent the target haplotypes over short regions unaltered. We validate GeneImp based on data from ultralow coverage sequencing (0.5×), and compare its performance to the most recent version of BEAGLE that can perform this task. We show that GeneImp achieves imputation quality very close to that of BEAGLE, using one to two orders of magnitude less time, without an increase in memory complexity. Therefore, GeneImp is the first practical choice for whole-genome imputation to a dense reference panel when prephasing cannot be applied, for instance, in datasets produced via ultralow coverage sequencing. A related future application for GeneImp is whole-genome imputation based on the off-target reads from deep whole-exome sequencing. PMID:28348060
Bazak, Lily; Haviv, Ami; Barak, Michal; Jacob-Hirsch, Jasmine; Deng, Patricia; Zhang, Rui; Isaacs, Farren J; Rechavi, Gideon; Li, Jin Billy; Eisenberg, Eli; Levanon, Erez Y
2014-03-01
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
Prediction of enhancer-promoter interactions via natural language processing.
Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui
2018-05-09
Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841~ 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting attention mechanism. Last, we show that even superior performance with F1 scores 0.889~ 0.940 can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPIs identification.
Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-09-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-01-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
Population genomics of intrapatient HIV-1 evolution
Zanini, Fabio; Brodin, Johanna; Thebo, Lina; Lanz, Christa; Bratt, Göran; Albert, Jan; Neher, Richard A
2015-01-01
Many microbial populations rapidly adapt to changing environments with multiple variants competing for survival. To quantify such complex evolutionary dynamics in vivo, time resolved and genome wide data including rare variants are essential. We performed whole-genome deep sequencing of HIV-1 populations in 9 untreated patients, with 6-12 longitudinal samples per patient spanning 5-8 years of infection. The data can be accessed and explored via an interactive web application. We show that patterns of minor diversity are reproducible between patients and mirror global HIV-1 diversity, suggesting a universal landscape of fitness costs that control diversity. Reversions towards the ancestral HIV-1 sequence are observed throughout infection and account for almost one third of all sequence changes. Reversion rates depend strongly on conservation. Frequent recombination limits linkage disequilibrium to about 100bp in most of the genome, but strong hitch-hiking due to short range linkage limits diversity. DOI: http://dx.doi.org/10.7554/eLife.11282.001 PMID:26652000
Landscape of Infiltrating T Cells in Liver Cancer Revealed by Single-Cell Sequencing.
Zheng, Chunhong; Zheng, Liangtao; Yoo, Jae-Kwang; Guo, Huahu; Zhang, Yuanyuan; Guo, Xinyi; Kang, Boxi; Hu, Ruozhen; Huang, Julie Y; Zhang, Qiming; Liu, Zhouzerui; Dong, Minghui; Hu, Xueda; Ouyang, Wenjun; Peng, Jirun; Zhang, Zemin
2017-06-15
Systematic interrogation of tumor-infiltrating lymphocytes is key to the development of immunotherapies and the prediction of their clinical responses in cancers. Here, we perform deep single-cell RNA sequencing on 5,063 single T cells isolated from peripheral blood, tumor, and adjacent normal tissues from six hepatocellular carcinoma patients. The transcriptional profiles of these individual cells, coupled with assembled T cell receptor (TCR) sequences, enable us to identify 11 T cell subsets based on their molecular and functional properties and delineate their developmental trajectory. Specific subsets such as exhausted CD8 + T cells and Tregs are preferentially enriched and potentially clonally expanded in hepatocellular carcinoma (HCC), and we identified signature genes for each subset. One of the genes, layilin, is upregulated on activated CD8 + T cells and Tregs and represses the CD8 + T cell functions in vitro. This compendium of transcriptome data provides valuable insights and a rich resource for understanding the immune landscape in cancers. Copyright © 2017 Elsevier Inc. All rights reserved.
Fischer, Sebastian; Greipel, Leonie; Klockgether, Jens; Dorda, Marie; Wiehlmann, Lutz; Cramer, Nina; Tümmler, Burkhard
2017-05-01
Early antimicrobial chemotherapy can prevent or at least delay chronic cystic fibrosis (CF) airways infections with Pseudomonas aeruginosa. During a 10-year study period P. aeruginosa was detected for the first time in 54 CF patients regularly seen at the CF centre Hannover. Amplicon sequencing of 34 loci of the P. aeruginosa core genome was performed in baseline and post-treatment isolates of the 15 CF patients who had remained P. aeruginosa - positive after the first round of antipseudomonal chemotherapy. Deep sequencing uncovered coexisting alternative nucleotides at in total 33 of 55,284 examined genome positions including six non-synonymous polymorphisms in the lasR gene, a key regulator of quorum sensing. After early treatment 42 of 50 novel nucleotide substitutions had emerged in exopolysaccharide biosynthesis, efflux pump and porin genes. Early treatment selects pathoadaptive mutations in P. aeruginosa that are typical for chronic infections of CF lungs. Copyright © 2016 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor
2015-01-01
Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
High Class-Imbalance in pre-miRNA Prediction: A Novel Approach Based on deepSOM.
Stegmayer, Georgina; Yones, Cristian; Kamenetzky, Laura; Milone, Diego H
2017-01-01
The computational prediction of novel microRNA within a full genome involves identifying sequences having the highest chance of being a miRNA precursor (pre-miRNA). These sequences are usually named candidates to miRNA. The well-known pre-miRNAs are usually only a few in comparison to the hundreds of thousands of potential candidates to miRNA that have to be analyzed, which makes this task a high class-imbalance classification problem. The classical way of approaching it has been training a binary classifier in a supervised manner, using well-known pre-miRNAs as positive class and artificially defining the negative class. However, although the selection of positive labeled examples is straightforward, it is very difficult to build a set of negative examples in order to obtain a good set of training samples for a supervised method. In this work, we propose a novel and effective way of approaching this problem using machine learning, without the definition of negative examples. The proposal is based on clustering unlabeled sequences of a genome together with well-known miRNA precursors for the organism under study, which allows for the quick identification of the best candidates to miRNA as those sequences clustered with known precursors. Furthermore, we propose a deep model to overcome the problem of having very few positive class labels. They are always maintained in the deep levels as positive class while less likely pre-miRNA sequences are filtered level after level. Our approach has been compared with other methods for pre-miRNAs prediction in several species, showing effective predictivity of novel miRNAs. Additionally, we will show that our approach has a lower training time and allows for a better graphical navegability and interpretation of the results. A web-demo interface to try deepSOM is available at http://fich.unl.edu.ar/sinc/web-demo/deepsom/.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Xu, Kuidong
2016-10-01
In comparison with the macrobenthos and prokaryotes, patterns of diversity and distribution of microbial eukaryotes in deep-sea hydrothermal vents are poorly known. The widely used high-throughput sequencing of 18S rDNA has revealed a high diversity of microeukaryotes yielded from both living organisms and buried DNA in marine sediments. More recently, cDNA surveys have been utilized to uncover the diversity of active organisms. However, both methods have never been used to evaluate the diversity of ciliates in hydrothermal vents. By using high-throughput DNA and cDNA sequencing of 18S rDNA, we evaluated the molecular diversity of ciliates, a representative group of microbial eukaryotes, from the sediments of deep-sea hydrothermal vents in the Okinawa Trough and compared it with that of an adjacent deep-sea area about 15 km away and that of an offshore area of the Yellow Sea about 500 km away. The results of DNA sequencing showed that Spirotrichea and Oligohymenophorea were the most diverse and abundant groups in all the three habitats. The proportion of sequences of Oligohymenophorea was the highest in the hydrothermal vents whereas Spirotrichea was the most diverse group at all three habitats. Plagiopyleans were found only in the hydrothermal vents but with low diversity and abundance. By contrast, the cDNA sequencing showed that Plagiopylea was the most diverse and most abundant group in the hydrothermal vents, followed by Spirotrichea in terms of diversity and Oligohymenophorea in terms of relative abundance. A novel group of ciliates, distinctly separate from the 12 known classes, was detected in the hydrothermal vents, indicating undescribed, possibly highly divergent ciliates may inhabit this environment. Statistical analyses showed that: (i) the three habitats differed significantly from one another in terms of diversity of both the rare and the total ciliate taxa, and; (ii) the adjacent deep sea was more similar to the offshore area than to the hydrothermal vents. In terms of the diversity of abundant taxa, however, there was no significant difference between the hydrothermal vents and the adjacent deep sea, both of which differed significantly from the offshore area. As abundant ciliate taxa can be found in several sampling sites, they are likely adapted to large environmental variations, while rare taxa are found in specific habitat and thus are potentially more sensitive to varying environmental conditions.
Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum
2016-11-01
DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced . Mapping of DNA variants in our sequenced genomes to mitochondrial
Hua, Xiao-Mei; Liang, Li; Wen, Zhong-Ling; Du, Mei-Hang; Meng, Fan-Fan; Pang, Yan-Jun; Tang, Cheng-Yi
2018-01-01
The worldwide commercial cultivation of transgenic crops, including glyphosate-tolerant (GT) soybeans, has increased widely during the past 20 years. However, it is accompanied with a growing concern about potential effects of transgenic crops on the soil microbial communities, especially on rhizosphere bacterial communities. Our previous study found that the GT soybean line NZL06-698 (N698) significantly affected rhizosphere bacteria, including some unidentified taxa, through 16S rRNA gene (16S rDNA) V4 region amplicon deep sequencing via Illumina MiSeq. In this study, we performed 16S rDNA V5–V7 region amplicon deep sequencing via Illumina MiSeq and shotgun metagenomic approaches to identify those major taxa. Results of these processes revealed that the species richness and evenness increased in the rhizosphere bacterial communities of N698, the beta diversity of the rhizosphere bacterial communities of N698 was affected, and that certain dominant bacterial phyla and genera were related to N698 compared with its control cultivar Mengdou12. Consistent with our previous findings, this study showed that N698 affects the rhizosphere bacterial communities. In specific, N698 negatively affects Rahnella, Janthinobacterium, Stenotrophomonas, Sphingomonas and Luteibacter while positively affecting Arthrobacter, Bradyrhizobium, Ramlibacter and Nitrospira. PMID:29659545
Adelman, Zach N.; Kilcullen, Kathleen A.; Koganemaru, Reina; Anderson, Michelle A. E.; Anderson, Troy D.; Miller, Dini M.
2011-01-01
A frightening resurgence of bed bug infestations has occurred over the last 10 years in the U.S. and current chemical methods have been inadequate for controlling this pest due to widespread insecticide resistance. Little is known about the mechanisms of resistance present in U.S. bed bug populations, making it extremely difficult to develop intelligent strategies for their control. We have identified bed bugs collected in Richmond, VA which exhibit both kdr-type (L925I) and metabolic resistance to pyrethroid insecticides. Using LD50 bioassays, we determined that resistance ratios for Richmond strain bed bugs were ∼5200-fold to the insecticide deltamethrin. To identify metabolic genes potentially involved in the detoxification of pyrethroids, we performed deep-sequencing of the adult bed bug transcriptome, obtaining more than 2.5 million reads on the 454 titanium platform. Following assembly, analysis of newly identified gene transcripts in both Harlan (susceptible) and Richmond (resistant) bed bugs revealed several candidate cytochrome P450 and carboxylesterase genes which were significantly over-expressed in the resistant strain, consistent with the idea of increased metabolic resistance. These data will accelerate efforts to understand the biochemical basis for insecticide resistance in bed bugs, and provide molecular markers to assist in the surveillance of metabolic resistance. PMID:22039447
Beck, Andrew; Tesh, Robert B.; Wood, Thomas G.; Widen, Steven G.; Ryman, Kate D.; Barrett, Alan D. T.
2014-01-01
Background. The first comparison of a live RNA viral vaccine strain to its wild-type parental strain by deep sequencing is presented using as a model the yellow fever virus (YFV) live vaccine strain 17D-204 and its wild-type parental strain, Asibi. Methods. The YFV 17D-204 vaccine genome was compared to that of the parental strain Asibi by massively parallel methods. Variability was compared on multiple scales of the viral genomes. A modeled exploration of small-frequency variants was performed to reconstruct plausible regions of mutational plasticity. Results. Overt quasispecies diversity is a feature of the parental strain, whereas the live vaccine strain lacks diversity according to multiple independent measurements. A lack of attenuating mutations in the Asibi population relative to that of 17D-204 was observed, demonstrating that the vaccine strain was derived by discrete mutation of Asibi and not by selection of genomes in the wild-type population. Conclusions. Relative quasispecies structure is a plausible correlate of attenuation for live viral vaccines. Analyses such as these of attenuated viruses improve our understanding of the molecular basis of vaccine attenuation and provide critical information on the stability of live vaccines and the risk of reversion to virulence. PMID:24141982
Beck, Andrew; Tesh, Robert B; Wood, Thomas G; Widen, Steven G; Ryman, Kate D; Barrett, Alan D T
2014-02-01
The first comparison of a live RNA viral vaccine strain to its wild-type parental strain by deep sequencing is presented using as a model the yellow fever virus (YFV) live vaccine strain 17D-204 and its wild-type parental strain, Asibi. The YFV 17D-204 vaccine genome was compared to that of the parental strain Asibi by massively parallel methods. Variability was compared on multiple scales of the viral genomes. A modeled exploration of small-frequency variants was performed to reconstruct plausible regions of mutational plasticity. Overt quasispecies diversity is a feature of the parental strain, whereas the live vaccine strain lacks diversity according to multiple independent measurements. A lack of attenuating mutations in the Asibi population relative to that of 17D-204 was observed, demonstrating that the vaccine strain was derived by discrete mutation of Asibi and not by selection of genomes in the wild-type population. Relative quasispecies structure is a plausible correlate of attenuation for live viral vaccines. Analyses such as these of attenuated viruses improve our understanding of the molecular basis of vaccine attenuation and provide critical information on the stability of live vaccines and the risk of reversion to virulence.
Fu, Lu-Lu; Xu, Ying; Li, Dan-Dan; Dai, Xiao-Wei; Xu, Xin; Zhang, Jing-Shun; Ming, Hao; Zhang, Xue-Ying; Zhang, Guo-Qing; Ma, Ya-Lan; Zheng, Lian-Wen
2018-05-30
Polycystic ovary syndrome (PCOS) is one of the most common endocrine disorders in reproductive-aged women. However, the exact pathophysiology of PCOS remains largely unclear. We performed deep sequencing to investigate the mRNA and long noncoding RNA (lncRNA) expression profiles in the ovarian tissues of letrozole-induced PCOS rat model and control rats. A total of 2147 mRNAs and 158 lncRNAs were differentially expressed between the PCOS models and control. Gene ontology analysis indicated that differentially expressed mRNAs were associated with biological adhesion, reproduction, and metabolic process. Pathway analysis results indicated that these aberrantly expressed mRNAs were related to several specific signaling pathways, including insulin resistance, steroid hormone biosynthesis, PPAR signaling pathway, cell adhesion molecules, autoimmune thyroid disease, and AMPK signaling pathway. The relative expression levels of mRNAs and lncRNAs were validated through qRT-PCR. LncRNA-miRNA-mRNA network was constructed to explore ceRNAs involved in the PCOS model and were also verified by qRTPCR experiment. These findings may provide insight into the pathogenesis of PCOS and clues to find key diagnostic and therapeutic roles of lncRNA in PCOS. Copyright © 2018 Elsevier B.V. All rights reserved.
Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi
Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain
2017-01-01
ABSTRACT Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
USDA-ARS?s Scientific Manuscript database
The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
Identifying active foraminifera in the Sea of Japan using metatranscriptomic approach
NASA Astrophysics Data System (ADS)
Lejzerowicz, Franck; Voltsky, Ivan; Pawlowski, Jan
2013-02-01
Metagenetics represents an efficient and rapid tool to describe environmental diversity patterns of microbial eukaryotes based on ribosomal DNA sequences. However, the results of metagenetic studies are often biased by the presence of extracellular DNA molecules that are persistent in the environment, especially in deep-sea sediment. As an alternative, short-lived RNA molecules constitute a good proxy for the detection of active species. Here, we used a metatranscriptomic approach based on RNA-derived (cDNA) sequences to study the diversity of the deep-sea benthic foraminifera and compared it to the metagenetic approach. We analyzed 257 ribosomal DNA and cDNA sequences obtained from seven sediments samples collected in the Sea of Japan at depths ranging from 486 to 3665 m. The DNA and RNA-based approaches gave a similar view of the taxonomic composition of foraminiferal assemblage, but differed in some important points. First, the cDNA dataset was dominated by sequences of rotaliids and robertiniids, suggesting that these calcareous species, some of which have been observed in Rose Bengal stained samples, are the most active component of foraminiferal community. Second, the richness of monothalamous (single-chambered) foraminifera was particularly high in DNA extracts from the deepest samples, confirming that this group of foraminifera is abundant but not necessarily very active in the deep-sea sediments. Finally, the high divergence of undetermined sequences in cDNA dataset indicate the limits of our database and lack of knowledge about some active but possibly rare species. Our study demonstrates the capability of the metatranscriptomic approach to detect active foraminiferal species and prompt its use in future high-throughput sequencing-based environmental surveys.
NASA Astrophysics Data System (ADS)
Okay, Aral I.; Altiner, Demir
2016-10-01
The Haymana region in Central Anatolia is located in the southern part of the Pontides close to the İzmir-Ankara suture. During the Cretaceous, the region formed part of the south-facing active margin of the Eurasia. The area preserves a nearly complete record of the Cretaceous system. Shallow marine carbonates of earliest Cretaceous age are overlain by a 700-m-thick Cretaceous sequence, dominated by deep marine limestones. Three unconformity-bounded pelagic carbonate sequences of Berriasian, Albian-Cenomanian and Turonian-Santonian ages are recognized: Each depositional sequence is preceded by a period of tilting and submarine erosion during the Berriasian, early Albian and late Cenomanian, which corresponds to phases of local extension in the active continental margin. Carbonate breccias mark the base of the sequences and each carbonate sequence steps down on older units. The deep marine carbonate deposition ended in the late Santonian followed by tilting, erosion and folding during the Campanian. Deposition of thick siliciclastic turbidites started in the late Campanian and continued into the Tertiary. Unlike most forearc basins, the Haymana region was a site of deep marine carbonate deposition until the Campanian. This was because the Pontide arc was extensional and the volcanic detritus was trapped in the intra-arc basins and did not reach the forearc or the trench. The extensional nature of the arc is also shown by the opening of the Black Sea as a backarc basin in the Turonian-Santonian. The carbonate sedimentation in an active margin is characterized by synsedimentary vertical displacements, which results in submarine erosion, carbonate breccias and in the lateral discontinuity of the sequences, and differs from blanket like carbonate deposition in the passive margins.
2014-01-01
Background Hypervariable region 1 (HVR1) contained within envelope protein 2 (E2) gene is the most variable part of HCV genome and its translation product is a major target for the host immune response. Variability within HVR1 may facilitate evasion of the immune response and could affect treatment outcome. The aim of the study was to analyze the impact of HVR1 heterogeneity employing sensitive ultra-deep sequencing, on the outcome of PEG-IFN-α (pegylated interferon α) and ribavirin treatment. Methods HVR1 sequences were amplified from pretreatment serum samples of 25 patients infected with genotype 1b HCV (12 responders and 13 non-responders) and were subjected to pyrosequencing (GS Junior, 454/Roche). Reads were corrected for sequencing error using ShoRAH software, while population reconstruction was done using three different minimal variant frequency cut-offs of 1%, 2% and 5%. Statistical analysis was done using Mann–Whitney and Fisher’s exact tests. Results Complexity, Shannon entropy, nucleotide diversity per site, genetic distance and the number of genetic substitutions were not significantly different between responders and non-responders, when analyzing viral populations at any of the three frequencies (≥1%, ≥2% and ≥5%). When clonal sample was used to determine pyrosequencing error, 4% of reads were found to be incorrect and the most abundant variant was present at a frequency of 1.48%. Use of ShoRAH reduced the sequencing error to 1%, with the most abundant erroneous variant present at frequency of 0.5%. Conclusions While deep sequencing revealed complex genetic heterogeneity of HVR1 in chronic hepatitis C patients, there was no correlation between treatment outcome and any of the analyzed quasispecies parameters. PMID:25016390
BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone.
Yang, Bite; Liu, Feng; Ren, Chao; Ouyang, Zhangyi; Xie, Ziwei; Bo, Xiaochen; Shu, Wenjie
2017-07-01
Enhancer elements are noncoding stretches of DNA that play key roles in controlling gene expression programmes. Despite major efforts to develop accurate enhancer prediction methods, identifying enhancer sequences continues to be a challenge in the annotation of mammalian genomes. One of the major issues is the lack of large, sufficiently comprehensive and experimentally validated enhancers for humans or other species. Thus, the development of computational methods based on limited experimentally validated enhancers and deciphering the transcriptional regulatory code encoded in the enhancer sequences is urgent. We present a deep-learning-based hybrid architecture, BiRen, which predicts enhancers using the DNA sequence alone. Our results demonstrate that BiRen can learn common enhancer patterns directly from the DNA sequence and exhibits superior accuracy, robustness and generalizability in enhancer prediction relative to other state-of-the-art enhancer predictors based on sequence characteristics. Our BiRen will enable researchers to acquire a deeper understanding of the regulatory code of enhancer sequences. Our BiRen method can be freely accessed at https://github.com/wenjiegroup/BiRen . shuwj@bmi.ac.cn or boxc@bmi.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Deep-towed high resolution seismic imaging II: Determination of P-wave velocity distribution
NASA Astrophysics Data System (ADS)
Marsset, B.; Ker, S.; Thomas, Y.; Colin, F.
2018-02-01
The acquisition of high resolution seismic data in deep waters requires the development of deep towed seismic sources and receivers able to deal with the high hydrostatic pressure environment. The low frequency piezoelectric transducer of the SYSIF (SYstème Sismique Fond) deep towed seismic device comply with the former requirement taking advantage of the coupling of a mechanical resonance (Janus driver) and a fluid resonance (Helmholtz cavity) to produce a large frequency bandwidth acoustic signal (220-1050 Hz). The ability to perform deep towed multichannel seismic imaging with SYSIF was demonstrated in 2014, yet, the ability to determine P-wave velocity distribution wasn't achieved. P-wave velocity analysis relies on the ratio between the source-receiver offset range and the depth of the seismic reflectors, thus towing the seismic source and receivers closer to the sea bed will provide a better geometry for P-wave velocity determination. Yet, technical issues, related to the acoustic source directivity, arise for this approach in the particular framework of piezoelectric sources. A signal processing sequence is therefore added to the initial processing flow. Data acquisition took place during the GHASS (Gas Hydrates, fluid Activities and Sediment deformations in the western Black Sea) cruise in the Romanian waters of the Black Sea. The results of the imaging processing are presented for two seismic data sets acquired over gas hydrates and gas bearing sediments. The improvement in the final seismic resolution demonstrates the validity of the velocity model.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
Vernick, Kenneth D.
2017-01-01
Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932
Copeland, Alex; Gu, Wei; Yasawong, Montri; Lapidus, Alla; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian J.; Sikorski, Johannes; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja
2012-01-01
Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum “Thermus-Deinococcus” to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:22675595
EZH2 mutations and promoter hypermethylation in childhood acute lymphoblastic leukemia.
Schäfer, Vivien; Ernst, Jana; Rinke, Jenny; Winkelmann, Nils; Beck, James F; Hochhaus, Andreas; Gruhn, Bernd; Ernst, Thomas
2016-07-01
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and young adults. The polycomb repressive complex 2 (PRC2) has been identified as one of the most frequently mutated epigenetic protein complexes in hematologic cancers. PRC2 acts as an epigenetic repressor through histone H3 lysine 27 trimethylation (H3K27me3), catalyzed by the histone methyltransferase enhancer of zeste homolog 2 protein (EZH2). To study the prevalence and clinical impact of PRC2 aberrations in an unselected childhood ALL cohort (n = 152), we performed PRC2 mutational screenings by Sanger sequencing and promoter methylation analyses by quantitative pyrosequencing for the three PRC2 core component genes EZH2, suppressor of zeste 12 (SUZ12), and embryonic ectoderm development (EED). Targeted deep next-generation sequencing of 30 frequently mutated genes in leukemia was performed to search for cooperating mutations in patients harboring PRC2 aberrations. Finally, the functional consequence of EZH2 promoter hypermethylation on H3K27me3 was studied by Western blot analyses of primary cells. Loss-of-function EZH2 mutations were detected in 2/152 (1.3 %) patients with common-ALL and early T-cell precursor (ETP)-ALL, respectively. In one patient, targeted deep sequencing identified cooperating mutations in ASXL1 and TET2. EZH2 promoter hypermethylation was found in one patient with ETP-ALL which led to reduced H3K27me3. In comparison with healthy children, the EZH2 promoter was significantly higher methylated in T-ALL patients. No mutations or promoter methylation changes were identified for SUZ12 or EED genes, respectively. Although PRC2 aberrations seem to be rare in childhood ALL, our findings indicate that EZH2 aberrations might contribute to the disease in specific cases. Hereby, EZH2 promoter hypermethylation might have functionally similar consequences as loss-of-function mutations.
NASA Astrophysics Data System (ADS)
Eyles, Nicholas; Mullins, Henry T.; Hine, Albert C.
1991-09-01
This paper presents the first detailed data regarding the newly discovered deep infill of Okanagan Lake. Okanagan Lake (50°00'N, 119°30'W) is 120 km long, ˜ 3-5 km wide and occupies a glacially overdeepened bedrock basin in the southern interior of British Columbia. This basin, and other elongate lakes of the region (e.g. Shuswap, Kootenay, Kalamalka, Canim and Mahood lakes), mark the site of westward flowing ice streams within successive Cordilleran ice sheets. An air gun seismic survey of Okanagan Lake shows that the bedrock floor is nearly 650 m below sea-level, more than 2000 m below the rim of the surrounding plateau. The maximum thickness of Pleistocene sediment in Okanagan Lake basin approaches 800 m. Forty-six seismic reflection traverses and an axial profile show a relatively simple stratigraphy composed of three seismic sequences argued to be no older than the last glacial cycle (< 30 ka). A discontinuous basal unit (sequence I) characterized by large-scale diffractions, and up to 460 m thick, infills the narrow, V-shaped bedrock floor of the basin and is interpreted as a boulder gravel deposited by subglacial meltwaters. Overlying seismic sequence II is composed of two sub-sequences. Sub-sequence IIa is a chaotic to massive facies up to 736 m thick. Lakeshore exposures close to where this unit reaches lake level show deformed and chaotically-bedded glaciolacustrine silts containing gravel lens and large ice-rafted boulders. The surface topography of this sub-sequence is irregular and in general mimics the form of the underlying bedrock as a result of compaction. This sequence passes laterally into stratified facies (sub-sequence IIb) at the northern end of the basin. Seismic sequence II appears to record rapid ice-proximal dumping of glaciolacustrine silt as the Okanagan glacier backwasted upvalley in a deep lake. A thin (60 m max.) laminated seismic sequence (III) drapes the hummocky surface of sequence II and represents postglacial sedimentation from fan-deltas. The extreme thickness of sequences I and II in Okanagan Lake reflects the focussing of large volumes of meltwater and sediment into the basin during deglaciation; pre-existing sediments that pre-date the last glacial cycle appear to have been completely eroded. Glaciological conditions during sedimentation may have been similar to marine-based outlet glaciers calving in deep water in fiord basins. In contrast to marine settings where ice bergs are free to disperse, large volumes of dead ice were trapped within the basin; structural evidence for sedimentation around dead ice blocks has been previously used to argue that the Cordilleran Ice Sheet downwasted in situ. We emphasize in contrast, the trapping of dead ice left behind by rapidly calving lake-based outlet glaciers.
Mullaji, Arun; Sharma, Amit; Marawar, Satyajit; Kanna, Raj
2009-08-01
A novel sequence of posteromedial release consistent with surgical technique of total knee arthroplasty was performed in 15 cadaveric knees. Medial and lateral flexion and extension gaps were measured after each step of the release using a computed tomography-free computer navigation system. A spring-loaded distractor and a manual distractor were used to distract the joint. Posterior cruciate ligament release increased flexion more than extension gap; deep medial collateral ligament release had a negligible effect; semimembranosus release increased the flexion gap medially; reduction osteotomy increased medial flexion and extension gaps; superficial medial collateral ligament release increased medial joint gap more in flexion and caused severe instability. This sequence of release led to incremental and differential effects on flexion-extension gaps and has implications in correcting varus deformity.
Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.
Williams, Philip H; Eyles, Rod; Weiller, Georg
2012-01-01
MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.
Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M
2009-07-01
Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
Lucas-Neto, Lia; Reimão, Sofia; Oliveira, Edson; Rainha-Campos, Alexandre; Sousa, João; Nunes, Rita G; Gonçalves-Ferreira, António; Campos, Jorge G
2015-07-01
The human nucleus accumbens (Acc) has become a target for deep brain stimulation (DBS) in some neuropsychiatric disorders. Nonetheless, even with the most recent advances in neuroimaging it remains difficult to accurately delineate the Acc and closely related subcortical structures, by conventional MRI sequences. It is our purpose to perform a MRI study of the human Acc and to determine whether there are reliable anatomical landmarks that enable the precise location and identification of the nucleus and its core/shell division. For the Acc identification and delineation, based on anatomical landmarks, T1WI, T1IR and STIR 3T-MR images were acquired in 10 healthy volunteers. Additionally, 32-direction DTI was obtained for Acc segmentation. Seed masks for the Acc were generated with FreeSurfer and probabilistic tractography was performed using FSL. The probability of connectivity between the seed voxels and distinct brain areas was determined and subjected to k-means clustering analysis, defining 2 different regions. With conventional T1WI, the Acc borders are better defined through its surrounding anatomical structures. The DTI color-coded vector maps and IR sequences add further detail in the Acc identification and delineation. Additionally, using probabilistic tractography it is possible to segment the Acc into a core and shell division and establish its structural connectivity with different brain areas. Advanced MRI techniques allow in vivo delineation and segmentation of the human Acc and represent an additional guiding tool in the precise and safe target definition for DBS. © 2015 International Neuromodulation Society.
Measurements of RF heating during 3.0-T MRI of a pig implanted with deep brain stimulator.
Gorny, Krzysztof R; Presti, Michael F; Goerss, Stephan J; Hwang, Sun C; Jang, Dong-Pyo; Kim, Inyong; Min, Hoon-Ki; Shu, Yunhong; Favazza, Christopher P; Lee, Kendall H; Bernstein, Matt A
2013-06-01
To present preliminary, in vivo temperature measurements during MRI of a pig implanted with a deep brain stimulation (DBS) system. DBS system (Medtronic Inc., Minneapolis, MN) was implanted in the brain of an anesthetized pig. 3.0-T MRI was performed with a T/R head coil using the low-SAR GRE EPI and IR-prepped GRE sequences (SAR: 0.42 and 0.39 W/kg, respectively), and the high-SAR 4-echo RF spin echo (SAR: 2.9 W/kg). Fluoroptic thermometry was used to directly measure RF-related heating at the DBS electrodes, and at the implantable pulse generator (IPG). For reference the measurements were repeated in the same pig at 1.5 T and, at both field strengths, in a phantom. At 3.0T, the maximal temperature elevations at DBS electrodes were 0.46 °C and 2.3 °C, for the low- and high-SAR sequences, respectively. No heating was observed on the implanted IPG during any of the measurements. Measurements of in vivo heating differed from those obtained in the phantom. The 3.0-T MRI using GRE EPI and IR-prepped GRE sequences resulted in local temperature elevations at DBS electrodes of no more than 0.46 °C. Although no extrapolation should be made to human exams and much further study will be needed, these preliminary data are encouraging for the future use 3.0-T MRI in patients with DBS. Copyright © 2013 Elsevier Inc. All rights reserved.
Measurements of RF Heating during 3.0T MRI of a Pig Implanted with Deep Brain Stimulator
Gorny, Krzysztof R; Presti, Michael F; Goerss, Stephan J; Hwang, Sun C; Jang, Dong-Pyo; Kim, Inyong; Shu, Yunhong; Favazza, Christopher P; Lee, Kendall H; Bernstein, Matt A
2012-01-01
Purpose To present preliminary, in vivo temperature measurements during MRI of a pig implanted with a deep brain stimulation (DBS) system. Materials and Methods DBS system (Medtronic Inc., Minneapolis, MN) was implanted in the brain of an anesthetized pig. 3.0T MRI was performed with a T/R head coil using the low-SAR GRE EPI and IR-prepped GRE sequences (SAR: 0.42 W/kg and 0.39 W/kg, respectively), and the high-SAR 4-echo RF spin echo (SAR: 2.9 W/kg). Fluoroptic thermometry was used to directly measure RF-related heating at the DBS electrodes, and at the implantable pulse generator (IPG). For reference the measurements were repeated in the same pig at 1.5T and, at both field strengths, in a phantom. Results At 3.0T, the maximal temperature elevations at DBS electrodes were 0.46 °C and 2.3 °C, for the low- and high-SAR sequences, respectively. No heating was observed on the implanted IPG during any of the measurements. Measurements of in-vivo heating differed from those obtained in the phantom. Conclusion The 3.0T MRI using GRE EPI and IR-prepped GRE sequences resulted in local temperature elevations at DBS electrodes of no more than 0.46°C. Although no extrapolation should be made to human exams and much further study will be needed, these preliminary data are encouraging for the future use 3.0T MRI in patients with DBS. PMID:23228310
A Statistical Guide to the Design of Deep Mutational Scanning Experiments.
Matuszewski, Sebastian; Hildebrandt, Marcel E; Ghenu, Ana-Hermina; Jensen, Jeffrey D; Bank, Claudia
2016-09-01
The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates. Copyright © 2016 by the Genetics Society of America.
Saghatelyan, Ani; Poghosyan, Lianna
2015-01-01
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from geothermal spring outlet located in the Karvachar region in Nagorno Karabakh is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055
Brannon, A Rose; Vakiani, Efsevia; Sylvester, Brooke E; Scott, Sasinya N; McDermott, Gregory; Shah, Ronak H; Kania, Krishan; Viale, Agnes; Oschwald, Dayna M; Vacic, Vladimir; Emde, Anne-Katrin; Cercek, Andrea; Yaeger, Rona; Kemeny, Nancy E; Saltz, Leonard B; Shia, Jinru; D'Angelica, Michael I; Weiser, Martin R; Solit, David B; Berger, Michael F
2014-08-28
Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely available for clinical use and additional molecularly targeted agents are considered as treatment options in colorectal cancer, it is important to characterize the extent of tumor heterogeneity between primary and metastatic tumors. We performed deep coverage, targeted next-generation sequencing of 230 key cancer-associated genes for 69 matched primary and metastatic tumors and normal tissue. Mutation profiles were 100% concordant for KRAS, NRAS, and BRAF, and were highly concordant for recurrent alterations in colorectal cancer. Additionally, whole genome sequencing of four patient trios did not reveal any additional site-specific targetable alterations. Colorectal cancer primary tumors and metastases exhibit high genomic concordance. As current clinical practices in colorectal cancer revolve around KRAS, NRAS, and BRAF mutation status, diagnostic sequencing of either primary or metastatic tissue as available is acceptable for most patients. Additionally, consistency between targeted sequencing and whole genome sequencing results suggests that targeted sequencing may be a suitable strategy for clinical diagnostic applications.
Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E
2013-01-01
Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.
Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi
2014-01-01
Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included an ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34
Anderson, Iain J.; DasSarma, Priya; Lucas, Susan; ...
2016-09-10
Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
Complete genome sequence of the Antarctic Halorubrum lacusprofundi type strain ACAM 34
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Iain J.; DasSarma, Priya; Lucas, Susan
Halorubrum lacusprofundi is an extreme halophile within the archaeal phylum Euryarchaeota. The type strain ACAM 34 was isolated from Deep Lake, Antarctica. H. lacusprofundi is of phylogenetic interest because it is distantly related to the haloarchaea that have previously been sequenced. It is also of interest because of its psychrotolerance. We report here the complete genome sequence of H. lacusprofundi type strain ACAM 34 and its annotation. In conclusion, this genome is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo
2017-01-01
Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Li, Zhen; Zhang, Renyu
2017-01-01
Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
Aasvang, E K; Werner, M U; Kehlet, H
2014-09-01
Deep pain complaints are more frequent than cutaneous in post-surgical patients, and a prevalent finding in quantitative sensory testing studies. However, the preferred assessment method - pressure algometry - is indirect and tissue unspecific, hindering advances in treatment and preventive strategies. Thus, there is a need for development of methods with direct stimulation of suspected hyperalgesic tissues to identify the peripheral origin of nociceptive input. We compared the reliability of an ultrasound-guided needle stimulation protocol of electrical detection and pain thresholds to pressure algometry, by performing identical test-retest sequences 10 days apart, in deep tissues in the groin region. Electrical stimulation was performed by five up-and-down staircase series of single impulses of 0.04 ms duration, starting from 0 mA in increments of 0.2 mA until a threshold was reached and descending until sensation was lost. Method reliability was assessed by Bland-Altman plots, descriptive statistics, coefficients of variance and intraclass correlation coefficients. The electrical stimulation method was comparable to pressure algometry regarding 10 days test-retest repeatability, but with superior same-day reliability for electrical stimulation (P < 0.05). Between-subject variance rather than within-subject variance was the main source for test variation. There were no systematic differences in electrical thresholds across tissues and locations (P > 0.05). The presented tissue-specific direct deep tissue electrical stimulation technique has equal or superior reliability compared with the indirect tissue-unspecific stimulation by pressure algometry. This method may facilitate advances in mechanism based preventive and treatment strategies in acute and chronic post-surgical pain states. © 2014 The Acta Anaesthesiologica Scandinavica Foundation. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Tully, B. J.; Sylvan, J. B.; Heidelberg, J. F.; Huber, J. A.
2014-12-01
There are many limitations involved with sampling microbial diversity from deep-sea subsurface environments, ranging from physical sample collection, low microbial biomass, culturing at in situ conditions, and inefficient nucleic acid extractions. As such, we are continually modifying our methods to obtain better results and expanding what we know about microbes in these environments. Here we present analysis of metagenomes sequences from samples collected from 120 m within the Louisville Seamount and from the top 5-10cm of the sediment in the center of the south Pacific gyre (SPG). Both systems are low biomass with ~102 and ~104 cells per cm3 for Louisville Seamount samples analyzed and the SPG sediment, respectively. The Louisville Seamount represents the first in situ subseafloor basalt and the SPG sediments represent the first in situ low biomass sediment microbial metagenomes. Both of these environments, subseafloor basalt and sediments underlying oligotrophic ocean gyres, represent large provinces of the seafloor environment that remain understudied. Despite the low biomass and DNA generated from these samples, we have generated 16 near complete genomes (5 from Louisville and 11 from the SPG) from the two metagenomic datasets. These genomes are estimated to be between 51-100% complete and span a range of phylogenetic groups, including the Proteobacteria, Actinobacteria, Firmicutes, Chloroflexi, and unclassified bacterial groups. With these genomes, we have assessed potential functional capabilities of these organisms and performed a comparative analysis between the environmental genomes and previously sequenced relatives to determine possible adaptations that may elucidate survival mechanisms for these low energy environments. These methods illustrate a baseline analysis that can be applied to future metagenomic deep-sea subsurface datasets and will help to further our understanding of microbiology within these environments.
Yoshida, Mitsuhiro; Mochizuki, Tomohiro; Urayama, Syun-Ichi; Yoshida-Takashima, Yukari; Nishi, Shinro; Hirai, Miho; Nomaki, Hidetaka; Takaki, Yoshihiro; Nunoura, Takuro; Takai, Ken
2018-01-01
Previous studies on marine environmental virology have primarily focused on double-stranded DNA (dsDNA) viruses; however, it has recently been suggested that single-stranded DNA (ssDNA) viruses are more abundant in marine ecosystems. In this study, we performed a quantitative viral community DNA analysis to estimate the relative abundance and composition of both ssDNA and dsDNA viruses in offshore upper bathyal sediment from Tohoku, Japan (water depth = 500 m). The estimated dsDNA viral abundance ranged from 3 × 106 to 5 × 106 genome copies per cm3 sediment, showing values similar to the range of fluorescence-based direct virus counts. In contrast, the estimated ssDNA viral abundance ranged from 1 × 108 to 3 × 109 genome copies per cm3 sediment, thus providing an estimation that the ssDNA viral populations represent 96.3–99.8% of the benthic total DNA viral assemblages. In the ssDNA viral metagenome, most of the identified viral sequences were associated with ssDNA viral families such as Circoviridae and Microviridae. The principle components analysis of the ssDNA viral sequence components from the sedimentary ssDNA viral metagenomic libraries found that the different depth viral communities at the study site all exhibited similar profiles compared with deep-sea sediment ones at other reference sites. Our results suggested that deep-sea benthic ssDNA viruses have been significantly underestimated by conventional direct virus counts and that their contributions to deep-sea benthic microbial mortality and geochemical cycles should be further addressed by such a new quantitative approach. PMID:29467725
Polymenakou, Paraskevi N; Christakis, Christos A; Mandalakis, Manolis; Oulas, Anastasis
2015-06-01
The deep eastern basin of the Mediterranean Sea is considered to be one of the world's most oligotrophic areas in the world. Here we performed pyrosequenicng analysis of bacterial and archaeal communities in oxic nutrient-poor sediments collected from the eastern Mediterranean at 1025-4393 m depth. Microbial communities were surveyed by targeting the hypervariable V5-V6 regions of the 16S ribosomal RNA gene using bar-coded pyrosequencing. With a total of 13,194 operational taxonomic units (OTUs) or phylotypes at 97% sequence similarities, the phylogenetic affiliation of microbes was assigned to 23 bacterial and 2 archaeal known phyla, 23 candidate divisions at the phylum level and distributed into 186 families. It was further revealed that the microbial consortia inhabiting all sampling sites were highly diverse, but dominated by phylotypes closely related to members of the genus Pseudomonas and Marine Group I archaea. Such pronounced and widespread enrichment probably manifests the cosmopolitan character of these species and raises questions about their metabolic adaptation to the physical stressors and low nutrient availability of the deep eastern Mediterranean Sea. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
NASA Astrophysics Data System (ADS)
Cavalett, Angélica; Silva, Marcus Adonai Castro da; Toyofuku, Takashi; Mendes, Rodrigo; Taketani, Rodrigo Gouvêa; Pedrini, Jéssica; Freitas, Robert Cardoso de; Sumida, Paulo Yukio Gomes; Yamanaka, Toshiro; Nagano, Yuriko; Pellizari, Vivian Helena; Perez, José Angel Alvarez; Kitazato, Hiroshi; Lima, André Oliveira de Souza
2017-12-01
The deep ocean is the largest marine environment on Earth and is home to a large reservoir of biodiversity. Within the deep ocean, large organic falls attract a suite of metazoans and microorganisms, which form an important community that, in part, relies on reduced chemical compounds. Here, we describe a deep-sea (4204 m) microbial community associated with sediments collected underneath a whale fall skeleton in the South Atlantic Ocean. Metagenomic analysis of 1 Gb of Illumina HiSeq. 2000 reads, including taxonomic and functional genes, was performed by using the MG-RAST pipeline, SEED, COG and the KEGG database. The results showed that Proteobacteria (79%) was the main phylum represented. The most dominant bacterial class in this phylum was Epsilonproteobacteria (69%), and Sulfurovum sp. NBC37-1 (97%) was the dominant species. Different species of Epsilonproteobacteria have been described in marine and terrestrial environments as important organisms for nutrient cycling. Functional analysis revealed key genes for nitrogen and sulfur cycles, including protein sequences for Sox system (sulfur oxidation) enzymes. These enzymes were mainly those of the Epsilonproteobacteria, indicating their importance for nitrogen and sulfur cycles and the balance of nutrients in this environment.
Draft Genome Sequence of Aldehyde-Degrading Strain Halomonas axialensis ACH-L-8
Ye, Jun; Ren, Chong; Shan, Xiexie
2016-01-01
Halomonas axialensis ACH-L-8, a deep-sea strain isolated from the South China Sea, has the ability to degrade aldehydes. Here, we present an annotated draft genome sequence of this species, which could provide fundamental molecular information on the aldehydes-degrading mechanism. PMID:27081145
USDA-ARS?s Scientific Manuscript database
Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...
Tillmar, Andreas O.; Dell'Amico, Barbara; Welander, Jenny; Holmlund, Gunilla
2013-01-01
Species identification can be interesting in a wide range of areas, for example, in forensic applications, food monitoring and in archeology. The vast majority of existing DNA typing methods developed for species determination, mainly focuses on a single species source. There are, however, many instances where all species from mixed sources need to be determined, even when the species in minority constitutes less than 1 % of the sample. The introduction of next generation sequencing opens new possibilities for such challenging samples. In this study we present a universal deep sequencing method using 454 GS Junior sequencing of a target on the mitochondrial gene 16S rRNA. The method was designed through phylogenetic analyses of DNA reference sequences from more than 300 mammal species. Experiments were performed on artificial species-species mixture samples in order to verify the method’s robustness and its ability to detect all species within a mixture. The method was also tested on samples from authentic forensic casework. The results showed to be promising, discriminating over 99.9 % of mammal species and the ability to detect multiple donors within a mixture and also to detect minor components as low as 1 % of a mixed sample. PMID:24358309
Engle, E K; Fisher, D A C; Miller, C A; McLellan, M D; Fulton, R S; Moore, D M; Wilson, R K; Ley, T J; Oh, S T
2015-04-01
Clonal architecture in myeloproliferative neoplasms (MPNs) is poorly understood. Here we report genomic analyses of a patient with primary myelofibrosis (PMF) transformed to secondary acute myeloid leukemia (sAML). Whole genome sequencing (WGS) was performed on PMF and sAML diagnosis samples, with skin included as a germline surrogate. Deep sequencing validation was performed on the WGS samples and an additional sample obtained during sAML remission/relapsed PMF. Clustering analysis of 649 validated somatic single-nucleotide variants revealed four distinct clonal groups, each including putative driver mutations. The first group (including JAK2 and U2AF1), representing the founding clone, included mutations with high frequency at all three disease stages. The second clonal group (including MYB) was present only in PMF, suggesting the presence of a clone that was dispensable for transformation. The third group (including ASXL1) contained mutations with low frequency in PMF and high frequency in subsequent samples, indicating evolution of the dominant clone with disease progression. The fourth clonal group (including IDH1 and RUNX1) was acquired at sAML transformation and was predominantly absent at sAML remission/relapsed PMF. Taken together, these findings illustrate the complex clonal dynamics associated with disease evolution in MPNs and sAML.
2014-01-01
Background The objective of this study was to perform a systematic review and a meta-analysis in order to estimate the diagnostic accuracy of diffusion weighted imaging (DWI) in the preoperative assessment of deep myometrial invasion in patients with endometrial carcinoma. Methods Studies evaluating DWI for the detection of deep myometrial invasion in patients with endometrial carcinoma were systematically searched for in the MEDLINE, EMBASE, and Cochrane Library from January 1995 to January 2014. Methodologic quality was assessed by using the Quality Assessment of Diagnostic Accuracy Studies tool. Bivariate random-effects meta-analytic methods were used to obtain pooled estimates of sensitivity, specificity, diagnostic odds ratio (DOR) and receiver operating characteristic (ROC) curves. The study also evaluated the clinical utility of DWI in preoperative assessment of deep myometrial invasion. Results Seven studies enrolling a total of 320 individuals met the study inclusion criteria. The summary area under the ROC curve was 0.91. There was no evidence of publication bias (P = 0.90, bias coefficient analysis). Sensitivity and specificity of DWI for detection of deep myometrial invasion across all studies were 0.90 and 0.89, respectively. Positive and negative likelihood ratios with DWI were 8 and 0.11 respectively. In patients with high pre-test probabilities, DWI enabled confirmation of deep myometrial invasion; in patients with low pre-test probabilities, DWI enabled exclusion of deep myometrial invasion. The worst case scenario (pre-test probability, 50%) post-test probabilities were 89% and 10% for positive and negative DWI results, respectively. Conclusion DWI has high sensitivity and specificity for detecting deep myometrial invasion and more importantly can reliably rule out deep myometrial invasion. Therefore, it would be worthwhile to add a DWI sequence to the standard MRI protocols in preoperative evaluation of endometrial cancer in order to detect deep myometrial invasion, which along with other poor prognostic factors like age, tumor grade, and LVSI would be useful in stratifying high risk groups thereby helping in the tailoring of surgical approach in patient with low risk of endometrial carcinoma. PMID:25608571
Slack, J.F.; Grenne, Tor; Bekker, A.; Rouxel, O.J.; Lindberg, P.A.
2007-01-01
A current model for the evolution of Proterozoic deep seawater composition involves a change from anoxic sulfide-free to sulfidic conditions 1.8??Ga. In an earlier model the deep ocean became oxic at that time. Both models are based on the secular distribution of banded iron formation (BIF) in shallow marine sequences. We here present a new model based on rare earth elements, especially redox-sensitive Ce, in hydrothermal silica-iron oxide sediments from deeper-water, open-marine settings related to volcanogenic massive sulfide (VMS) deposits. In contrast to Archean, Paleozoic, and modern hydrothermal iron oxide sediments, 1.74 to 1.71??Ga hematitic chert (jasper) and iron formation in central Arizona, USA, show moderate positive to small negative Ce anomalies, suggesting that the redox state of the deep ocean then was at a transitional, suboxic state with low concentrations of dissolved O2 but no H2S. The presence of jasper and/or iron formation related to VMS deposits in other volcanosedimentary sequences ca. 1.79-1.69??Ga, 1.40??Ga, and 1.24??Ga also reflects oxygenated and not sulfidic deep ocean waters during these time periods. Suboxic conditions in the deep ocean are consistent with the lack of shallow-marine BIF ??? 1.8 to 0.8??Ga, and likely limited nutrient concentrations in seawater and, consequently, may have constrained biological evolution. ?? 2006 Elsevier B.V. All rights reserved.
Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M
2015-11-01
Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.
Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.
Favero, F; Joshi, T; Marquard, A M; Birkbak, N J; Krzystanek, M; Li, Q; Szallasi, Z; Eklund, A C
2015-01-01
Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described. We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm. Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity (Pearson's r = 0.90) and ploidy estimates (r = 0.42, or r = 0.94 after manual inspecting alternative solutions). This performance was noticeably superior to previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%. The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small scale mutations but also for estimating cellularity and inferring DNA copy number aberrations. © The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
The dynamics of genome replication using deep sequencing
Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.
2014-01-01
Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
Danielsson, Frida; Wiking, Mikaela; Mahdessian, Diana; Skogs, Marie; Ait Blal, Hammou; Hjelmare, Martin; Stadler, Charlotte; Uhlén, Mathias; Lundberg, Emma
2013-01-04
One of the major challenges of a chromosome-centric proteome project is to explore in a systematic manner the potential proteins identified from the chromosomal genome sequence, but not yet characterized on a protein level. Here, we describe the use of RNA deep sequencing to screen human cell lines for RNA profiles and to use this information to select cell lines suitable for characterization of the corresponding gene product. In this manner, the subcellular localization of proteins can be analyzed systematically using antibody-based confocal microscopy. We demonstrate the usefulness of selecting cell lines with high expression levels of RNA transcripts to increase the likelihood of high quality immunofluorescence staining and subsequent successful subcellular localization of the corresponding protein. The results show a path to combine transcriptomics with affinity proteomics to characterize the proteins in a gene- or chromosome-centric manner.
Deep sequencing reveals persistence of cell-associated mumps vaccine virus in chronic encephalitis.
Morfopoulou, Sofia; Mee, Edward T; Connaughton, Sarah M; Brown, Julianne R; Gilmour, Kimberly; Chong, W K 'Kling'; Duprex, W Paul; Ferguson, Deborah; Hubank, Mike; Hutchinson, Ciaran; Kaliakatsos, Marios; McQuaid, Stephen; Paine, Simon; Plagnol, Vincent; Ruis, Christopher; Virasami, Alex; Zhan, Hong; Jacques, Thomas S; Schepelmann, Silke; Qasim, Waseem; Breuer, Judith
2017-01-01
Routine childhood vaccination against measles, mumps and rubella has virtually abolished virus-related morbidity and mortality. Notwithstanding this, we describe here devastating neurological complications associated with the detection of live-attenuated mumps virus Jeryl Lynn (MuV JL5 ) in the brain of a child who had undergone successful allogeneic transplantation for severe combined immunodeficiency (SCID). This is the first confirmed report of MuV JL5 associated with chronic encephalitis and highlights the need to exclude immunodeficient individuals from immunisation with live-attenuated vaccines. The diagnosis was only possible by deep sequencing of the brain biopsy. Sequence comparison of the vaccine batch to the MuV JL5 isolated from brain identified biased hypermutation, particularly in the matrix gene, similar to those found in measles from cases of SSPE. The findings provide unique insights into the pathogenesis of paramyxovirus brain infections.
Cryopyrin-associated Periodic Syndrome Caused by a Myeloid-Restricted Somatic NLRP3 Mutation
Zhou, Qing; Aksentijevich, Ivona; Wood, Geryl M.; Walts, Avram D.; Hoffmann, Patrycja; Remmers, Elaine F.; Kastner, Daniel L.; Ombrello, Amanda K.
2015-01-01
Objective To identify the cause of disease in an adult patient presenting with recent onset fevers, chills, urticaria, fatigue, and profound myalgia, who was negative for cryopyrin-associated periodic syndrome (CAPS) NLRP3 mutations by conventional Sanger DNA sequencing. Methods We performed whole-exome sequencing and targeted deep sequencing using DNA from the patient’s whole blood to identify a possible NLRP3 somatic mutation. We then screened for this mutation in subcloned NLRP3 amplicons from fibroblasts, buccal cells, granulocytes, negatively-selected monocytes, and T and B lymphocytes and further confirmed the somatic mutation by targeted sequencing of exon 3. Results We identified a previously reported CAPS-associated mutation, p.Tyr570Cys, with a mutant allele frequency of 15% based on exome data. Targeted sequencing and subcloning of NLRP3 amplicons confirmed the presence of the somatic mutation in whole blood at a ratio similar to the exome data. The mutant allele frequency was in the range of 13.3%–16.8% in monocytes and 15.2%–18% in granulocytes; Notably, this mutation was either absent or present at a very low frequency in B and T lymphocytes, buccal cells, and in the patient’s cultured fibroblasts. Conclusion These data document the possibility of myeloid-restricted somatic mosaicism in the pathogenesis of CAPS, underscoring the emerging role of massively-parallel sequencing in clinical diagnosis. PMID:25988971
Saghatelyan, Ani; Poghosyan, Lianna; Panosyan, Hovik; Birkeland, Nils-Kåre
2015-11-12
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from geothermal spring outlet located in the Karvachar region in Nagorno Karabakh is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. Copyright © 2015 Saghatelyan et al.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing
Matochko, Wadim L.; Derda, Ratmir
2013-01-01
Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||ni||, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (S a). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of S a and use them to define the sequencing operator (S e q). Sequencing without any bias and errors is S e q = S a IN, where IN is a N × N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (C E N), which describes elimination or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
Expedition 48/49 crew visit to MSFC
2017-04-06
NASA astronaut Kate Rubins presents highlights from Expedition 48/49, her mission to the International Space Station, to team members and Space Camp students from the U.S. Space & Rocket Center in Huntsville, April 6 at NASA's Marshall Space Flight Center. During her mission, Rubins became the first person to sequence DNA in space, researching technology development for deep-space exploration by humans, Earth and space science. She also conducted two spacewalks, in which she and NASA astronaut Jeff Williams installed an International Docking Adapter and performed maintenance of the station's external thermal control system and installed high-definition cameras.
Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan
2018-03-01
In current era, majority of microRNA (miRNA) are being discovered through computational approaches which are more confined towards model plants. Here, for the first time, we have described the identification and characterization of novel miRNA in a non-model plant, Persicaria minor ( P . minor ) using computational approach. Unannotated sequences from deep sequencing were analyzed based on previous well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence which represented 11 unique putative miRNA sequences. PsRobot target prediction tool was deployed to identify the target transcripts of putative novel miRNAs. Most of the predicted target transcripts (mRNAs) were known to be involved in plant development and stress responses. Gene ontology showed that majority of the putative novel miRNA targets involved in cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Out of 11 unique putative miRNAs, 7 miRNAs were validated through semi-quantitative PCR. These novel miRNAs discoveries in P . minor may develop and update the current public miRNA database.
Poly(A)-tag deep sequencing data processing to extract poly(A) sites.
Wu, Xiaohui; Ji, Guoli; Li, Qingshun Quinn
2015-01-01
Polyadenylation [poly(A)] is an essential posttranscriptional processing step in the maturation of eukaryotic mRNA. The advent of next-generation sequencing (NGS) technology has offered feasible means to generate large-scale data and new opportunities for intensive study of polyadenylation, particularly deep sequencing of the transcriptome targeting the junction of 3'-UTR and the poly(A) tail of the transcript. To take advantage of this unprecedented amount of data, we present an automated workflow to identify polyadenylation sites by integrating NGS data cleaning, processing, mapping, normalizing, and clustering. In this pipeline, a series of Perl scripts are seamlessly integrated to iteratively map the single- or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same genome coordinate are grouped into one cleavage site, and the internal priming artifacts removed. Then the ambiguous region is introduced to parse the genome annotation for cleavage site clustering. Finally, cleavage sites within a close range of 24 nucleotides and from different samples can be clustered into poly(A) clusters. This procedure could be used to identify thousands of reliable poly(A) clusters from millions of NGS sequences in different tissues or treatments.
Browning, J.V.; Miller, K.G.; McLaughlin, P.P.; Edwards, L.E.; Kulpecz, A.A.; Powars, D.S.; Wade, B.S.; Feigenson, M.D.; Wright, J.D.
2009-01-01
The Eyreville core holes provide the first continuously cored record of postimpact sequences from within the deepest part of the central Chesapeake Bay impact crater. We analyzed the upper Eocene to Pliocene postimpact sediments from the Eyreville A and C core holes for lithology (semiquantitative measurements of grain size and composition), sequence stratigraphy, and chronostratigraphy. Age is based primarily on Sr isotope stratigraphy supplemented by biostratigraphy (dinocysts, nannofossils, and planktonic foraminifers); age resolution is approximately ??0.5 Ma for early Miocene sequences and approximately ??1.0 Ma for younger and older sequences. Eocene-lower Miocene sequences are subtle, upper middle to lower upper Miocene sequences are more clearly distinguished, and upper Miocene- Pliocene sequences display a distinct facies pattern within sequences. We recognize two upper Eocene, two Oligocene, nine Miocene, three Pliocene, and one Pleistocene sequence and correlate them with those in New Jersey and Delaware. The upper Eocene through Pleistocene strata at Eyreville record changes from: (1) rapidly deposited, extremely fi ne-grained Eocene strata that probably represent two sequences deposited in a deep (>200 m) basin; to (2) highly dissected Oligocene (two very thin sequences) to lower Miocene (three thin sequences) with a long hiatus; to (3) a thick, rapidly deposited (43-73 m/Ma), very fi ne-grained, biosiliceous middle Miocene (16.5-14 Ma) section divided into three sequences (V5-V3) deposited in middle neritic paleoenvironments; to (4) a 4.5-Ma-long hiatus (12.8-8.3 Ma); to (5) sandy, shelly upper Miocene to Pliocene strata (8.3-2.0 Ma) divided into six sequences deposited in shelf and shoreface environments; and, last, to (6) a sandy middle Pleistocene paralic sequence (~400 ka). The Eyreville cores thus record the fi lling of a deep impact-generated basin where the timing of sequence boundaries is heavily infl uenced by eustasy. ?? 2009 The Geological Society of America.
2010-01-01
Background Nematodes represent the most abundant benthic metazoa in one of the largest habitats on earth, the deep sea. Characterizing major patterns of biodiversity within this dominant group is a critical step towards understanding evolutionary patterns across this vast ecosystem. The present study has aimed to place deep-sea nematode species into a phylogenetic framework, investigate relationships between shallow water and deep-sea taxa, and elucidate phylogeographic patterns amongst the deep-sea fauna. Results Molecular data (18 S and 28 S rRNA) confirms a high diversity amongst deep-sea Enoplids. There is no evidence for endemic deep-sea lineages in Maximum Likelihood or Bayesian phylogenies, and Enoplids do not cluster according to depth or geographic location. Tree topologies suggest frequent interchanges between deep-sea and shallow water habitats, as well as a mixture of early radiations and more recently derived lineages amongst deep-sea taxa. This study also provides convincing evidence of cosmopolitan marine species, recovering a subset of Oncholaimid nematodes with identical gene sequences (18 S, 28 S and cox1) at trans-Atlantic sample sites. Conclusions The complex clade structures recovered within the Enoplida support a high global species richness for marine nematodes, with phylogeographic patterns suggesting the existence of closely related, globally distributed species complexes in the deep sea. True cosmopolitan species may additionally exist within this group, potentially driven by specific life history traits of Enoplids. Although this investigation aimed to intensively sample nematodes from the order Enoplida, specimens were only identified down to genus (at best) and our sampling regime focused on an infinitesimal small fraction of the deep-sea floor. Future nematode studies should incorporate an extended sample set covering a wide depth range (shelf, bathyal, and abyssal sites), utilize additional genetic loci (e.g. mtDNA) that are informative at the species level, and apply high-throughput sequencing methods to fully assay community diversity. Finally, further molecular studies are needed to determine whether phylogeographic patterns observed in Enoplids are common across other ubiquitous marine groups (e.g. Chromadorida, Monhysterida). PMID:21167065
Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.
2015-01-01
The genome of strain GS3372 is the first publicly available strain of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637
Bacterial community diversity of the deep-sea octocoral Paramuricea placomus.
Kellogg, Christina A; Ross, Steve W; Brooke, Sandra D
2016-01-01
Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus . Samples from five colonies of P. placomus were collected from Baltimore Canyon (379-382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomus colonies was identified, comprising 68-90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomus does not appear to include the genus Endozoicomonas , which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.
Bacterial community diversity of the deep-sea octocoral Paramuricea placomus
Kellogg, Christina A.; Ross, Steve W.; Brooke, Sandra D.
2016-01-01
Compared to tropical corals, much less is known about deep-sea coral biology and ecology. Although the microbial communities of some deep-sea corals have been described, this is the first study to characterize the bacterial community associated with the deep-sea octocoral, Paramuricea placomus. Samples from five colonies of P. placomus were collected from Baltimore Canyon (379–382 m depth) in the Atlantic Ocean off the east coast of the United States of America. DNA was extracted from the coral samples and 16S rRNA gene amplicons were pyrosequenced using V4-V5 primers. Three samples sequenced deeply (>4,000 sequences each) and were further analyzed. The dominant microbial phylum was Proteobacteria, but other major phyla included Firmicutes and Planctomycetes. A conserved community of bacterial taxa held in common across the three P. placomuscolonies was identified, comprising 68–90% of the total bacterial community depending on the coral individual. The bacterial community of P. placomusdoes not appear to include the genus Endozoicomonas, which has been found previously to be the dominant bacterial associate in several temperate and tropical gorgonians. Inferred functionality suggests the possibility of nitrogen cycling by the core bacterial community.
Edgar, Robyn; Veerapaneni, Ram S.; D’Elia, Tom; Morris, Paul F.; Rogers, Scott O.
2013-01-01
Lake Vostok, the 7th largest (by volume) and 4th deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes. PMID:23843994
Shtarkman, Yury M; Koçer, Zeynep A; Edgar, Robyn; Veerapaneni, Ram S; D'Elia, Tom; Morris, Paul F; Rogers, Scott O
2013-01-01
Lake Vostok, the 7(th) largest (by volume) and 4(th) deepest lake on Earth, is covered by more than 3,700 m of ice, making it the largest subglacial lake known. The combination of cold, heat (from possible hydrothermal activity), pressure (from the overriding glacier), limited nutrients and complete darkness presents extreme challenges to life. Here, we report metagenomic/metatranscriptomic sequence analyses from four accretion ice sections from the Vostok 5G ice core. Two sections accreted in the vicinity of an embayment on the southwestern end of the lake, and the other two represented part of the southern main basin. We obtained 3,507 unique gene sequences from concentrates of 500 ml of 0.22 µm-filtered accretion ice meltwater. Taxonomic classifications (to genus and/or species) were possible for 1,623 of the sequences. Species determinations in combination with mRNA gene sequence results allowed deduction of the metabolic pathways represented in the accretion ice and, by extension, in the lake. Approximately 94% of the sequences were from Bacteria and 6% were from Eukarya. Only two sequences were from Archaea. In general, the taxa were similar to organisms previously described from lakes, brackish water, marine environments, soil, glaciers, ice, lake sediments, deep-sea sediments, deep-sea thermal vents, animals and plants. Sequences from aerobic, anaerobic, psychrophilic, thermophilic, halophilic, alkaliphilic, acidophilic, desiccation-resistant, autotrophic and heterotrophic organisms were present, including a number from multicellular eukaryotes.
Jansen, Anne M L; Crobach, Stijn; Geurts-Giele, Willemina R R; van den Akker, Brendy E W M; Garcia, Marina Ventayol; Ruano, Dina; Nielsen, Maartje; Tops, Carli M J; Wijnen, Juul T; Hes, Frederik J; van Wezel, Tom; Dinjens, Winand N M; Morreau, Hans
2017-02-01
We investigated the presence and patterns of mosaicism in the APC gene in patients with colon neoplasms not associated with any other genetic variants; we performed deep sequence analysis of APC in at least 2 adenomas or carcinomas per patient. We identified mosaic variants in APC in adenomas from 9 of the 18 patients with 21 to approximately 100 adenomas. Mosaic variants of APC were variably detected in leukocyte DNA and/or non-neoplastic intestinal mucosa of these patients. In a comprehensive sequence analysis of 1 patient, we found no evidence for mosaicism in APC in non-neoplastic intestinal mucosa. One patient was found to carry a mosaic c.4666dupA APC variant in only 10 of 16 adenomas, indicating the importance of screening 2 or more adenomas for genetic variants. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
Using small RNA (sRNA) deep sequencing to understand global virus distribution in plants
USDA-ARS?s Scientific Manuscript database
Small RNAs (sRNAs), a class of regulatory RNAs, have been used to serve as the specificity determinants of suppressing gene expression in plants and animals. Next generation sequencing (NGS) uncovered the sRNA landscape in most organisms including their associated microbes. In the current study, w...
Fatal Metacestode Infection in Bornean Orangutan Caused by Unknown Versteria Species
Gendron-Fitzpatrick, Annette; Deering, Kathleen M.; Wallace, Roberta S.; Clyde, Victoria L.; Lauck, Michael; Rosen, Gail E.; Bennett, Andrew J.; Greiner, Ellis C.; O’Connor, David H.
2014-01-01
A captive juvenile Bornean orangutan (Pongo pygmaeus) died from an unknown disseminated parasitic infection. Deep sequencing of DNA from infected tissues, followed by gene-specific PCR and sequencing, revealed a divergent species within the newly proposed genus Versteria (Cestoda: Taeniidae). Versteria may represent a previously unrecognized risk to primate health. PMID:24377497
USDA-ARS?s Scientific Manuscript database
The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...
Analysis of alterative cleavage and polyadenylation by 3′ region extraction and deep sequencing
Hoque, Mainul; Ji, Zhe; Zheng, Dinghai; Luo, Wenting; Li, Wencheng; You, Bei; Park, Ji Yeon; Yehia, Ghassan; Tian, Bin
2012-01-01
Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation. PMID:23241633
Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics
NASA Astrophysics Data System (ADS)
Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.
2016-12-01
Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margins, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.
Li, Chenghua; Feng, Weida; Qiu, Lihua; Xia, Changge; Su, Xiurong; Jin, Chunhua; Zhou, Tingting; Zeng, Yuan; Li, Taiwu
2012-08-01
MicroRNAs (miRNAs) constitute a family of small RNA species which have been demonstrated to be one of key effectors in mediating host-pathogen interaction. In this study, two haemocytes miRNA libraries were constructed with deep sequenced by illumina Hiseq2000 from healthy (L1) and skin ulceration syndrome Apostichopus japonicus (L2). The high throughput solexa sequencing resulted in 9,579,038 and 7,742,558 clean data from L1 and L2, respectively. Sequences analysis revealed that 40 conserved miRNAs were found in both libraries, in which let-7 and mir-125 were speculated to be clustered together and expressed accordingly. Eighty-six miRNA candidates were also identified by reference genome search and stem-loop structure prediction. Importantly, mir-31 and mir-2008 displayed significant differential expression between the two libraries according to FPKM model, which might be considered as promising targets for elucidating the intrinsic mechanism of skin ulceration syndrome outbreak in the species. Copyright © 2012 Elsevier Ltd. All rights reserved.
Analysis of deep learning methods for blind protein contact prediction in CASP12.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2018-03-01
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
Diversity and Biogeography of Bathyal and Abyssal Seafloor Bacteria
Bienhold, Christina; Zinger, Lucie; Boetius, Antje; Ramette, Alban
2016-01-01
The deep ocean floor covers more than 60% of the Earth’s surface, and hosts diverse bacterial communities with important functions in carbon and nutrient cycles. The identification of key bacterial members remains a challenge and their patterns of distribution in seafloor sediment yet remain poorly described. Previous studies were either regionally restricted or included few deep-sea sediments, and did not specifically test biogeographic patterns across the vast oligotrophic bathyal and abyssal seafloor. Here we define the composition of this deep seafloor microbiome by describing those bacterial operational taxonomic units (OTU) that are specifically associated with deep-sea surface sediments at water depths ranging from 1000–5300 m. We show that the microbiome of the surface seafloor is distinct from the subsurface seafloor. The cosmopolitan bacterial OTU were affiliated with the clades JTB255 (class Gammaproteobacteria, order Xanthomonadales) and OM1 (Actinobacteria, order Acidimicrobiales), comprising 21% and 7% of their respective clades, and about 1% of all sequences in the study. Overall, few sequence-abundant bacterial types were globally dispersed and displayed positive range-abundance relationships. Most bacterial populations were rare and exhibited a high degree of endemism, explaining the substantial differences in community composition observed over large spatial scales. Despite the relative physicochemical uniformity of deep-sea sediments, we identified indicators of productivity regimes, especially sediment organic matter content, as factors significantly associated with changes in bacterial community structure across the globe. PMID:26814838
Diverse deep-sea fungi from the South China Sea and their antimicrobial activity.
Zhang, Xiao-Yong; Zhang, Yun; Xu, Xin-Ya; Qi, Shu-Hua
2013-11-01
We investigated the diversity of fungal communities in nine different deep-sea sediment samples of the South China Sea by culture-dependent methods followed by analysis of fungal internal transcribed spacer (ITS) sequences. Although 14 out of 27 identified species were reported in a previous study, 13 species were isolated from sediments of deep-sea environments for the first report. Moreover, these ITS sequences of six isolates shared 84-92 % similarity with their closest matches in GenBank, which suggested that they might be novel phylotypes of genera Ajellomyces, Podosordaria, Torula, and Xylaria. The antimicrobial activities of these fungal isolates were explored using a double-layer technique. A relatively high proportion (56 %) of fungal isolates exhibited antimicrobial activity against at least one pathogenic bacterium or fungus among four marine pathogenic microbes (Micrococcus luteus, Pseudoaltermonas piscida, Aspergerillus versicolor, and A. sydowii). Out of these antimicrobial fungi, the genera Arthrinium, Aspergillus, and Penicillium exhibited antibacterial and antifungal activities, while genus Aureobasidium displayed only antibacterial activity, and genera Acremonium, Cladosporium, Geomyces, and Phaeosphaeriopsis displayed only antifungal activity. To our knowledge, this is the first report to investigate the diversity and antimicrobial activity of culturable deep-sea-derived fungi in the South China Sea. These results suggest that diverse deep-sea fungi from the South China Sea are a potential source for antibiotics' discovery and further increase the pool of fungi available for natural bioactive product screening.
High fungal diversity and abundance recovered in the deep-sea sediments of the Pacific Ocean.
Xu, Wei; Pang, Ka-Lai; Luo, Zhu-Hua
2014-11-01
Knowledge about the presence and ecological significance of bacteria and archaea in the deep-sea environments has been well recognized, but the eukaryotic microorganisms, such as fungi, have rarely been reported. The present study investigated the composition and abundance of fungal community in the deep-sea sediments of the Pacific Ocean. In this study, a total of 1,947 internal transcribed spacer (ITS) regions of fungal rRNA gene clones were recovered from five sediment samples at the Pacific Ocean (water depths ranging from 5,017 to 6,986 m) using three different PCR primer sets. There were 16, 17, and 15 different operational taxonomic units (OTUs) identified from fungal-universal, Ascomycota-, and Basidiomycota-specific clone libraries, respectively. Majority of the recovered sequences belonged to diverse phylotypes of Ascomycota (25 phylotypes) and Basidiomycota (18 phylotypes). The multiple primer approach totally recovered 27 phylotypes which showed low similarities (≤97 %) with available fungal sequences in the GenBank, suggesting possible new fungal taxa occurring in the deep-sea environments or belonging to taxa not represented in the GenBank. Our results also recovered high fungal LSU rRNA gene copy numbers (3.52 × 10(6) to 5.23 × 10(7)copies/g wet sediment) from the Pacific Ocean sediment samples, suggesting that the fungi might be involved in important ecological functions in the deep-sea environments.
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-04-26
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance.
Lindqvist, C Mårten; Lundmark, Anders; Nordlund, Jessica; Freyhult, Eva; Ekman, Diana; Carlsson Almlöf, Jonas; Raine, Amanda; Övernäs, Elin; Abrahamsson, Jonas; Frost, Britt-Marie; Grandér, Dan; Heyman, Mats; Palle, Josefine; Forestier, Erik; Lönnerholm, Gudmar; Berglund, Eva C; Syvänen, Ann-Christine
2016-09-27
To characterize the mutational patterns of acute lymphoblastic leukemia (ALL) we performed deep next generation sequencing of 872 cancer genes in 172 diagnostic and 24 relapse samples from 172 pediatric ALL patients. We found an overall greater mutational burden and more driver mutations in T-cell ALL (T-ALL) patients compared to B-cell precursor ALL (BCP-ALL) patients. In addition, the majority of the mutations in T-ALL had occurred in the original leukemic clone, while most of the mutations in BCP-ALL were subclonal. BCP-ALL patients carrying any of the recurrent translocations ETV6-RUNX1, BCR-ABL or TCF3-PBX1 harbored few mutations in driver genes compared to other BCP-ALL patients. Specifically in BCP-ALL, we identified ATRX as a novel putative driver gene and uncovered an association between somatic mutations in the Notch signaling pathway at ALL diagnosis and increased risk of relapse. Furthermore, we identified EP300, ARID1A and SH2B3 as relapse-associated genes. The genes highlighted in our study were frequently involved in epigenetic regulation, associated with germline susceptibility to ALL, and present in minor subclones at diagnosis that became dominant at relapse. We observed a high degree of clonal heterogeneity and evolution between diagnosis and relapse in both BCP-ALL and T-ALL, which could have implications for the treatment efficiency.
Hefron, Ryan; Borghetti, Brett; Schubert Kabban, Christine; Christensen, James; Estepp, Justin
2018-01-01
Applying deep learning methods to electroencephalograph (EEG) data for cognitive state assessment has yielded improvements over previous modeling methods. However, research focused on cross-participant cognitive workload modeling using these techniques is underrepresented. We study the problem of cross-participant state estimation in a non-stimulus-locked task environment, where a trained model is used to make workload estimates on a new participant who is not represented in the training set. Using experimental data from the Multi-Attribute Task Battery (MATB) environment, a variety of deep neural network models are evaluated in the trade-space of computational efficiency, model accuracy, variance and temporal specificity yielding three important contributions: (1) The performance of ensembles of individually-trained models is statistically indistinguishable from group-trained methods at most sequence lengths. These ensembles can be trained for a fraction of the computational cost compared to group-trained methods and enable simpler model updates. (2) While increasing temporal sequence length improves mean accuracy, it is not sufficient to overcome distributional dissimilarities between individuals’ EEG data, as it results in statistically significant increases in cross-participant variance. (3) Compared to all other networks evaluated, a novel convolutional-recurrent model using multi-path subnetworks and bi-directional, residual recurrent layers resulted in statistically significant increases in predictive accuracy and decreases in cross-participant variance. PMID:29701668
Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong
2012-01-01
Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As of yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and also Solexa sequencing-produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. There were 70 associated SNPs whose –log10 p value over 16 were selected to be further analyzed. The distribution of these SNPs appeared a tight cluster, which consisted of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. PMID:22529909
Govindarajan, Subramaniam S.; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K.
2017-01-01
ABSTRACT Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. PMID:28153888
Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping
2016-06-02
Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has been given wide attention because of the high recurrence rate. Traditional diagnostic methods based on microscope providing limited information on the vaginal microbiota increase the difficulty in tracing the development of the disease in bacteria resistance condition. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment, at D0 (before treatment), D7 (stop using the antibiotics) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and their bacterial composition of the vaginal microbiota was compared. Interestingly, we identified 9 patients might be recurrence. We also introduced a new measurement point of D7, although its microbiota were significantly inhabited by antibiotic and hard to be observed by traditional method. The vaginal microbiota in deep-sequencing-view present a strong correlation to the final outcome. Thus, coupled with detailed individual bioinformatics analysis and deep-sequencing technology, we may illustrate a more accurate map of vaginal microbial to BV patients, which provide a new opportunity to reduce the rate of recurrence of BV.
Brouilette, Scott; Kuersten, Scott; Mein, Charles; Bozek, Monika; Terry, Anna; Dias, Kerith-Rae; Bhaw-Rosun, Leena; Shintani, Yasunori; Coppen, Steven; Ikebe, Chiho; Sawhney, Vinit; Campbell, Niall; Kaneko, Masahiro; Tano, Nobuko; Ishida, Hidekazu; Suzuki, Ken; Yashiro, Kenta
2012-10-01
Deep sequencing of single cell-derived cDNAs offers novel insights into oncogenesis and embryogenesis. However, traditional library preparation for RNA-seq analysis requires multiple steps with consequent sample loss and stochastic variation at each step significantly affecting output. Thus, a simpler and better protocol is desirable. The recently developed hyperactive Tn5-mediated library preparation, which brings high quality libraries, is likely one of the solutions. Here, we tested the applicability of hyperactive Tn5-mediated library preparation to deep sequencing of single cell cDNA, optimized the protocol, and compared it with the conventional method based on sonication. This new technique does not require any expensive or special equipment, which secures wider availability. A library was constructed from only 100 ng of cDNA, which enables the saving of precious specimens. Only a few steps of robust enzymatic reaction resulted in saved time, enabling more specimens to be prepared at once, and with a more reproducible size distribution among the different specimens. The obtained RNA-seq results were comparable to the conventional method. Thus, this Tn5-mediated preparation is applicable for anyone who aims to carry out deep sequencing for single cell cDNAs. Copyright © 2012 Wiley Periodicals, Inc.
Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.
2015-01-01
Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was the closest with G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234
Oasis 2: improved online analysis of small RNA-seq data.
Rahman, Raza-Ur; Gautam, Abhivyakti; Bethune, Jörn; Sattar, Abdul; Fiosins, Maksims; Magruder, Daniel Sumner; Capece, Vincenzo; Shomroni, Orr; Bonn, Stefan
2018-02-14
Small RNA molecules play important roles in many biological processes and their dysregulation or dysfunction can cause disease. The current method of choice for genome-wide sRNA expression profiling is deep sequencing. Here we present Oasis 2, which is a new main release of the Oasis web application for the detection, differential expression, and classification of small RNAs in deep sequencing data. Compared to its predecessor Oasis, Oasis 2 features a novel and speed-optimized sRNA detection module that supports the identification of small RNAs in any organism with higher accuracy. Next to the improved detection of small RNAs in a target organism, the software now also recognizes potential cross-species miRNAs and viral and bacterial sRNAs in infected samples. In addition, novel miRNAs can now be queried and visualized interactively, providing essential information for over 700 high-quality miRNA predictions across 14 organisms. Robust biomarker signatures can now be obtained using the novel enhanced classification module. Oasis 2 enables biologists and medical researchers to rapidly analyze and query small RNA deep sequencing data with improved precision, recall, and speed, in an interactive and user-friendly environment. Oasis 2 is implemented in Java, J2EE, mysql, Python, R, PHP and JavaScript. It is freely available at https://oasis.dzne.de.
Wilson, M R; Zimmermann, L L; Crawford, E D; Sample, H A; Soni, P R; Baker, A N; Khan, L M; DeRisi, J L
2017-03-01
Solid organ transplant patients are vulnerable to suffering neurologic complications from a wide array of viral infections and can be sentinels in the population who are first to get serious complications from emerging infections like the recent waves of arboviruses, including West Nile virus, Chikungunya virus, Zika virus, and Dengue virus. The diverse and rapidly changing landscape of possible causes of viral encephalitis poses great challenges for traditional candidate-based infectious disease diagnostics that already fail to identify a causative pathogen in approximately 50% of encephalitis cases. We present the case of a 14-year-old girl on immunosuppression for a renal transplant who presented with acute meningoencephalitis. Traditional diagnostics failed to identify an etiology. RNA extracted from her cerebrospinal fluid was subjected to unbiased metagenomic deep sequencing, enhanced with the use of a Cas9-based technique for host depletion. This analysis identified West Nile virus (WNV). Convalescent serum serologies subsequently confirmed WNV seroconversion. These results support a clear clinical role for metagenomic deep sequencing in the setting of suspected viral encephalitis, especially in the context of the high-risk transplant patient population. © 2016 The Authors. American Journal of Transplantation published by Wiley Periodicals, Inc. on behalf of American Society of Transplant Surgeons.
A deep learning framework for causal shape transformation.
Lore, Kin Gwn; Stoecklein, Daniel; Davies, Michael; Ganapathysubramanian, Baskar; Sarkar, Soumik
2018-02-01
Recurrent neural network (RNN) and Long Short-term Memory (LSTM) networks are the common go-to architecture for exploiting sequential information where the output is dependent on a sequence of inputs. However, in most considered problems, the dependencies typically lie in the latent domain which may not be suitable for applications involving the prediction of a step-wise transformation sequence that is dependent on the previous states only in the visible domain with a known terminal state. We propose a hybrid architecture of convolution neural networks (CNN) and stacked autoencoders (SAE) to learn a sequence of causal actions that nonlinearly transform an input visual pattern or distribution into a target visual pattern or distribution with the same support and demonstrated its practicality in a real-world engineering problem involving the physics of fluids. We solved a high-dimensional one-to-many inverse mapping problem concerning microfluidic flow sculpting, where the use of deep learning methods as an inverse map is very seldom explored. This work serves as a fruitful use-case to applied scientists and engineers in how deep learning can be beneficial as a solution for high-dimensional physical problems, and potentially opening doors to impactful advance in fields such as material sciences and medical biology where multistep topological transformations is a key element. Copyright © 2017 Elsevier Ltd. All rights reserved.
Small RNA Deep Sequencing and the Effects of microRNA408 on Root Gravitropic Bending in Arabidopsis
NASA Astrophysics Data System (ADS)
Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min
2015-11-01
MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed the miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation was performed, the expression levels of seven miRNA genes, including Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAsthat have not been previously reported, The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred to Arabidopsis plant. The roots of plants over expressing miR408 exhibited a slower reorientation upon gravistimulation in comparison with those of wild-type. This result indicate that miR408 could play a role in root gravitropic response.
Siniscalchi, Luciene Alves Batista; Leite, Laura Rabelo; Oliveira, Guilherme; Chernicharo, Carlos Augusto Lemos; de Araújo, Juliana Calabria
2017-07-01
Methane is produced in anaerobic environments, such as reactors used to treat wastewaters, and can be consumed by methanotrophs. The composition and structure of a microbial community enriched from anaerobic sewage sludge under methane-oxidation condition coupled to denitrification were investigated. Denaturing gradient gel electrophoresis (DGGE) analysis retrieved sequences of Methylocaldum and Chloroflexi. Deep sequencing analysis revealed a complex community that changed over time and was affected by methane concentration. Methylocaldum (8.2%), Methylosinus (2.3%), Methylomonas (0.02%), Methylacidiphilales (0.45%), Nitrospirales (0.18%), and Methanosarcinales (0.3%) were detected. Despite denitrifying conditions provided, Nitrospirales and Methanosarcinales, known to perform anaerobic methane oxidation coupled to denitrification (DAMO) process, were in very low abundance. Results demonstrated that aerobic and anaerobic methanotrophs coexisted in the reactor together with heterotrophic microorganisms, suggesting that a diverse microbial community was important to sustain methanotrophic activity. The methanogenic sludge was a good inoculum to enrich methanotrophs, and cultivation conditions play a selective role in determining community composition.
Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition
Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick
2017-01-01
A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055
Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.
Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick
2017-08-01
A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.
Evidence for a persistent microbial seed bank throughout the global ocean
Gibbons, Sean M.; Caporaso, J. Gregory; Pirrung, Meg; Field, Dawn; Knight, Rob; Gilbert, Jack A.
2013-01-01
Do bacterial taxa demonstrate clear endemism, like macroorganisms, or can one site’s bacterial community recapture the total phylogenetic diversity of the world’s oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Census of Marine Microbes (ICoMM) taken from around the globe (ranging from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.7–66.2% of operational taxonomic units identified in a given ICoMM biome. Extrapolation of this overlap suggests that 1.93 × 1011 sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 y. These results strongly suggest the marine biosphere maintains a previously undetected, persistent microbial seed bank. PMID:23487761
Zhang, De-Chao; Liu, Yan-Xia; Li, Xin-Zheng
2015-09-01
Deep sea ferromanganese (FeMn) nodules contain metallic mineral resources and have great economic potential. In this study, a combination of culture-dependent and culture-independent (16S rRNA genes clone library and pyrosequencing) methods was used to investigate the bacterial diversity in FeMn nodules from Jiaolong Seamount, the South China Sea. Eleven bacterial strains including some moderate thermophiles were isolated. The majority of strains belonged to the phylum Proteobacteria; one isolate belonged to the phylum Firmicutes. A total of 259 near full-length bacterial 16S rRNA gene sequences in a clone library and 67,079 valid reads obtained using pyrosequencing indicated that members of the Gammaproteobacteria dominated, with the most abundant bacterial genera being Pseudomonas and Alteromonas. Sequence analysis indicated the presence of many organisms whose closest relatives are known manganese oxidizers, iron reducers, hydrogen-oxidizing bacteria and methylotrophs. This is the first reported investigation of bacterial diversity associated with deep sea FeMn nodules from the South China Sea.
Zhu, Yuan O; Aw, Pauline P K; de Sessions, Paola Florez; Hong, Shuzhen; See, Lee Xian; Hong, Lewis Z; Wilm, Andreas; Li, Chen Hao; Hue, Stephane; Lim, Seng Gee; Nagarajan, Niranjan; Burkholder, William F; Hibberd, Martin
2017-10-27
Viral populations are complex, dynamic, and fast evolving. The evolution of groups of closely related viruses in a competitive environment is termed quasispecies. To fully understand the role that quasispecies play in viral evolution, characterizing the trajectories of viral genotypes in an evolving population is the key. In particular, long-range haplotype information for thousands of individual viruses is critical; yet generating this information is non-trivial. Popular deep sequencing methods generate relatively short reads that do not preserve linkage information, while third generation sequencing methods have higher error rates that make detection of low frequency mutations a bioinformatics challenge. Here we applied BAsE-Seq, an Illumina-based single-virion sequencing technology, to eight samples from four chronic hepatitis B (CHB) patients - once before antiviral treatment and once after viral rebound due to resistance. With single-virion sequencing, we obtained 248-8796 single-virion sequences per sample, which allowed us to find evidence for both hard and soft selective sweeps. We were able to reconstruct population demographic history that was independently verified by clinically collected data. We further verified four of the samples independently through PacBio SMRT and Illumina Pooled deep sequencing. Overall, we showed that single-virion sequencing yields insight into viral evolution and population dynamics in an efficient and high throughput manner. We believe that single-virion sequencing is widely applicable to the study of viral evolution in the context of drug resistance and host adaptation, allows differentiation between soft or hard selective sweeps, and may be useful in the reconstruction of intra-host viral population demographic history.
Giugni, Elisabetta; Sabatini, Umberto; Hagberg, Gisela E; Formisano, Rita; Castriota-Scanderbeg, Alessandro
2005-05-01
Diffuse axonal injury (DAI) is a common type of primary neuronal injury in patients with severe traumatic brain injury (TBI), and is frequently accompanied by tissue tear hemorrhage. T2-weighted gradient-recalled echo (GRE) sequences are more sensitive than T2-weighted spin-echo images for detection of hemorrhage. The purpose of this study is to compare turbo Proton Echo Planar Spectroscopic Imaging (t-PEPSI), an extremely fast sequence, with GRE sequence in the detection of DAI. Twenty-one patients (mean age 26.8 years) with severe TBI occurred at least 3 months earlier, underwent a brain MR Imaging study on a 1.5-T scanner. A qualitative evaluation of the t-PEPSI sequences was performed by identifying the optimal echo time and in-plane resolution. The number and size of DAI lesions, as well as the signal intensity contrast ratio (SI CR), were computed for each set of GRE and t-PEPSI images, and divided according to their anatomic location as lobar and/or deep brain. There was no significant difference between GRE and t-PEPSI sequences in the detection of the total number of DAI lesions (291 vs. 230, respectively). GRE sequence delineated a higher number of DAI in the temporal lobe compared to the t-PEPSI sequence (74 vs. 37, P < .004), while no differences were found for the other regions. The SI CR was significantly lower with the t-PEPSI than the GRE sequence (P < .00001). Owing to its very short scan time and high sensitivity to the hemorrhage foci, the t-PEPSI sequence may be used as an alternative to the GRE to assess brain DAI in severe TBI patients, especially if uncooperative and medically unstable.
Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments
Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken
2013-01-01
In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 106 to 1011 viruses/cm3 of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24−30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10−3 in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95−99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952
Metagenomic analysis of viral communities in (hado)pelagic sediments.
Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken
2013-01-01
In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific; the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10(6) to 10(11) viruses/cm(3) of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24-30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10(-3) in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95-99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses.
Li, Guo; Liu, Yong; Liu, Chao; Su, Zhongwu; Ren, Shuling; Wang, Yunyun; Deng, Tengbo; Huang, Donghai; Tian, Yongquan; Qiu, Yuanzheng
2016-09-06
Radioresistance is one of the major factors limiting the therapeutic efficacy and prognosis of patients with nasopharyngeal carcinoma (NPC). Accumulating evidence has suggested that aberrant expression of long noncoding RNAs (lncRNAs) contributes to cancer progression. Therefore, here we identified lncRNAs associated with radioresistance in NPC. The differential expression profiles of lncRNAs associated with NPC radioresistance were constructed by next-generation deep sequencing by comparing radioresistant NPC cells with their parental cells. LncRNA-related mRNAs were predicted and analyzed using bioinformatics algorithms compared with the mRNA profiles related to radioresistance obtained in our previous study. Several lncRNAs and associated mRNAs were validated in established NPC radioresistant cell models and NPC tissues. By comparison between radioresistant CNE-2-Rs and parental CNE-2 cells by next-generation deep sequencing, a total of 781 known lncRNAs and 2054 novel lncRNAs were annotated. The top five upregulated and downregulated known/novel lncRNAs were detected using quantitative real-time reverse transcription-polymerase chain reaction, and 7/10 known lncRNAs and 3/10 novel lncRNAs were demonstrated to have significant differential expression trends that were the same as those predicted by deep sequencing. From the prediction process, 13 pairs of lncRNAs and their associated genes were acquired, and the prediction trends of three pairs were validated in both radioresistant CNE-2-Rs and 6-10B-Rs cell lines, including lncRNA n373932 and SLITRK5, n409627 and PRSS12, and n386034 and RIMKLB. LncRNA n373932 and its related SLITRK5 showed dramatic expression changes in post-irradiation radioresistant cells and a negative expression correlation in NPC tissues (R = -0.595, p < 0.05). Our study provides an overview of the expression profiles of radioresistant lncRNAs and potentially related mRNAs, which will facilitate future investigations into the function of lncRNAs in NPC radioresistance.
Itakura, Jun; Kurosaki, Masayuki; Higuchi, Mayu; Takada, Hitomi; Nakakuki, Natsuko; Itakura, Yoshie; Tamaki, Nobuharu; Yasui, Yutaka; Suzuki, Shoko; Tsuchiya, Kaoru; Nakanishi, Hiroyuki; Takahashi, Yuka; Maekawa, Shinya; Enomoto, Nobuyuki; Izumi, Namiki
2015-01-01
The presence of resistance-associated variants (RAVs) of hepatitis C virus (HCV) attenuates the efficacy of direct acting antivirals (DAAs). The objective of this study was to characterize the susceptibility of RAVs to interferon-based therapy. Direct and deep sequencing were performed to detect Y93H RAV in the NS5A region. Twenty nine genotype 1b patients with detectable RAV at baseline were treated by a combination of simeprevir, pegylated interferon and ribavirin. The longitudinal changes in the proportion of Y93H RAV during therapy and at breakthrough or relapse were determined. By direct sequencing, Y93H RAV became undetectable or decreased in proportion at an early time point during therapy (within 7 days) in 57% of patients with both the Y93H variant and wild type virus at baseline when HCV RNA was still detectable. By deep sequencing, the proportion of Y93H RAV against Y93 wild type was 52.7% (5.8%- 97.4%) at baseline which significantly decreased to 29.7% (0.16%- 98.3%) within 7 days of initiation of treatment (p = 0.023). The proportion of Y93H RAV was reduced in 21 of 29 cases (72.4%) and a marked reduction of more than 10% was observed in 14 cases (48.7%). HCV RNA reduction was significantly greater for Y93H RAV (-3.65±1.3 logIU/mL/day) than the Y93 wild type (-3.35±1.0 logIU/mL/day) (p<0.001). Y93H RAV is more susceptible to interferon-based therapy than the Y93 wild type.
Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae
2011-01-01
Background Maintenance of an anaerobic denitrification system in the obligate human pathogen, Neisseria gonorrhoeae, suggests that an anaerobic lifestyle may be important during the course of infection. Furthermore, mounting evidence suggests that reduction of host-produced nitric oxide has several immunomodulary effects on the host. However, at this point there have been no studies analyzing the complete gonococcal transcriptome response to anaerobiosis. Here we performed deep sequencing to compare the gonococcal transcriptomes of aerobically and anaerobically grown cells. Using the information derived from this sequencing, we discuss the implications of the robust transcriptional response to anaerobic growth. Results We determined that 198 chromosomal genes were differentially expressed (~10% of the genome) in response to anaerobic conditions. We also observed a large induction of genes encoded within the cryptic plasmid, pJD1. Validation of RNA-seq data using translational-lacZ fusions or RT-PCR demonstrated the RNA-seq results to be very reproducible. Surprisingly, many genes of prophage origin were induced anaerobically, as well as several transcriptional regulators previously unknown to be involved in anaerobic growth. We also confirmed expression and regulation of a small RNA, likely a functional equivalent of fnrS in the Enterobacteriaceae family. We also determined that many genes found to be responsive to anaerobiosis have also been shown to be responsive to iron and/or oxidative stress. Conclusions Gonococci will be subject to many forms of environmental stress, including oxygen-limitation, during the course of infection. Here we determined that the anaerobic stimulon in gonococci was larger than previous studies would suggest. Many new targets for future research have been uncovered, and the results derived from this study may have helped to elucidate factors or mechanisms of virulence that may have otherwise been overlooked. PMID:21251255
Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen
2014-02-01
Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
Oral Microbiome of Deep and Shallow Dental Pockets In Chronic Periodontitis
Ge, Xiuchun; Rodriguez, Rafael; Trinh, My; Gunsolley, John; Xu, Ping
2013-01-01
We examined the subgingival bacterial biodiversity in untreated chronic periodontitis patients by sequencing 16S rRNA genes. The primary purpose of the study was to compare the oral microbiome in deep (diseased) and shallow (healthy) sites. A secondary purpose was to evaluate the influences of smoking, race and dental caries on this relationship. A total of 88 subjects from two clinics were recruited. Paired subgingival plaque samples were taken from each subject, one from a probing site depth >5 mm (deep site) and the other from a probing site depth ≤3mm (shallow site). A universal primer set was designed to amplify the V4–V6 region for oral microbial 16S rRNA sequences. Differences in genera and species attributable to deep and shallow sites were determined by statistical analysis using a two-part model and false discovery rate. Fifty-one of 170 genera and 200 of 746 species were found significantly different in abundances between shallow and deep sites. Besides previously identified periodontal disease-associated bacterial species, additional species were found markedly changed in diseased sites. Cluster analysis revealed that the microbiome difference between deep and shallow sites was influenced by patient-level effects such as clinic location, race and smoking. The differences between clinic locations may be influenced by racial distribution, in that all of the African Americans subjects were seen at the same clinic. Our results suggested that there were influences from the microbiome for caries and periodontal disease and these influences are independent. PMID:23762384
Li, S; Dumdei, E J; Blunt, J W; Munro, M H; Robinson, W T; Pannell, L K
1998-06-26
The structure, stereochemistry, and conformation of theonellapeptolide IIIe (1), a new 36-membered ring cyclic peptolide from the New Zealand deep-water sponge Lamellomorpha strongylata, is described. The sequence of the cytotoxic peptolide was determined through a combination of NMR and MS-MS techniques and confirmed by X-ray crystal structure analysis, which, with chiral HPLC, established the absolute stereochemistry.
Fusarium musae as cause of superficial and deep-seated human infections.
Esposto, M C; Prigitano, A; Tortorano, A M
2016-12-01
BLAST analysis in GenBank of 60 Fusarium verticillioides clinical isolates using the sequence of translation elongation factor 1-alpha allowed the identification of four F. musae confirming that this species is not a rare etiology of superficial and deep infections and that its habitat is not restricted to banana fruits. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; Johnson, Shannon; McMurry, Kim; Gleasner, Cheryl D; Vuyisich, Momchilo; Chain, Patrick S; Junier, Pilar
2015-08-27
The genome of strain GS3372 is the first publicly available strain of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. Copyright © 2015 Filippidou et al.
Genome-wide discovery of novel and conserved microRNAs in white shrimp (Litopenaeus vannamei).
Xi, Qian-Yun; Xiong, Yuan-Yan; Wang, Yuan-Mei; Cheng, Xiao; Qi, Qi-En; Shu, Gang; Wang, Song-Bo; Wang, Li-Na; Gao, Ping; Zhu, Xiao-Tong; Jiang, Qing-Yan; Zhang, Yong-Liang; Liu, Li
2015-01-01
Of late years, a large amount of conserved and species-specific microRNAs (miRNAs) have been performed on identification from species which are economically important but lack a full genome sequence. In this study, Solexa deep sequencing and cross-species miRNA microarray were used to detect miRNAs in white shrimp. We identified 239 conserved miRNAs, 14 miRNA* sequences and 20 novel miRNAs by bioinformatics analysis from 7,561,406 high-quality reads representing 325,370 distinct sequences. The all 20 novel miRNAs were species-specific in white shrimp and not homologous in other species. Using the conserved miRNAs from the miRBase database as a query set to search for homologs from shrimp expressed sequence tags (ESTs), 32 conserved computationally predicted miRNAs were discovered in shrimp. In addition, using microarray analysis in the shrimp fed with Panax ginseng polysaccharide complex, 151 conserved miRNAs were identified, 18 of which were significant up-expression, while 49 miRNAs were significant down-expression. In particular, qRT-PCR analysis was also performed for nine miRNAs in three shrimp tissues such as muscle, gill and hepatopancreas. Results showed that these miRNAs expression are tissue specific. Combining results of the three methods, we detected 20 novel and 394 conserved miRNAs. Verification with quantitative reverse transcription (qRT-PCR) and Northern blot showed a high confidentiality of data. The study provides the first comprehensive specific miRNA profile of white shrimp, which includes useful information for future investigations into the function of miRNAs in regulation of shrimp development and immunology.
Lata, Pushpa; Govindarajan, Subramaniam S; Qi, Feng; Li, Jian-Liang; Sahoo, Malaya K
2017-02-02
Paenibacillus sp. strain KS1 was isolated from an epiphyte, Tillandsia usneoides (Spanish moss), in central Florida, USA. Here, we report a draft genome sequence of this strain, which consists of a total of 398 contigs spanning 6,508,195 bp, with a G+C content of 46.5% and comprising 5,401 predicted coding sequences. Copyright © 2017 Lata et al.
Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel
2015-02-01
Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).
Tarn, Jonathan; Peoples, Logan M; Hardy, Kevin; Cameron, James; Bartlett, Douglas H
2016-01-01
Relatively few studies have described the microbial populations present in ultra-deep hadal environments, largely as a result of difficulties associated with sampling. Here we report Illumina-tag V6 16S rRNA sequence-based analyses of the free-living and particle-associated microbial communities recovered from locations within two of the deepest hadal sites on Earth, the Challenger Deep (10,918 meters below surface-mbs) and the Sirena Deep (10,667 mbs) within the Mariana Trench, as well as one control site (Ulithi Atoll, 761 mbs). Seawater samples were collected using an autonomous lander positioned ~1 m above the seafloor. The bacterial populations within the Mariana Trench bottom water samples were dissimilar to other deep-sea microbial communities, though with overlap with those of diffuse flow hydrothermal vents and deep-subsurface locations. Distinct particle-associated and free-living bacterial communities were found to exist. The hadal bacterial populations were also markedly different from one another, indicating the likelihood of different chemical conditions at the two sites. In contrast to the bacteria, the hadal archaeal communities were more similar to other less deep datasets and to each other due to an abundance of cosmopolitan deep-sea taxa. The hadal communities were enriched in 34 bacterial and 4 archaeal operational taxonomic units (OTUs) including members of the Gammaproteobacteria, Epsilonproteobacteria, Marinimicrobia, Cyanobacteria, Deltaproteobacteria, Gemmatimonadetes, Atribacteria, Spirochaetes, and Euryarchaeota. Sequences matching cultivated piezophiles were notably enriched in the Challenger Deep, especially within the particle-associated fraction, and were found in higher abundances than in other hadal studies, where they were either far less prevalent or missing. Our results indicate the importance of heterotrophy, sulfur-cycling, and methane and hydrogen utilization within the bottom waters of the deeper regions of the Mariana Trench, and highlight novel community features of these extreme habitats.
Tarn, Jonathan; Peoples, Logan M.; Hardy, Kevin; Cameron, James; Bartlett, Douglas H.
2016-01-01
Relatively few studies have described the microbial populations present in ultra-deep hadal environments, largely as a result of difficulties associated with sampling. Here we report Illumina-tag V6 16S rRNA sequence-based analyses of the free-living and particle-associated microbial communities recovered from locations within two of the deepest hadal sites on Earth, the Challenger Deep (10,918 meters below surface-mbs) and the Sirena Deep (10,667 mbs) within the Mariana Trench, as well as one control site (Ulithi Atoll, 761 mbs). Seawater samples were collected using an autonomous lander positioned ~1 m above the seafloor. The bacterial populations within the Mariana Trench bottom water samples were dissimilar to other deep-sea microbial communities, though with overlap with those of diffuse flow hydrothermal vents and deep-subsurface locations. Distinct particle-associated and free-living bacterial communities were found to exist. The hadal bacterial populations were also markedly different from one another, indicating the likelihood of different chemical conditions at the two sites. In contrast to the bacteria, the hadal archaeal communities were more similar to other less deep datasets and to each other due to an abundance of cosmopolitan deep-sea taxa. The hadal communities were enriched in 34 bacterial and 4 archaeal operational taxonomic units (OTUs) including members of the Gammaproteobacteria, Epsilonproteobacteria, Marinimicrobia, Cyanobacteria, Deltaproteobacteria, Gemmatimonadetes, Atribacteria, Spirochaetes, and Euryarchaeota. Sequences matching cultivated piezophiles were notably enriched in the Challenger Deep, especially within the particle-associated fraction, and were found in higher abundances than in other hadal studies, where they were either far less prevalent or missing. Our results indicate the importance of heterotrophy, sulfur-cycling, and methane and hydrogen utilization within the bottom waters of the deeper regions of the Mariana Trench, and highlight novel community features of these extreme habitats. PMID:27242695
Evolutionary process of deep-sea bathymodiolus mussels.
Miyazaki, Jun-Ichi; de Oliveira Martins, Leonardo; Fujita, Yuko; Matsumoto, Hiroto; Fujiwara, Yoshihiro
2010-04-27
Since the discovery of deep-sea chemosynthesis-based communities, much work has been done to clarify their organismal and environmental aspects. However, major topics remain to be resolved, including when and how organisms invade and adapt to deep-sea environments; whether strategies for invasion and adaptation are shared by different taxa or unique to each taxon; how organisms extend their distribution and diversity; and how they become isolated to speciate in continuous waters. Deep-sea mussels are one of the dominant organisms in chemosynthesis-based communities, thus investigations of their origin and evolution contribute to resolving questions about life in those communities. We investigated worldwide phylogenetic relationships of deep-sea Bathymodiolus mussels and their mytilid relatives by analyzing nucleotide sequences of the mitochondrial cytochrome c oxidase subunit I (COI) and NADH dehydrogenase subunit 4 (ND4) genes. Phylogenetic analysis of the concatenated sequence data showed that mussels of the subfamily Bathymodiolinae from vents and seeps were divided into four groups, and that mussels of the subfamily Modiolinae from sunken wood and whale carcasses assumed the outgroup position and shallow-water modioline mussels were positioned more distantly to the bathymodioline mussels. We provisionally hypothesized the evolutionary history of Bathymodilolus mussels by estimating evolutionary time under a relaxed molecular clock model. Diversification of bathymodioline mussels was initiated in the early Miocene, and subsequently diversification of the groups occurred in the early to middle Miocene. The phylogenetic relationships support the "Evolutionary stepping stone hypothesis," in which mytilid ancestors exploited sunken wood and whale carcasses in their progressive adaptation to deep-sea environments. This hypothesis is also supported by the evolutionary transition of symbiosis in that nutritional adaptation to the deep sea proceeded from extracellular to intracellular symbiotic states in whale carcasses. The estimated evolutionary time suggests that the mytilid ancestors were able to exploit whales during adaptation to the deep sea.
The green ash transcriptome and identification of genes responding to abiotic and biotic stresses
Thomas Lane; Teodora Best; Nicole Zembower; Jack Davitt; Nathan Henry; Yi Xu; Jennifer Koch; Haiying Liang; John McGraw; Stephan Schuster; Donghwan Shim; Mark V. Coggeshall; John E. Carlson; Margaret E. Staton
2016-01-01
Background: To develop a set of transcriptome sequences to support research on environmental stress responses in green ash (Fraxinus pennsylvanica), we undertook deep RNA sequencing of green ash tissues under various stress treatments. The treatments, including emerald ash borer (EAB) feeding, heat, drought, cold and ozone, were selected to mimic...
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; McBride, Kathryn R.; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D.
2016-01-01
Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin. PMID:27881538
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; ...
2016-11-23
We isolated Thalassospirasp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. Furthermore, an analysis of the deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin near-complete genome sequence, will be presented here.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar
We isolated Thalassospirasp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. Furthermore, an analysis of the deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin near-complete genome sequence, will be presented here.
Draft Genome Sequence of the Spore-Forming Probiotic Strain Bacillus coagulans Unique IS-2
Upadrasta, Aditya; Pitta, Swetha
2016-01-01
Bacillus coagulans Unique IS-2 is a potential spore-forming probiotic that is commercially available on the market. The draft genome sequence presented here provides deep insight into the beneficial features of this strain for its safe use as a probiotic for various human and animal health applications. PMID:27103709
Deep Sequencing Reveals the Complete Genome Sequence of Sweet potato virus G from East Timor
Maina, Solomon; Edwards, Owain R.; Barbetti, Martin J.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present the first complete Sweet potato virus G (SPVG) genome from sweet potato in East Timor and compare it with seven complete SPVG genomes from South Korea (three), Taiwan (two), Argentina (one), and the United States (one). It most resembles the genomes from the United States and South Korea. PMID:27609925
Shahinas, Dea; Silverman, Michael; Sittler, Taylor; Chiu, Charles; Kim, Peter; Allen-Vercoe, Emma; Weese, Scott; Wong, Andrew; Low, Donald E.; Pillai, Dylan R.
2012-01-01
ABSTRACT Fecal microbiome transplantation by low-volume enema is an effective, safe, and inexpensive alternative to antibiotic therapy for patients with chronic relapsing Clostridium difficile infection (CDI). We explored the microbial diversity of pre- and posttransplant stool specimens from CDI patients (n = 6) using deep sequencing of the 16S rRNA gene. While interindividual variability in microbiota change occurs with fecal transplantation and vancomycin exposure, in this pilot study we note that clinical cure of CDI is associated with an increase in diversity and richness. Genus- and species-level analysis may reveal a cocktail of microorganisms or products thereof that will ultimately be used as a probiotic to treat CDI. PMID:23093385
NASA Astrophysics Data System (ADS)
Cheong, D.; Kim, D.; Kim, Y.
2010-12-01
The block 6-1 located in the southwestern margin of the Ulleung basin, East Sea (Sea of Japan) is an area where recently produces commercial natural gas and condensate. A total of 17 exploratory wells have been drilled, and also many seismic explorations have been carried out since early 1970s. Among the wells and seismic sections, the Gorae 1 well and a seismic section through the Gorae 1-2 well were chosen for this simulation work. Then, a 2-D graphic simulation using SEDPAK elucidates the evolution, burial history and diagenesis of the sedimentary sequence. The study area is a suitable place for modeling a petroleum system and evaluating hydrocarbon potential of reservoir. Shale as a source rock is about 3500m deep from sea floor, and sandstones interbedded with thin mud layers are distributed as potential reservoir rocks from 3,500m to 2,000m deep. On top of that, shales cover as seal rocks and overburden rocks upto 900m deep. Input data(sea level, sediment supply, subsidence rate, etc) for the simulation was taken from several previous published papers including the well and seismic data, and the thermal maturity of the sediment was calculated from known thermal gradient data. In this study area, gas and condensate have been found and commercially produced, and the result of the simulation also shows that there is a gas window between 4000m and 6000m deep, so that three possible interpretations can be inferred from the simulation result. First, oil has already moved and gone to the southeastern area along uplifting zones. Or second, oil has never been generated because organic matter is kerogen type 3, and or finally, generated oil has been converted into gas by thermally overcooking. SEDPAK has an advantage that it provides the timing and depth information of generated oil and gas with TTI values even though it has a limit which itself can not perform geochemical modeling to analyze thermal maturity level of source rocks. Based on the result of our simulation, added exploratory wells are required to discover deeper gas located in the study area.
Mason, Olivia U; Hazen, Terry C; Borglin, Sharon; Chain, Patrick S G; Dubinsky, Eric A; Fortney, Julian L; Han, James; Holman, Hoi-Ying N; Hultman, Jenni; Lamendella, Regina; Mackelprang, Rachel; Malfatti, Stephanie; Tom, Lauren M; Tringe, Susannah G; Woyke, Tanja; Zhou, Jizhong; Rubin, Edward M; Jansson, Janet K
2012-09-01
The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.
Han, Yucui; Lv, Peng; Hou, Shenglin; Li, Suying; Ji, Guisu; Ma, Xue; Du, Ruiheng; Liu, Guoqing
2015-01-01
Sorghum is one of the most promising bioenergy crops. Stem juice yield, together with stem sugar concentration, determines sugar yield in sweet sorghum. Bulked segregant analysis (BSA) is a gene mapping technique for identifying genomic regions containing genetic loci affecting a trait of interest that when combined with deep sequencing could effectively accelerate the gene mapping process. In this study, a dry stem sorghum landrace was characterized and the stem water controlling locus, qSW6, was fine mapped using QTL analysis and the combined BSA and deep sequencing technologies. Results showed that: (i) In sorghum variety Jiliang 2, stem water content was around 80% before flowering stage. It dropped to 75% during grain filling with little difference between different internodes. In landrace G21, stem water content keeps dropping after the flag leaf stage. The drop from 71% at flowering time progressed to 60% at grain filling time. Large differences exist between different internodes with the lowest (51%) at the 7th and 8th internodes at dough stage. (ii) A quantitative trait locus (QTL) controlling stem water content mapped on chromosome 6 between SSR markers Ch6-2 and gpsb069 explained about 34.7-56.9% of the phenotypic variation for the 5th to 10th internodes, respectively. (iii) BSA and deep sequencing analysis narrowed the associated region to 339 kb containing 38 putative genes. The results could help reveal molecular mechanisms underlying juice yield of sorghum and thus to improve total sugar yield.
Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario
2016-01-01
The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations and deep targeted sequencing of 134 genes selected from the initial screening in additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration were mutated at frequency ranged from 4% to 2%. Focal adhesion and regulation of actin cytoskeleton pathways, were frequently disrupted (14.1% of cases) thus suggesting potential novel therapeutic strategies to prevent disease progression. Notably BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842
Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario
2016-04-19
The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma.Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations and deep targeted sequencing of 134 genes selected from the initial screening in additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines.We revealed 22 significantly mutated genes, many of which implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK.Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration were mutated at frequency ranged from 4% to 2%.Focal adhesion and regulation of actin cytoskeleton pathways, were frequently disrupted (14.1% of cases) thus suggesting potential novel therapeutic strategies to prevent disease progression.Notably BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants.In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression.
Beyond Bacteria: A Study of the Enteric Microbial Consortium in Extremely Low Birth Weight Infants
Cotton, Charles Michael; Goldberg, Ronald N.; Wynn, James L.; Jackson, Robert B.; Seed, Patrick C.
2011-01-01
Extremely low birth weight (ELBW) infants have high morbidity and mortality, frequently due to invasive infections from bacteria, fungi, and viruses. The microbial communities present in the gastrointestinal tracts of preterm infants may serve as a reservoir for invasive organisms and remain poorly characterized. We used deep pyrosequencing to examine the gut-associated microbiome of 11 ELBW infants in the first postnatal month, with a first time determination of the eukaryote microbiota such as fungi and nematodes, including bacteria and viruses that have not been previously described. Among the fungi observed, Candida sp. and Clavispora sp. dominated the sequences, but a range of environmental molds were also observed. Surprisingly, seventy-one percent of the infant fecal samples tested contained ribosomal sequences corresponding to the parasitic organism Trichinella. Ribosomal DNA sequences for the roundworm symbiont Xenorhabdus accompanied these sequences in the infant with the greatest proportion of Trichinella sequences. When examining ribosomal DNA sequences in aggregate, Enterobacteriales, Pseudomonas, Staphylococcus, and Enterococcus were the most abundant bacterial taxa in a low diversity bacterial community (mean Shannon-Weaver Index of 1.02±0.69), with relatively little change within individual infants through time. To supplement the ribosomal sequence data, shotgun sequencing was performed on DNA from multiple displacement amplification (MDA) of total fecal genomic DNA from two infants. In addition to the organisms mentioned previously, the metagenome also revealed sequences for gram positive and gram negative bacteriophages, as well as human adenovirus C. Together, these data reveal surprising eukaryotic and viral microbial diversity in ELBW enteric microbiota dominated bytypes of bacteria known to cause invasive disease in these infants. PMID:22174751
Using pyrosequencing to shed light on deep mine microbial ecology
Edwards, Robert A; Rodriguez-Brito, Beltran; Wegley, Linda; Haynes, Matthew; Breitbart, Mya; Peterson, Dean M; Saar, Martin O; Alexander, Scott; Alexander, E Calvin; Rohwer, Forest
2006-01-01
Background Contrasting biological, chemical and hydrogeological analyses highlights the fundamental processes that shape different environments. Generating and interpreting the biological sequence data was a costly and time-consuming process in defining an environment. Here we have used pyrosequencing, a rapid and relatively inexpensive sequencing technology, to generate environmental genome sequences from two sites in the Soudan Mine, Minnesota, USA. These sites were adjacent to each other, but differed significantly in chemistry and hydrogeology. Results Comparisons of the microbes and the subsystems identified in the two samples highlighted important differences in metabolic potential in each environment. The microbes were performing distinct biochemistry on the available substrates, and subsystems such as carbon utilization, iron acquisition mechanisms, nitrogen assimilation, and respiratory pathways separated the two communities. Although the correlation between much of the microbial metabolism occurring and the geochemical conditions from which the samples were isolated could be explained, the reason for the presence of many pathways in these environments remains to be determined. Despite being physically close, these two communities were markedly different from each other. In addition, the communities were also completely different from other microbial communities sequenced to date. Conclusion We anticipate that pyrosequencing will be widely used to sequence environmental samples because of the speed, cost, and technical advantages. Furthermore, subsystem comparisons rapidly identify the important metabolisms employed by the microbes in different environments. PMID:16549033
Soshi, Takahiro; Nakajima, Heizo; Hagiwara, Hiroko
2016-10-01
Static knowledge about the grammar of a natural language is represented in the cortico-subcortical system. However, the differences in dynamic verbal processing under different cognitive conditions are unclear. To clarify this, we conducted an electrophysiological experiment involving a semantic priming paradigm in which semantically congruent or incongruent word sequences (prime nouns-target verbs) were randomly presented. We examined the event-related brain potentials that occurred in response to congruent and incongruent target words that were preceded by primes with or without grammatical case markers. The two participant groups performed either the shallow (lexical judgment) or deep (direct semantic judgment) semantic tasks. We hypothesized that, irrespective of the case markers, the congruent targets would reduce centro-posterior N400 activities under the deep semantic condition, which induces selective attention to the semantic relatedness of content words. However, the same congruent targets with correct case markers would reduce lateralized negativity under the shallow semantic condition because grammatical case markers are related to automatic structural integration under semantically unattended conditions. We observed that congruent targets (e.g., 'open') that were preceded by primes with congruent case markers (e.g., 'shutter-object case') reduced lateralized negativity under the shallow semantic condition. In contrast, congruent targets, irrespective of case markers, consistently yielded N400 reductions under the deep semantic condition. To summarize, human neural verbal processing differed in response to the same grammatical markers in the same verbal expressions under semantically attended or unattended conditions.
Gillotin, Sébastien; Guillemot, François
2016-06-20
Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is an important strategy to study gene regulation. When availability of cells is limited, however, it can be useful to focus on specific genes to investigate in depth the role of transcription factors or histone marks. Unfortunately, performing ChIP experiments to study transcription factors' binding to DNA can be difficult when biological material is restricted. This protocol describes a robust method to perform μChIP for over-expressed or endogenous transcription factors using ~100,000 cells per ChIP experiment (Masserdotti et al ., 2015). We also describe optimization steps, which we think are critical for this protocol to work and which can be used to further reduce the number of cells.
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris
2010-07-15
The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
Deep learning on temporal-spectral data for anomaly detection
NASA Astrophysics Data System (ADS)
Ma, King; Leung, Henry; Jalilian, Ehsan; Huang, Daniel
2017-05-01
Detecting anomalies is important for continuous monitoring of sensor systems. One significant challenge is to use sensor data and autonomously detect changes that cause different conditions to occur. Using deep learning methods, we are able to monitor and detect changes as a result of some disturbance in the system. We utilize deep neural networks for sequence analysis of time series. We use a multi-step method for anomaly detection. We train the network to learn spectral and temporal features from the acoustic time series. We test our method using fiber-optic acoustic data from a pipeline.
Zhou, Qing; Aksentijevich, Ivona; Wood, Geryl M; Walts, Avram D; Hoffmann, Patrycja; Remmers, Elaine F; Kastner, Daniel L; Ombrello, Amanda K
2015-09-01
To identify the cause of disease in an adult patient presenting with recent-onset fevers, chills, urticaria, fatigue, and profound myalgia, who was found to be negative for cryopyrin-associated periodic syndrome (CAPS) NLRP3 mutations by conventional Sanger DNA sequencing. We performed whole-exome sequencing and targeted deep sequencing using DNA from the patient's whole blood to identify a possible NLRP3 somatic mutation. We then screened for this mutation in subcloned NLRP3 amplicons from fibroblasts, buccal cells, granulocytes, negatively selected monocytes, and T and B lymphocytes and further confirmed the somatic mutation by targeted sequencing of exon 3. We identified a previously reported CAPS-associated mutation, p.Tyr570Cys, with a mutant allele frequency of 15% based on exome data. Targeted sequencing and subcloning of NLRP3 amplicons confirmed the presence of the somatic mutation in whole blood at a ratio similar to the exome data. The mutant allele frequency was in the range of 13.3-16.8% in monocytes and 15.2-18% in granulocytes. Notably, this mutation was either absent or present at a very low frequency in B and T lymphocytes, in buccal cells, and in the patient's cultured fibroblasts. Our findings indicate the possibility of myeloid-restricted somatic mosaicism in the pathogenesis of CAPS, underscoring the emerging role of massively parallel sequencing in clinical diagnosis. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
Precise genotyping and recombination detection of Enterovirus
2015-01-01
Enteroviruses (EV) with different genotypes cause diverse infectious diseases in humans and mammals. A correct EV typing result is crucial for effective medical treatment and disease control; however, the emergence of novel viral strains has impaired the performance of available diagnostic tools. Here, we present a web-based tool, named EVIDENCE (EnteroVirus In DEep conception, http://symbiont.iis.sinica.edu.tw/evidence), for EV genotyping and recombination detection. We introduce the idea of using mixed-ranking scores to evaluate the fitness of prototypes based on relatedness and on the genome regions of interest. Using phylogenetic methods, the most possible genotype is determined based on the closest neighbor among the selected references. To detect possible recombination events, EVIDENCE calculates the sequence distance and phylogenetic relationship among sequences of all sliding windows scanning over the whole genome. Detected recombination events are plotted in an interactive figure for viewing of fine details. In addition, all EV sequences available in GenBank were collected and revised using the latest classification and nomenclature of EV in EVIDENCE. These sequences are built into the database and are retrieved in an indexed catalog, or can be searched for by keywords or by sequence similarity. EVIDENCE is the first web-based tool containing pipelines for genotyping and recombination detection, with updated, built-in, and complete reference sequences to improve sensitivity and specificity. The use of EVIDENCE can accelerate genotype identification, aiding clinical diagnosis and enhancing our understanding of EV evolution. PMID:26678286
Zhang, Long; Jia, Lianyin; Ren, Yazhou
2017-01-01
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study. PMID:29117139
Wang, Jun; Zhang, Long; Jia, Lianyin; Ren, Yazhou; Yu, Guoxian
2017-11-08
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae , DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study.
Sivadas, A; Salleh, M Z; Teh, L K; Scaria, V
2017-10-01
Expanding the scope of pharmacogenomic research by including multiple global populations is integral to building robust evidence for its clinical translation. Deep whole-genome sequencing of diverse ethnic populations provides a unique opportunity to study rare and common pharmacogenomic markers that often vary in frequency across populations. In this study, we aim to build a diverse map of pharmacogenetic variants in South East Asian (SEA) Malay population using deep whole-genome sequences of 100 healthy SEA Malay individuals. We investigated the allelic diversity of potentially deleterious pharmacogenomic variants in SEA Malay population. Our analysis revealed 227 common and 466 rare potentially functional single nucleotide variants (SNVs) in 437 pharmacogenomic genes involved in drug metabolism, transport and target genes, including 74 novel variants. This study has created one of the most comprehensive maps of pharmacogenetic markers in any population from whole genomes and will hugely benefit pharmacogenomic investigations and drug dosage recommendations in SEA Malays.
Rapid Fine Conformational Epitope Mapping Using Comprehensive Mutagenesis and Deep Sequencing*
Kowalsky, Caitlin A.; Faber, Matthew S.; Nath, Aritro; Dann, Hailey E.; Kelly, Vince W.; Liu, Li; Shanker, Purva; Wagner, Ellen K.; Maynard, Jennifer A.; Chan, Christina; Whitehead, Timothy A.
2015-01-01
Knowledge of the fine location of neutralizing and non-neutralizing epitopes on human pathogens affords a better understanding of the structural basis of antibody efficacy, which will expedite rational design of vaccines, prophylactics, and therapeutics. However, full utilization of the wealth of information from single cell techniques and antibody repertoire sequencing awaits the development of a high throughput, inexpensive method to map the conformational epitopes for antibody-antigen interactions. Here we show such an approach that combines comprehensive mutagenesis, cell surface display, and DNA deep sequencing. We develop analytical equations to identify epitope positions and show the method effectiveness by mapping the fine epitope for different antibodies targeting TNF, pertussis toxin, and the cancer target TROP2. In all three cases, the experimentally determined conformational epitope was consistent with previous experimental datasets, confirming the reliability of the experimental pipeline. Once the comprehensive library is generated, fine conformational epitope maps can be prepared at a rate of four per day. PMID:26296891
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer
Hong, Matthew K. H.; Macintyre, Geoff; Wedge, David C.; ...
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones,more » even years after removal of the prostate. As a result, analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.« less
NASA Technical Reports Server (NTRS)
Woese, C. R.; Achenbach, L.; Rouviere, P.; Mandelco, L.
1991-01-01
A major and too little recognized source of artifact in phylogenetic analysis of molecular sequence data is compositional difference among sequences. The problem becomes particularly acute when alignments contain ribosomal RNAs from both mesophilic and thermophilic species. Among prokaryotes the latter are considerably higher in G + C content than the former, which often results in artificial clustering of thermophilic lineages and their being placed artificially deep in phylogenetic trees. In this communication we review archaeal phylogeny in the light of this consideration, focusing in particular on the phylogenetic position of the sulfate reducing species Archaeoglobus fulgidus, using both 16S rRNA and 23S rRNA sequences. The analysis shows clearly that the previously reported deep branching of the A. fulgidus lineage (very near the base of the euryarchaeal side of the archaeal tree) is incorrect, and that the lineage actually groups with a previously recognized unit that comprises the Methanomicrobiales and extreme halophiles.
Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer.
Hong, Matthew K H; Macintyre, Geoff; Wedge, David C; Van Loo, Peter; Patel, Keval; Lunke, Sebastian; Alexandrov, Ludmil B; Sloggett, Clare; Cmero, Marek; Marass, Francesco; Tsui, Dana; Mangiola, Stefano; Lonie, Andrew; Naeem, Haroon; Sapre, Nikhil; Phal, Pramit M; Kurganovs, Natalie; Chin, Xiaowen; Kerger, Michael; Warren, Anne Y; Neal, David; Gnanapragasam, Vincent; Rosenfeld, Nitzan; Pedersen, John S; Ryan, Andrew; Haviv, Izhak; Costello, Anthony J; Corcoran, Niall M; Hovens, Christopher M
2015-04-01
Tumour heterogeneity in primary prostate cancer is a well-established phenomenon. However, how the subclonal diversity of tumours changes during metastasis and progression to lethality is poorly understood. Here we reveal the precise direction of metastatic spread across four lethal prostate cancer patients using whole-genome and ultra-deep targeted sequencing of longitudinally collected primary and metastatic tumours. We find one case of metastatic spread to the surgical bed causing local recurrence, and another case of cross-metastatic site seeding combining with dynamic remoulding of subclonal mixtures in response to therapy. By ultra-deep sequencing end-stage blood, we detect both metastatic and primary tumour clones, even years after removal of the prostate. Analysis of mutations associated with metastasis reveals an enrichment of TP53 mutations, and additional sequencing of metastases from 19 patients demonstrates that acquisition of TP53 mutations is linked with the expansion of subclones with metastatic potential which we can detect in the blood.
Deep intronic GPR143 mutation in a Japanese family with ocular albinism
Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei
2015-01-01
Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease. PMID:26061757
Deep intronic GPR143 mutation in a Japanese family with ocular albinism.
Naruto, Takuya; Okamoto, Nobuhiko; Masuda, Kiyoshi; Endo, Takao; Hatsukawa, Yoshikazu; Kohmoto, Tomohiro; Imoto, Issei
2015-06-10
Deep intronic mutations are often ignored as possible causes of human disease. Using whole-exome sequencing, we analysed genomic DNAs of a Japanese family with two male siblings affected by ocular albinism and congenital nystagmus. Although mutations or copy number alterations of coding regions were not identified in candidate genes, the novel intronic mutation c.659-131 T > G within GPR143 intron 5 was identified as hemizygous in affected siblings and as heterozygous in the unaffected mother. This mutation was predicted to create a cryptic splice donor site within intron 5 and activate a cryptic acceptor site at 41nt upstream, causing the insertion into the coding sequence of an out-of-frame 41-bp pseudoexon with a premature stop codon in the aberrant transcript, which was confirmed by minigene experiments. This result expands the mutational spectrum of GPR143 and suggests the utility of next-generation sequencing integrated with in silico and experimental analyses for improving the molecular diagnosis of this disease.
Protein model discrimination using mutational sensitivity derived from deep sequencing.
Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan
2012-02-08
A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.
UCSC genome browser: deep support for molecular biomedical research.
Mangan, Mary E; Williams, Jennifer M; Lathe, Scott M; Karolchik, Donna; Lathe, Warren C
2008-01-01
The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework to display many other types of data such as expression, variation, cross-species comparisons, and more. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources such as images of in situ localizations. Support for solving researcher's issues is provided with active discussion mailing lists and by providing updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).
Deep-Earth reactor: nuclear fission, helium, and the geomagnetic field.
Hollenbach, D F; Herndon, J M
2001-09-25
Geomagnetic field reversals and changes in intensity are understandable from an energy standpoint as natural consequences of intermittent and/or variable nuclear fission chain reactions deep within the Earth. Moreover, deep-Earth production of helium, having (3)He/(4)He ratios within the range observed from deep-mantle sources, is demonstrated to be a consequence of nuclear fission. Numerical simulations of a planetary-scale geo-reactor were made by using the SCALE sequence of codes. The results clearly demonstrate that such a geo-reactor (i) would function as a fast-neutron fuel breeder reactor; (ii) could, under appropriate conditions, operate over the entire period of geologic time; and (iii) would function in such a manner as to yield variable and/or intermittent output power.
Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard
2009-05-01
The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (ie, proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
Water mass dynamics shape Ross Sea protist communities in mesopelagic and bathypelagic layers
NASA Astrophysics Data System (ADS)
Zoccarato, Luca; Pallavicini, Alberto; Cerino, Federica; Fonda Umani, Serena; Celussi, Mauro
2016-12-01
Deep-sea environments host the largest pool of microbes and represent the last largely unexplored and poorly known ecosystems on Earth. The Ross Sea is characterized by unique oceanographic dynamics and harbors several water masses deeply involved in cooling and ventilation of deep oceans. In this study the V9 region of the 18S rDNA was targeted and sequenced with the Ion Torrent high-throughput sequencing technology to unveil differences in protist communities (>2 μm) correlated with biogeochemical properties of the water masses. The analyzed samples were significantly different in terms of environmental parameters and community composition outlining significant structuring effects of temperature and salinity. Overall, Alveolata (especially Dinophyta), Stramenopiles and Excavata groups dominated mesopelagic and bathypelagic layers, and protist communities were shaped according to the biogeochemistry of the water masses (advection effect and mixing events). Newly-formed High Salinity Shelf Water (HSSW) was characterized by high relative abundance of phototrophic organisms that bloom at the surface during the austral summer. Oxygen-depleted Circumpolar Deep Water (CDW) showed higher abundance of Excavata, common bacterivores in deep water masses. At the shelf-break, Antarctic Bottom Water (AABW), formed by the entrainment of shelf waters in CDW, maintained the eukaryotic genetic signature typical of both parental water masses.
Röthig, Till; Yum, Lauren K.; Kremb, Stephan G.; Roik, Anna; Voolstra, Christian R.
2017-01-01
Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16 S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L−1) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment. PMID:28303925
Röthig, Till; Yum, Lauren K; Kremb, Stephan G; Roik, Anna; Voolstra, Christian R
2017-03-17
Microbes associated with deep-sea corals remain poorly studied. The lack of symbiotic algae suggests that associated microbes may play a fundamental role in maintaining a viable coral host via acquisition and recycling of nutrients. Here we employed 16 S rRNA gene sequencing to study bacterial communities of three deep-sea scleractinian corals from the Red Sea, Dendrophyllia sp., Eguchipsammia fistula, and Rhizotrochus typus. We found diverse, species-specific microbiomes, distinct from the surrounding seawater. Microbiomes were comprised of few abundant bacteria, which constituted the majority of sequences (up to 58% depending on the coral species). In addition, we found a high diversity of rare bacteria (taxa at <1% abundance comprised >90% of all bacteria). Interestingly, we identified anaerobic bacteria, potentially providing metabolic functions at low oxygen conditions, as well as bacteria harboring the potential to degrade crude oil components. Considering the presence of oil and gas fields in the Red Sea, these bacteria may unlock this carbon source for the coral host. In conclusion, the prevailing environmental conditions of the deep Red Sea (>20 °C, <2 mg oxygen L -1 ) may require distinct functional adaptations, and our data suggest that bacterial communities may contribute to coral functioning in this challenging environment.
NASA Astrophysics Data System (ADS)
Zhang, Likui; Kang, Manyu; Xu, Jiajun; Xu, Jian; Shuai, Yinjie; Zhou, Xiaojian; Yang, Zhihui; Ma, Kesen
2016-05-01
Active deep-sea hydrothermal vents harbor abundant thermophilic and hyperthermophilic microorganisms. However, microbial communities in inactive hydrothermal vents have not been well documented. Here, we investigated bacterial and archaeal communities in the two deep-sea sediments (named as TVG4 and TVG11) collected from inactive hydrothermal vents in the Southwest India Ridge using the high-throughput sequencing technology of Illumina MiSeq2500 platform. Based on the V4 region of 16S rRNA gene, sequence analysis showed that bacterial communities in the two samples were dominated by Proteobacteria, followed by Bacteroidetes, Actinobacteria and Firmicutes. Furthermore, archaeal communities in the two samples were dominated by Thaumarchaeota and Euryarchaeota. Comparative analysis showed that (i) TVG4 displayed the higher bacterial richness and lower archaeal richness than TVG11; (ii) the two samples had more divergence in archaeal communities than bacterial communities. Bacteria and archaea that are potentially associated with nitrogen, sulfur metal and methane cycling were detected in the two samples. Overall, we first provided a comparative picture of bacterial and archaeal communities and revealed their potentially ecological roles in the deep-sea environments of inactive hydrothermal vents in the Southwest Indian Ridge, augmenting microbial communities in inactive hydrothermal vents.
Identification of a novel MYO7A mutation in Usher syndrome type 1.
Cheng, Ling; Yu, Hongsong; Jiang, Yan; He, Juan; Pu, Sisi; Li, Xin; Zhang, Li
2018-01-05
Usher syndrome (USH) is an autosomal recessive disease characterized by deafness and retinitis pigmentosa. In view of the high phenotypic and genetic heterogeneity in USH, performing genetic screening with traditional methods is impractical. In the present study, we carried out targeted next-generation sequencing (NGS) to uncover the underlying gene in an USH family (2 USH patients and 15 unaffected relatives). One hundred and thirty-five genes associated with inherited retinal degeneration were selected for deep exome sequencing. Subsequently, variant analysis, Sanger validation and segregation tests were utilized to identify the disease-causing mutations in this family. All affected individuals had a classic USH type I (USH1) phenotype which included deafness, vestibular dysfunction and retinitis pigmentosa. Targeted NGS and Sanger sequencing validation suggested that USH1 patients carried an unreported splice site mutation, c.5168+1G>A, as a compound heterozygous mutation with c.6070C>T (p.R2024X) in the MYO7A gene. A functional study revealed decreased expression of the MYO7A gene in the individuals carrying heterozygous mutations. In conclusion, targeted next-generation sequencing provided a comprehensive and efficient diagnosis for USH1. This study revealed the genetic defects in the MYO7A gene and expanded the spectrum of clinical phenotypes associated with USH1 mutations.
Horner, David S; Lefkimmiatis, Konstantinos; Reyes, Aurelio; Gissi, Carmela; Saccone, Cecilia; Pesole, Graziano
2007-01-01
Background Phylogenetic relationships between Lagomorpha, Rodentia and Primates and their allies (Euarchontoglires) have long been debated. While it is now generally agreed that Rodentia constitutes a monophyletic sister-group of Lagomorpha and that this clade (Glires) is sister to Primates and Dermoptera, higher-level relationships within Rodentia remain contentious. Results We have sequenced and performed extensive evolutionary analyses on the mitochondrial genome of the scaly-tailed flying squirrel Anomalurus sp., an enigmatic rodent whose phylogenetic affinities have been obscure and extensively debated. Our phylogenetic analyses of the coding regions of available complete mitochondrial genome sequences from Euarchontoglires suggest that Anomalurus is a sister taxon to the Hystricognathi, and that this clade represents the most basal divergence among sampled Rodentia. Bayesian dating methods incorporating a relaxed molecular clock provide divergence-time estimates which are consistently in agreement with the fossil record and which indicate a rapid radiation within Glires around 60 million years ago. Conclusion Taken together, the data presented provide a working hypothesis as to the phylogenetic placement of Anomalurus, underline the utility of mitochondrial sequences in the resolution of even relatively deep divergences and go some way to explaining the difficulty of conclusively resolving higher-level relationships within Glires with available data and methodologies. PMID:17288612
NASA Astrophysics Data System (ADS)
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-06-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China, however its molecular genetics, biochemistry and physiology are poorly understood. We used high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of these most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis to better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-01-01
Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China, however its molecular genetics, biochemistry and physiology are poorly understood. We used high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of these most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis to better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs. PMID:26047353
Antisense Transcription Is Pervasive but Rarely Conserved in Enteric Bacteria
Raghavan, Rahul; Sloan, Daniel B.; Ochman, Howard
2012-01-01
ABSTRACT Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell’s transcription machinery. PMID:22872780
Variant calling in low-coverage whole genome sequencing of a Native American population sample.
Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C
2014-01-30
The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
Yakimov, Michail M; Cono, Violetta La; Smedile, Francesco; DeLuca, Thomas H; Juárez, Silvia; Ciordia, Sergio; Fernández, Marisol; Albar, Juan Pablo; Ferrer, Manuel; Golyshin, Peter N; Giuliano, Laura
2011-01-01
Mesophilic Crenarchaeota have recently been thought to be significant contributors to nitrogen (N) and carbon (C) cycling. In this study, we examined the vertical distribution of ammonia-oxidizing Crenarchaeota at offshore site in Southern Tyrrhenian Sea. The median value of the crenachaeal cell to amoA gene ratio was close to one suggesting that virtually all deep-sea Crenarchaeota possess the capacity to oxidize ammonia. Crenarchaea-specific genes, nirK and ureC, for nitrite reductase and urease were identified and their affiliation demonstrated the presence of ‘deep-sea' clades distinct from ‘shallow' representatives. Measured deep-sea dark CO2 fixation estimates were comparable to the median value of photosynthetic biomass production calculated for this area of Tyrrhenian Sea, pointing to the significance of this process in the C cycle of aphotic marine ecosystems. To elucidate the pivotal organisms in this process, we targeted known marine crenarchaeal autotrophy-related genes, coding for acetyl-CoA carboxylase (accA) and 4-hydroxybutyryl-CoA dehydratase (4-hbd). As in case of nirK and ureC, these genes are grouped with deep-sea sequences being distantly related to those retrieved from the epipelagic zone. To pair the molecular data with specific functional attributes we performed [14C]HCO3 incorporation experiments followed by analyses of radiolabeled proteins using shotgun proteomics approach. More than 100 oligopeptides were attributed to 40 marine crenarchaeal-specific proteins that are involved in 10 different metabolic processes, including autotrophy. Obtained results provided a clear proof of chemolithoautotrophic physiology of bathypelagic crenarchaeota and indicated that this numerically predominant group of microorganisms facilitate a hitherto unrecognized sink for inorganic C of a global importance. PMID:21209665
Yakimov, Michail M; Cono, Violetta La; Smedile, Francesco; DeLuca, Thomas H; Juárez, Silvia; Ciordia, Sergio; Fernández, Marisol; Albar, Juan Pablo; Ferrer, Manuel; Golyshin, Peter N; Giuliano, Laura
2011-06-01
Mesophilic Crenarchaeota have recently been thought to be significant contributors to nitrogen (N) and carbon (C) cycling. In this study, we examined the vertical distribution of ammonia-oxidizing Crenarchaeota at offshore site in Southern Tyrrhenian Sea. The median value of the crenachaeal cell to amoA gene ratio was close to one suggesting that virtually all deep-sea Crenarchaeota possess the capacity to oxidize ammonia. Crenarchaea-specific genes, nirK and ureC, for nitrite reductase and urease were identified and their affiliation demonstrated the presence of 'deep-sea' clades distinct from 'shallow' representatives. Measured deep-sea dark CO(2) fixation estimates were comparable to the median value of photosynthetic biomass production calculated for this area of Tyrrhenian Sea, pointing to the significance of this process in the C cycle of aphotic marine ecosystems. To elucidate the pivotal organisms in this process, we targeted known marine crenarchaeal autotrophy-related genes, coding for acetyl-CoA carboxylase (accA) and 4-hydroxybutyryl-CoA dehydratase (4-hbd). As in case of nirK and ureC, these genes are grouped with deep-sea sequences being distantly related to those retrieved from the epipelagic zone. To pair the molecular data with specific functional attributes we performed [(14)C]HCO(3) incorporation experiments followed by analyses of radiolabeled proteins using shotgun proteomics approach. More than 100 oligopeptides were attributed to 40 marine crenarchaeal-specific proteins that are involved in 10 different metabolic processes, including autotrophy. Obtained results provided a clear proof of chemolithoautotrophic physiology of bathypelagic crenarchaeota and indicated that this numerically predominant group of microorganisms facilitate a hitherto unrecognized sink for inorganic C of a global importance.
Liu, Jinlin; Jia, Zhijuan; Li, Sha; Li, Yan; You, Qiang; Zhang, Chunyan; Zheng, Xiaotong; Xiong, Guomei; Zhao, Jin; Qi, Chao; Yang, Jihong
2016-09-15
The chemical and biological compositions of deep-sea sediments are interesting because of the underexplored diversity when it comes to bioprospecting. The special geographical location and climates make Arctic Ocean a unique ocean area containing an abundance of microbial resources. A metagenomic library was constructed based on the deep-sea sediments of Arctic Ocean. Part of insertion fragments of this library were sequenced. A chitin deacetylase gene, cdaYJ, was identified and characterized. A metagenomic library with 2750 clones was obtained and ten clones were sequenced. Results revealed several interesting genes, including a chitin deacetylase coding sequence, cdaYJ. The CdaYJ is homologous to some known chitin deacetylases and contains conserved chitin deacetylase active sites. CdaYJ protein exhibits a long N-terminal and a relative short C-terminal. Phylogenetic analysis revealed that CdaYJ showed highest homology to CDAs from Alphaproteobacteria. The cdaYJ gene was subcloned into the pET-28a vector and the recombinant CdaYJ (rCdaYJ) was expressed in Escherichia coli BL21 (DE3). rCdaYJ showed a molecular weight of 43kDa, and exhibited deacetylation activity by using p-nitroacetanilide as substrate. The optimal pH and temperature of rCdaYJ were tested as pH7.4 and 28°C, respectively. The construction of metagenomic library of the Arctic deep-sea sediments provides us an opportunity to look into the microbial communities and exploiting valuable gene resources. A chitin deacetylase CdaYJ was identified from the library. It showed highest deacetylation activity under slight alkaline and low temperature conditions. CdaYJ might be a candidate chitin deacetylase that possesses industrial and pharmaceutical potentials. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Bar Or, I.; Ben-Dov, E.; Kushmaro, A.; Eckert, W.; Sivan, O.
2014-06-01
Microbial methane oxidation process (methanotrophy) is the primary control on the emission of the greenhouse gas methane (CH4) to the atmosphere. In terrestrial environments, aerobic methanotrophic bacteria are mainly responsible for oxidizing the methane. In marine sediments the coupling of the anaerobic oxidation of methane (AOM) with sulfate reduction, often by a consortium of anaerobic methanotrophic archaea (ANME) and sulfate reducing bacteria, was found to consume almost all the upward diffusing methane. Recently, we showed geochemical evidence for AOM driven by iron reduction in Lake Kinneret (LK) (Israel) deep sediments and suggested that this process can be an important global methane sink. The goal of the present study was to link the geochemical gradients found in the porewater (chemical and isotope profiles) with possible changes in microbial community structure. Specifically, we examined the possible shift in the microbial community in the deep iron-driven AOM zone and its similarity to known sulfate driven AOM populations. Screening of archaeal 16S rRNA gene sequences revealed Thaumarchaeota and Euryarchaeota as the dominant phyla in the sediment. Thaumarchaeota, which belongs to the family of copper containing membrane-bound monooxgenases, increased with depth while Euryarchaeota decreased. This may indicate the involvement of Thaumarchaeota, which were discovered to be ammonia oxidizers but whose activity could also be linked to methane, in AOM in the deep sediment. ANMEs sequences were not found in the clone libraries, suggesting that iron-driven AOM is not through sulfate. Bacterial 16S rRNA sequences displayed shifts in community diversity with depth. Proteobacteria and Chloroflexi increased with depth, which could be connected with their different dissimilatory anaerobic processes. The observed changes in microbial community structure suggest possible direct and indirect mechanisms for iron-driven AOM in deep sediments.
Singh, Vimal; Pfeuffer, Josef; Zhao, Tiejun; Ress, David
2018-04-01
High-resolution functional magnetic resonance imaging of human subcortical brain structures is challenging because of their deep location in the cranium, and their comparatively weak blood oxygen level dependent responses to strong stimuli. Magnetic resonance imaging data for subcortical brain regions exhibit both low signal-to-noise ratio and low functional contrast-to-noise ratio. To overcome these challenges, this work evaluates the use of dual-echo spiral variants that combine outward and inward trajectories. Specifically, in-in, in-out, and out-out combinations are evaluated. For completeness, single-echo spiral-in and parallel-receive-accelerated echo-planar-imaging sequences are also evaluated. Sequence evaluation was based on comparison of functional contrast-to-noise ratio within retinotopically predefined regions of interest. Superior colliculus was chosen as sample subcortical brain region because it exhibits a strong visual response. All sequences were compared relative to a single-echo spiral-out trajectory to establish a within-session reference. In superior colliculus, the dual-echo out-out outperformed the reference trajectory by 55% in contrast-to-noise ratio, while all other trajectories had performance similar to the reference. The sequences were also compared in early visual cortex. Here, both dual-echo spiral out-out and in-out outperformed the reference by ∼25%. Dual-echo spiral variants offer improved contrast-to-noise ratio performance for high-resolution imaging for both superior colliculus and cortex. Magn Reson Med 79:1931-1940, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis
Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao
2016-01-01
Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB, however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has been proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public database. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
Dhahbi, Joseph M; Spindler, Stephen R; Atamna, Hani; Yamakawa, Amy; Guerrero, Noel; Boffelli, Dario; Mote, Patricia; Martin, David I K
2013-02-01
MicroRNAs (miRNAs) function to modulate gene expression, and through this property they regulate a broad spectrum of cellular processes. They can circulate in blood and thereby mediate cell-to-cell communication. Aging involves changes in many cellular processes that are potentially regulated by miRNAs, and some evidence has implicated circulating miRNAs in the aging process. In order to initiate a comprehensive assessment of the role of circulating miRNAs in aging, we have used deep sequencing to characterize circulating miRNAs in the serum of young mice, old mice, and old mice maintained on calorie restriction (CR). Deep sequencing identifies a set of novel miRNAs, and also accurately measures all known miRNAs present in serum. This analysis demonstrates that the levels of many miRNAs circulating in the mouse are increased with age, and that the increases can be antagonized by CR. The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging including metabolic changes, and the miRNAs themselves have been linked to diseases associated with old age. This finding implicates circulating miRNAs in the aging process, raising questions about their tissues of origin, their cellular targets, and their functional role in metabolic changes that occur with aging.
Xiao, Bingbing; Niu, Xiaoxi; Han, Na; Wang, Ben; Du, Pengcheng; Na, Risu; Chen, Chen; Liao, Qinping
2016-01-01
Bacterial vaginosis (BV) is a highly prevalent disease in women, and increases the risk of pelvic inflammatory disease. It has been given wide attention because of the high recurrence rate. Traditional diagnostic methods based on microscope providing limited information on the vaginal microbiota increase the difficulty in tracing the development of the disease in bacteria resistance condition. In this study, we used deep-sequencing technology to observe dynamic variation of the vaginal microbiota at three major time points during treatment, at D0 (before treatment), D7 (stop using the antibiotics) and D30 (the 30-day follow-up visit). Sixty-five patients with BV were enrolled (48 were cured and 17 were not cured), and their bacterial composition of the vaginal microbiota was compared. Interestingly, we identified 9 patients might be recurrence. We also introduced a new measurement point of D7, although its microbiota were significantly inhabited by antibiotic and hard to be observed by traditional method. The vaginal microbiota in deep-sequencing-view present a strong correlation to the final outcome. Thus, coupled with detailed individual bioinformatics analysis and deep-sequencing technology, we may illustrate a more accurate map of vaginal microbial to BV patients, which provide a new opportunity to reduce the rate of recurrence of BV. PMID:27253522
NASA Astrophysics Data System (ADS)
Ravara, Ascensão; Ramos, Diana; Teixeira, Marcos A. L.; Costa, Filipe O.; Cunha, Marina R.
2017-03-01
The polychaetes of the order Phyllodocida (excluding Nereidiformia and Phyllodociformia incertae sedis) collected from deep-sea habitats of the Iberian margin (Bay of Biscay, Horseshoe continental rise, Gulf of Cadiz and Alboran Sea), and Atlantic seamounts (Gorringe Bank, Atlantis and Nameless) are reported herein. Thirty-six species belonging to seven families - Acoetidae, Pholoidae, Polynoidae, Sigalionidae, Glyceridae, Goniadidae and Phyllodocidae, were identified. Amended descriptions and/or new illustrations are given for the species Allmaniella setubalensis, Anotochaetonoe michelbhaudi, Lepidasthenia brunnea and Polynoe sp. Relevant taxonomical notes are provided for other seventeen species. Allmaniella setubalensis, Anotochaetonoe michelbhaudi, Harmothoe evei, Eumida longicirrata and Glycera noelae, previously known only from their type localities were found in different deep-water places of the studied areas and constitute new records for the Iberian margin. The geographic distributions and the bathymetric range of thirteen and fifteen species, respectively, are extended. The morphology-based biodiversity inventory was complemented with DNA sequences of the mitochondrial barcode region (COI barcodes) providing a molecular tag for future reference. Twenty new sequences were obtained for nine species in the families Acoetidae, Glyceridae and Polynoidae and for three lineages within the Phylodoce madeirensis complex (Phyllodocidae). A brief analysis of the newly obtained sequences and publicly available COI barcode data for the genera herein reported, highlighted several cases of unclear taxonomic assignments, which need further study.
Woo, Hannah L; O'Dell, Kaela B; Utturkar, Sagar; McBride, Kathryn R; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D; Hazen, Terry C
2016-11-23
Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium's ability to degrade recalcitrant organics such as lignin. Copyright © 2016 Woo et al.
Whole-Genome Characterization of Prunus necrotic ringspot virus Infecting Sweet Cherry in China
2018-01-01
ABSTRACT Prunus necrotic ringspot virus (PNRSV) causes yield loss in most cultivated stone fruits, including sweet cherry. Using a small RNA deep-sequencing approach combined with end-genome sequence cloning, we identified the complete genomes of all three PNRSV strands from PNRSV-infected sweet cherry trees and compared them with those of two previously reported isolates. PMID:29496825