RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes
Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.
2013-01-01
Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052
Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.
Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart
2014-01-15
High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
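The filtering idea in the abstract above can be illustrated with a minimal Python sketch. It flags a heterozygous site when too few distinct reads support it (low complexity) or when duplicate reads make up too large a share of the coverage. The function name, the input layout and both thresholds are assumptions made for illustration; the abstract describes a significance test for excess clonal reads, whereas this sketch substitutes a plain fraction cutoff and does not reproduce the absfilter R package.

```python
from collections import Counter

def flag_amplification_bias(reads, min_distinct=3, max_clonal_fraction=0.5):
    """Return True if a site looks amplification-biased.

    reads: list of (allele, start, strand) tuples overlapping one site.
    min_distinct and max_clonal_fraction are hypothetical thresholds.
    """
    if not reads:
        return True  # no evidence at all: treat as unusable
    counts = Counter(reads)
    distinct = len(counts)
    # Every copy beyond the first of an identical (allele, start, strand)
    # tuple is treated as a clonal (PCR duplicate) read.
    clonal = sum(c - 1 for c in counts.values())
    clonal_fraction = clonal / len(reads)
    return distinct < min_distinct or clonal_fraction > max_clonal_fraction
```

A site supported by nine identical reads for one allele and a single read for the other would be flagged, while a site covered by four distinct reads would pass.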
Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent
2018-01-01
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time. In this chapter, we present Bar-ChIP, a higher-throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.
Wei, Yingying; Wu, George; Ji, Hongkai
2013-05-01
Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.
2012-01-01
oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536
Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W
2012-09-01
oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.
Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.
Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo
2011-12-15
High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy. Contact: eduardo.eyras@upf.edu. Supplementary data are available at Bioinformatics online.
Accurate Prediction of Inducible Transcription Factor Binding Intensities In Vivo
Siepel, Adam; Lis, John T.
2012-01-01
DNA sequence and local chromatin landscape act jointly to determine transcription factor (TF) binding intensity profiles. To disentangle these influences, we developed an experimental approach, called protein/DNA binding followed by high-throughput sequencing (PB–seq), that allows the binding energy landscape to be characterized genome-wide in the absence of chromatin. We applied our methods to the Drosophila Heat Shock Factor (HSF), which inducibly binds a target DNA sequence element (HSE) following heat shock stress. PB–seq involves incubating sheared naked genomic DNA with recombinant HSF, partitioning the HSF–bound and HSF–free DNA, and then detecting HSF–bound DNA by high-throughput sequencing. We compared PB–seq binding profiles with ones observed in vivo by ChIP–seq and developed statistical models to predict the observed departures from idealized binding patterns based on covariates describing the local chromatin environment. We found that DNase I hypersensitivity and tetra-acetylation of H4 were the most influential covariates in predicting changes in HSF binding affinity. We also investigated the extent to which DNA accessibility, as measured by digital DNase I footprinting data, could be predicted from MNase–seq data and the ChIP–chip profiles for many histone modifications and TFs, and found GAGA element associated factor (GAF), tetra-acetylation of H4, and H4K16 acetylation to be the most predictive covariates. Lastly, we generated an unbiased model of HSF binding sequences, which revealed distinct biophysical properties of the HSF/HSE interaction and a previously unrecognized substructure within the HSE. These findings provide new insights into the interplay between the genomic sequence and the chromatin landscape in determining transcription factor binding intensity. PMID:22479205
Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T
2015-08-01
Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.
Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.
Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N
2018-04-17
A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estradiol, ER mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenografts. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
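As a rough sketch of normalizing against an internal standard of unchanged peaks, the Python below derives one scale factor from read counts at reference peaks that are assumed not to change between conditions, then applies it to the treated sample. The function names and the median-of-ratios estimator are assumptions chosen for illustration; the abstract does not state how the authors' pipeline computes its factor.

```python
from statistics import median

def scale_factor(ref_control, ref_treated):
    """Median control/treated ratio over reference peaks assumed unchanged.

    ref_control, ref_treated: read counts at the same reference peaks,
    in the same order. Peaks with zero treated counts are skipped.
    """
    ratios = [c / t for c, t in zip(ref_control, ref_treated) if t > 0]
    if not ratios:
        raise ValueError("no usable reference peaks")
    return median(ratios)

def normalize(treated_counts, factor):
    """Rescale treated-sample counts onto the control sample's scale."""
    return [x * factor for x in treated_counts]
```

With reference counts of 100, 200 and 300 in the control and 50, 100 and 150 in the treated sample, the factor is 2.0, so a target peak with 10 treated reads is compared against the control as 20 reads. Unlike total read depth, this factor is not distorted when most target peaks lose signal.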
Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data
Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo
2011-01-01
Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein–DNA and protein–RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Availability: Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy Contact: eduardo.eyras@upf.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21994224
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 10^7 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run.
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz
2011-07-01
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
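The idea of allocating multi-reads as fractional counts can be sketched in a few lines of Python. Each multi-read is split across its candidate locations in proportion to the uniquely mapped read count at each location plus a pseudocount. The weighting rule, the pseudocount and the function name are assumptions for illustration only; the abstract refers to a weighted alignment scheme without specifying it, and this single-pass version should not be read as the authors' algorithm.

```python
def allocate_multireads(uni_counts, multireads, pseudocount=1.0):
    """Distribute each multi-read over its candidate locations.

    uni_counts: dict mapping location -> uniquely mapped read count.
    multireads: list of candidate-location lists, one per multi-read.
    Returns dict mapping location -> uni-read count plus fractional counts.
    """
    totals = {loc: float(n) for loc, n in uni_counts.items()}
    for candidates in multireads:
        # Weights come from the original uni-read counts only, so the
        # result does not depend on the order of the multi-reads.
        weights = [uni_counts.get(loc, 0) + pseudocount for loc in candidates]
        norm = sum(weights)
        for loc, w in zip(candidates, weights):
            totals[loc] = totals.get(loc, 0.0) + w / norm
    return totals
```

For one multi-read mapping to locations with 3 and 0 uni-reads, the read contributes 0.8 and 0.2 respectively, so each multi-read always adds exactly one count in total.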
Nakato, Ryuichiro; Itoh, Takehiko; Shirahige, Katsuhiko
2013-07-01
Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
Guo, Wei-Li; Huang, De-Shuang
2017-08-22
Transcription factors (TFs) are DNA-binding proteins that have a central role in regulating gene expression. Identification of DNA-binding sites of TFs is a key task in understanding transcriptional regulation, cellular processes and disease. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) enables genome-wide identification of in vivo TF binding sites. However, it is still difficult to map every TF in every cell line owing to cost and biological material availability, which poses an enormous obstacle for integrated analysis of gene regulation. To address this problem, we propose a novel computational approach, TFBSImpute, for predicting additional TF binding profiles by leveraging information from available ChIP-seq TF binding data. TFBSImpute fuses the dataset to a 3-mode tensor and imputes missing TF binding signals via simultaneous completion of multiple TF binding matrices with positional consistency. We show that signals predicted by our method achieve overall similarity with experimental data and that TFBSImpute significantly outperforms baseline approaches, by assessing the performance of imputation methods against observed ChIP-seq TF binding profiles. In addition, motif analysis shows that TFBSImpute performs better in capturing binding motifs enriched in observed data compared with baselines, indicating that the higher performance of TFBSImpute is not simply due to averaging related samples. We anticipate that our approach will constitute a useful complement to experimental mapping of TF binding, which is beneficial for further study of regulation mechanisms and disease.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable.
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq
Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael
2016-01-01
Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249
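The step in which reads are parsed according to their barcode can be sketched as follows in Python. The sketch assumes that the barcode occupies the first bases of each read, that all barcodes have the same length and that only exact matches are accepted; these are illustrative assumptions, as are the function and sample names, and the chapter's actual barcode design is not given in the abstract.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact-match 5' barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode -> sample name (equal-length barcodes).
    Returns dict mapping sample -> list of reads with the barcode trimmed;
    reads with no matching barcode are kept untrimmed under 'unassigned'.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes equal-length barcodes")
    k = lengths.pop()
    out = {name: [] for name in barcodes.values()}
    out["unassigned"] = []
    for seq in reads:
        sample = barcodes.get(seq[:k])
        if sample is None:
            out["unassigned"].append(seq)
        else:
            out[sample].append(seq[k:])  # trim barcode before alignment
    return out
```

The trimmed reads for each sample would then be aligned separately against the reference genome, with the tagged strain compared against its matching untagged control.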
Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching
2014-01-01
Post-translational modification (PTM) of transcription factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factors (TFs). This has greatly improved our understanding of protein-DNA interactions on a genome-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translationally modified TFs based on ChIP-seq analysis; this is largely due to the widespread presence of multiple modified TFs. Using SUMO-1 modification as an example, we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotate the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on promoter proximity, TFBS annotation and a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs are able to be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. The TFBS annotation method is able to predict potential SUMOylated TFs. 
Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in the future.
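One simple way to combine the output of several peak callers is a support vote: keep only the stretches covered by at least a minimum number of callers. The Python sketch below does this for intervals on a single chromosome. It is a stand-in chosen for illustration, since the abstract does not describe the combinatorial fusion method in enough detail to reproduce it; the function names and the `min_support` parameter are assumptions.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def consensus_regions(peak_sets, min_support=2):
    """Regions covered by at least min_support callers.

    peak_sets: one list of (start, end) peaks per caller, same chromosome.
    """
    events = []
    for peaks in peak_sets:
        # Merge within a caller first so one caller never counts twice.
        for start, end in merge(peaks):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)
    regions, depth, region_start = [], 0, None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = pos
        elif depth < min_support <= previous and pos > region_start:
            regions.append((region_start, pos))
    return merge(regions)
```

Raising `min_support` trades sensitivity for precision: with three callers, a value of 2 keeps majority-supported regions, while 3 keeps only regions on which all callers agree.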
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling
Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that a single lncRNA is generally bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represents an important step in the identification and analysis of RBP–lncRNA interactions and shows that these interactions may play crucial roles in cancer and genetic diseases.
Cotney, Justin L; Noonan, James P
2015-02-02
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) is a powerful method used to identify genome-wide binding patterns of transcription factors and distribution of various histone modifications associated with different chromatin states. In most published studies, ChIP-Seq has been performed on cultured cells grown under controlled conditions, allowing generation of large amounts of material in a homogeneous biological state. Although such studies have provided great insight into the dynamic landscapes of animal genomes, they do not allow the examination of transcription factor binding and chromatin states in adult tissues, developing embryonic structures, or tumors. Such knowledge is critical to understanding the information required to create and maintain a complex biological tissue and to identify noncoding regions of the genome directly involved in tissues affected by complex diseases such as autism. Studying these tissue types with ChIP-Seq can be challenging due to the limited availability of tissues and the lack of complex biological states able to be achieved in culture. These inherent differences require alterations of standard cross-linking and chromatin extraction typically used in cell culture. Here we describe a general approach for using small amounts of animal tissue to perform ChIP-Seq directed at histone modifications and transcription factors. Tissue is homogenized before treatment with formaldehyde to ensure proper cross-linking, and a two-step nuclear isolation is performed to increase extraction of soluble chromatin. Small amounts of soluble chromatin are then used for immunoprecipitation (IP) and prepared for multiplexed high-throughput sequencing. © 2015 Cold Spring Harbor Laboratory Press.
Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data
Carroll, Thomas S.; Liang, Ziwei; Salama, Rafik; Stark, Rory; de Santiago, Ines
2014-01-01
With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency. PMID:24782889
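The two preprocessing steps named in the abstract above, removal of duplicated reads and blacklisting, can be sketched roughly as follows. This is a minimal illustration only: the read representation (tuples of chromosome, start, end, strand with half-open coordinates), the function name `filter_reads`, and the example coordinates are all hypothetical and are not taken from the study.

```python
def filter_reads(reads, blacklist):
    """Drop exact duplicate reads and reads overlapping blacklisted regions.

    reads:     iterable of (chrom, start, end, strand), half-open coordinates
    blacklist: list of (chrom, start, end), half-open coordinates
    Both formats are assumptions made for this sketch.
    """
    seen = set()
    kept = []
    for read in reads:
        # A duplicate here means an identical (chrom, start, end, strand) tuple.
        if read in seen:
            continue
        seen.add(read)
        chrom, start, end, _strand = read
        # Two half-open intervals overlap when each starts before the other ends.
        if any(c == chrom and start < b_end and end > b_start
               for c, b_start, b_end in blacklist):
            continue
        kept.append(read)
    return kept


example_reads = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 150, "+"),   # duplicate of the first read
    ("chr1", 500, 550, "-"),   # overlaps the blacklisted region below
    ("chr2", 10, 60, "+"),
]
example_blacklist = [("chr1", 490, 510)]
filtered = filter_reads(example_reads, example_blacklist)
```

Quality metrics computed on `filtered` rather than on `example_reads` may differ, which is the dependence on preprocessing that the abstract describes.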
2017-01-01
Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence–function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence–function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the −35 RNA polymerase (RNAP) binding site and one overlapping with the −10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli. PMID:28463494
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis
David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion
2014-01-01
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057
Combining multiple ChIP-seq peak detection systems using combinatorial fusion.
Schweikert, Christina; Brown, Stuart; Tang, Zuojian; Smith, Phillip R; Hsu, D Frank
2012-01-01
Due to the recent rapid development in ChIP-seq technologies, which uses high-throughput next-generation DNA sequencing to identify the targets of Chromatin Immunoprecipitation, there is an increasing amount of sequencing data being generated that provides us with greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we utilize the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator. We define two methods to merge and rescore the regions of two peak detection systems and analyze the performance based on average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination. Our method of combination and fusion analysis would provide a means for generic assessment of available technologies and systems and assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternate approach for increasing true positive rates, while decreasing false positive rates and hence improving the ChIP-seq peak identification process.
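As a rough illustration of the score and rank combination mentioned in the abstract above, the following Python sketch averages either normalized scores or ranks for regions reported by two peak callers. The region names, score values, min-max normalization, and function names are hypothetical; the abstract does not specify how the published method merges and rescores regions.

```python
def normalize(scores):
    """Min-max scale a dict of region -> score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {region: (s - lo) / span for region, s in scores.items()}


def ranks(scores):
    """Assign rank 1 to the highest-scoring region."""
    ordered = sorted(scores, key=lambda region: -scores[region])
    return {region: i + 1 for i, region in enumerate(ordered)}


def score_combination(a, b):
    """Average the normalized scores of regions reported by both systems."""
    na, nb = normalize(a), normalize(b)
    return {r: (na[r] + nb[r]) / 2 for r in na.keys() & nb.keys()}


def rank_combination(a, b):
    """Average the ranks of regions reported by both systems (lower is better)."""
    ra, rb = ranks(a), ranks(b)
    return {r: (ra[r] + rb[r]) / 2 for r in ra.keys() & rb.keys()}


# Hypothetical scores from two peak callers for the same merged regions.
caller_a = {"peak1": 50.0, "peak2": 10.0, "peak3": 30.0}
caller_b = {"peak1": 8.0, "peak2": 9.0, "peak3": 1.0}

fused_scores = score_combination(caller_a, caller_b)
fused_ranks = rank_combination(caller_a, caller_b)
```

In this toy input the two callers disagree on the top region, and both fusion variants place `peak1` first because it scores well in both systems.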
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data
2010-01-01
Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. 
Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database. PMID:20459804
Qi, Zhigang; Smith, Kristina M; Bredeweg, Erin L; Bosnjak, Natasa; Freitag, Michael; Nargang, Frank E
2017-02-09
In Neurospora crassa, blocking the function of the standard mitochondrial electron transport chain results in the induction of an alternative oxidase (AOX). AOX transfers electrons directly from ubiquinol to molecular oxygen. AOX serves as a model of retrograde regulation since it is encoded by a nuclear gene that is regulated in response to signals from mitochondria. The N. crassa transcription factors AOD2 and AOD5 are necessary for the expression of the AOX gene. To gain insight into the mechanism by which these factors function, and to determine if they have roles in the expression of additional genes in N. crassa, we constructed strains expressing only tagged versions of the proteins. Cell fractionation experiments showed that both proteins are localized to the nucleus under both AOX inducing and noninducing conditions. Furthermore, chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) analysis revealed that the proteins are bound to the promoter region of the AOX gene under both conditions. ChIP-seq also showed that the transcription factors bind to the upstream regions of a number of genes that are involved in energy production and metabolism. Dependence on AOD2 and AOD5 for the expression of several of these genes was verified by quantitative PCR. The majority of ChIP-seq peaks observed were enriched for both AOD2 and AOD5. However, we also observed occasional sites where one factor appeared to bind preferentially. The most striking of these was a conserved sequence that bound large amounts of AOD2 but little AOD5. This sequence was found within a 310 bp repeat unit that occurs at several locations in the genome. Copyright © 2017 Qi et al.
Lee, Jiwon; Boutz, Daniel R; Chromikova, Veronika; Joyce, M Gordon; Vollmers, Christopher; Leung, Kwanyee; Horton, Andrew P; DeKosky, Brandon J; Lee, Chang-Han; Lavinder, Jason J; Murrin, Ellen M; Chrysostomou, Constantine; Hoi, Kam Hon; Tsybovsky, Yaroslav; Thomas, Paul V; Druz, Aliaksandr; Zhang, Baoshan; Zhang, Yi; Wang, Lingshu; Kong, Wing-Pui; Park, Daechan; Popova, Lyubov I; Dekker, Cornelia L; Davis, Mark M; Carter, Chalise E; Ross, Ted M; Ellington, Andrew D; Wilson, Patrick C; Marcotte, Edward M; Mascola, John R; Ippolito, Gregory C; Krammer, Florian; Quake, Stephen R; Kwong, Peter D; Georgiou, George
2016-12-01
Molecular understanding of serological immunity to influenza has been confounded by the complexity of the polyclonal antibody response in humans. Here we used high-resolution proteomics analysis of immunoglobulin (referred to as Ig-seq) coupled with high-throughput sequencing of transcripts encoding B cell receptors (BCR-seq) to quantitatively determine the antibody repertoire at the individual clonotype level in the sera of young adults before and after vaccination with trivalent seasonal influenza vaccine. The serum repertoire comprised between 40 and 147 clonotypes that were specific to each of the three monovalent components of the trivalent influenza vaccine, with boosted pre-existing clonotypes accounting for ∼60% of the response. An unexpectedly high fraction of serum antibodies recognized both the H1 and H3 monovalent vaccines. Recombinant versions of these H1 + H3 cross-reactive antibodies showed broad binding to hemagglutinins (HAs) from previously circulating virus strains; several of these antibodies, which were prevalent in the serum of multiple donors, recognized the same conserved epitope in the HA head domain. Although the HA-head-specific H1 + H3 antibodies did not show neutralization activity in vitro, they protected mice against infection with the H1N1 and H3N2 virus strains when administered before or after challenge. Collectively, our data reveal unanticipated insights regarding the serological response to influenza vaccination and raise questions about the added benefits of using a quadrivalent vaccine instead of a trivalent vaccine.
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome
Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.
2011-01-01
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880
Blatti, Charles; Sinha, Saurabh
2014-07-01
The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
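One common way to test whether a motif is statistically overrepresented in a gene set is a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that each gene is simply labeled as having or lacking the motif; the abstract does not state which statistic MET uses, so this is illustrative only, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_motif, set_size, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    total_genes:      number of genes in the background
    genes_with_motif: background genes whose regulatory region has the motif
    set_size:         number of genes in the user's gene set
    hits:             gene-set members that have the motif
    """
    denom = comb(total_genes, set_size)
    upper = min(genes_with_motif, set_size)
    tail = sum(
        comb(genes_with_motif, k)
        * comb(total_genes - genes_with_motif, set_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10 background genes, 4 carry the motif,
# and all 3 genes in the set of interest carry it.
p_value = hypergeom_enrichment_p(10, 4, 3, 3)
```

A small p-value suggests the motif occurs in the gene set more often than expected by chance; in practice such tests are run for many motifs and then corrected for multiple testing.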
Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi
2018-03-09
High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Landt, Stephen G; Marinov, Georgi K; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E; Bickel, Peter; Brown, James B; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J; Hoffman, Michael M; Iyer, Vishwanath R; Jung, Youngsook L; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V; Li, Qunhua; Liu, Tao; Liu, X Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M; Park, Peter J; Pazin, Michael J; Perry, Marc D; Raha, Debasish; Reddy, Timothy E; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A; Tolstorukov, Michael Y; White, Kevin P; Xi, Simon; Farnham, Peggy J; Lieb, Jason D; Wold, Barbara J; Snyder, Michael
2012-09-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
Landt, Stephen G.; Marinov, Georgi K.; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E.; Bickel, Peter; Brown, James B.; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I.; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J.; Hoffman, Michael M.; Iyer, Vishwanath R.; Jung, Youngsook L.; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V.; Li, Qunhua; Liu, Tao; Liu, X. Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M.; Park, Peter J.; Pazin, Michael J.; Perry, Marc D.; Raha, Debasish; Reddy, Timothy E.; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A.; Tolstorukov, Michael Y.; White, Kevin P.; Xi, Simon; Farnham, Peggy J.; Lieb, Jason D.; Wold, Barbara J.; Snyder, Michael
2012-01-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals. PMID:22955991
Hardcastle, Thomas J
2016-01-15
High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, requiring further development cycles and leading to a lack of standardization between analyses. We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs. The methods are implemented in the R baySeq (v2) package, available on Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/baySeq.html). Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior
2012-01-01
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036
KatharoSeq Enables High-Throughput Microbiome Analysis from Low-Biomass Samples
Minich, Jeremiah J.; Zhu, Qiyun; Janssen, Stefan; Hendrickson, Ryan; Amir, Amnon; Vetter, Russ; Hyde, John; Doty, Megan M.; Stillwell, Kristina; Benardini, James; Kim, Jae H.; Allen, Eric E.
2018-01-01
ABSTRACT Microbiome analyses of low-biomass samples are challenging because of contamination and inefficiencies, leading many investigators to employ low-throughput methods with minimal controls. We developed a new automated protocol, KatharoSeq (from the Greek katharos [clean]), that outperforms single-tube extractions while processing at least five times as fast. KatharoSeq incorporates positive and negative controls to reveal the whole bacterial community from inputs of as few as 50 cells and correctly identifies 90.6% (standard error, 0.013%) of the reads from 500 cells. To demonstrate the broad utility of KatharoSeq, we performed 16S rRNA amplicon and shotgun metagenome analyses of the Jet Propulsion Laboratory spacecraft assembly facility (SAF; n = 192, 96), 52 rooms of a neonatal intensive care unit (NICU; n = 388, 337), and an endangered-abalone-rearing facility (n = 192, 123), obtaining spatially resolved, unique microbiomes reproducible across hundreds of samples. The SAF, our primary focus, contains 32 sOTUs (sub-OTUs, defined as exact sequence matches) and their inferred variants identified by the deblur algorithm, with four (Acinetobacter lwoffii, Paracoccus marcusii, Mycobacterium sp., and Novosphingobium) being present in >75% of the samples. According to microbial spatial topography, the most abundant cleanroom contaminant, A. lwoffii, is related to human foot traffic exposure. In the NICU, we have been able to discriminate environmental exposure related to patient infectious disease, and in the abalone facility, we show that microbial communities reflect the marine environment rather than human input. Consequently, we demonstrate the feasibility and utility of large-scale, low-biomass metagenomic analyses using the KatharoSeq protocol. IMPORTANCE Various indoor, outdoor, and host-associated environments contain small quantities of microbial biomass and represent a niche that is often understudied because of technical constraints. 
Many studies that attempt to evaluate these low-biomass microbiome samples are riddled with erroneous results that are typically false positive signals obtained during the sampling process. We have investigated various low-biomass kits and methods to determine the limit of detection of these pipelines. Here we present KatharoSeq, a high-throughput protocol combining laboratory and bioinformatic methods that can differentiate a true positive signal in samples with as few as 50 to 500 cells. We demonstrate the application of this method in three unique low-biomass environments, including a SAF, a hospital NICU, and an abalone-rearing facility. PMID:29577086
The promise and challenge of high-throughput sequencing of the antibody repertoire
Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R
2014-01-01
Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474
Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.
Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun
2018-02-27
Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of multiple-beads-in-a-droplet events, and insufficient throughput because of the low concentration of beads used to avoid multiple-beads-in-a-droplet, remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl⁻¹, which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.
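The trade-off described in the abstract above, where low bead concentrations limit throughput but high concentrations produce multiple beads per droplet, can be illustrated with a Poisson model of random bead loading. The Poisson assumption and the mean occupancies used below are illustrative choices for this sketch and are not taken from the study.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k beads in a droplet when the mean is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def droplet_fractions(lam):
    """Return (empty, single, multiple) droplet fractions under random loading."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return empty, single, multiple


# Hypothetical mean bead occupancies per droplet: dilute versus concentrated.
dilute = droplet_fractions(0.1)
concentrated = droplet_fractions(1.0)
```

Under this model, raising the mean occupancy from 0.1 to 1.0 increases the single-bead fraction but also sharply increases the multiple-bead fraction. Deterministic encapsulation of ordered beads, as described in the abstract, is a way to escape that random-loading constraint.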
Feng, Shaolong; Eucker, Tyson P.; Holly, Mayumi K.; Konkel, Michael E.
2014-01-01
We present the results of a study using high-throughput whole-transcriptome sequencing (RNA-seq) and vibrational spectroscopy to characterize and fingerprint pathogenic-bacterium injury under conditions of unfavorable stress. Two garlic-derived organosulfur compounds were found to be highly effective antimicrobial compounds against Cronobacter sakazakii, a leading pathogen associated with invasive infection of infants and causing meningitis, necrotizing enterocolitis, and bacteremia. RNA-seq shows changes in gene expression patterns and transcriptomic response, while confocal micro-Raman spectroscopy characterizes macromolecular changes in the bacterial cell resulting from this chemical stress. RNA-seq analyses showed that the bacterial response to ajoene differed from the response to diallyl sulfide. Specifically, ajoene caused downregulation of motility-related genes, while diallyl sulfide treatment caused an increased expression of cell wall synthesis genes. Confocal micro-Raman spectroscopy revealed that the two compounds appear to have the same phase I antimicrobial mechanism of binding to thiol-containing proteins/enzymes in bacterial cells generating a disulfide stretching band but different phase II antimicrobial mechanisms, showing alterations in the secondary structures of proteins in two different ways. Diallyl sulfide primarily altered the α-helix and β-sheet, as reflected in changes in amide I, while ajoene altered the structures containing phenylalanine and tyrosine. Bayesian probability analysis validated the ability of principal component analysis to differentiate treated and control C. sakazakii cells. Scanning electron microscopy confirmed cell injury, showing significant morphological variations in cells following treatments by these two compounds. Findings from this study aid in the development of effective intervention strategies to reduce the risk of C. sakazakii contamination in the food production environment and on food contact surfaces, reducing the risks to susceptible consumers. PMID:24271174
Watanabe, Kazuhide; Biesinger, Jacob; Salmans, Michael L.; Roberts, Brian S.; Arthur, William T.; Cleary, Michele; Andersen, Bogi; Xie, Xiaohui; Dai, Xing
2014-01-01
Background: Deregulation of canonical Wnt/CTNNB1 (beta-catenin) pathway is one of the earliest events in the pathogenesis of colon cancer. Mutations in APC or CTNNB1 are highly frequent in colon cancer and cause aberrant stabilization of CTNNB1, which activates the transcription of Wnt target genes by binding to chromatin via the TCF/LEF transcription factors. Here we report an integrative analysis of genome-wide chromatin occupancy of CTNNB1 by chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) and gene expression profiling by microarray analysis upon RNAi-mediated knockdown of CTNNB1 in colon cancer cells. Results: We observed 3629 CTNNB1 binding peaks across the genome and a significant correlation between CTNNB1 binding and knockdown-induced gene expression change. Our integrative analysis led to the discovery of a direct Wnt target signature composed of 162 genes. Gene ontology analysis of this signature revealed a significant enrichment of Wnt pathway genes, suggesting multiple feedback regulations of the pathway. We provide evidence that this gene signature partially overlaps with the Lgr5+ intestinal stem cell signature, and is significantly enriched in normal intestinal stem cells as well as in clinical colorectal cancer samples. Interestingly, while the expression of the CTNNB1 target gene set does not correlate with survival, elevated expression of negative feedback regulators within the signature predicts better prognosis. Conclusion: Our data provide a genome-wide view of chromatin occupancy and gene regulation of Wnt/CTNNB1 signaling in colon cancer cells. PMID:24651522
Watanabe, Kazuhide; Biesinger, Jacob; Salmans, Michael L; Roberts, Brian S; Arthur, William T; Cleary, Michele; Andersen, Bogi; Xie, Xiaohui; Dai, Xing
2014-01-01
Deregulation of canonical Wnt/CTNNB1 (beta-catenin) pathway is one of the earliest events in the pathogenesis of colon cancer. Mutations in APC or CTNNB1 are highly frequent in colon cancer and cause aberrant stabilization of CTNNB1, which activates the transcription of Wnt target genes by binding to chromatin via the TCF/LEF transcription factors. Here we report an integrative analysis of genome-wide chromatin occupancy of CTNNB1 by chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) and gene expression profiling by microarray analysis upon RNAi-mediated knockdown of CTNNB1 in colon cancer cells. We observed 3629 CTNNB1 binding peaks across the genome and a significant correlation between CTNNB1 binding and knockdown-induced gene expression change. Our integrative analysis led to the discovery of a direct Wnt target signature composed of 162 genes. Gene ontology analysis of this signature revealed a significant enrichment of Wnt pathway genes, suggesting multiple feedback regulations of the pathway. We provide evidence that this gene signature partially overlaps with the Lgr5+ intestinal stem cell signature, and is significantly enriched in normal intestinal stem cells as well as in clinical colorectal cancer samples. Interestingly, while the expression of the CTNNB1 target gene set does not correlate with survival, elevated expression of negative feedback regulators within the signature predicts better prognosis. Our data provide a genome-wide view of chromatin occupancy and gene regulation of Wnt/CTNNB1 signaling in colon cancer cells.
2014-01-01
Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. However, the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. However, the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740
Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments
Welch, Rene; Chung, Dongjun; Grass, Jeffrey; Landick, Robert
2017-01-01
ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple, new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. PMID:28911122
Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments.
Welch, Rene; Chung, Dongjun; Grass, Jeffrey; Landick, Robert; Keles, Sündüz
2017-09-06
ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high resolution mapping of transcription factor binding sites. Although many aspects of the ChIP-exo data analysis are similar to those of ChIP-seq, these high throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple, new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
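The correlation heatmaps mentioned in this abstract rest on pairwise correlation between experiments. The sketch below computes such a matrix with the Pearson coefficient over per-gene (or per-region) values; the choice of Pearson, the function names, and the toy values are assumptions made for illustration and do not describe how Heat*seq itself is implemented.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def correlation_matrix(experiments):
    """Pairwise correlations for a dict of name -> values over shared genes."""
    names = list(experiments)
    return {a: {b: pearson(experiments[a], experiments[b]) for b in names}
            for a in names}


if __name__ == "__main__":
    # Toy expression values over the same four genes (hypothetical data).
    data = {
        "user_experiment": [1.0, 2.0, 3.0, 4.0],
        "public_a": [2.0, 4.0, 6.0, 8.0],
        "public_b": [4.0, 3.0, 2.0, 1.0],
    }
    for name, row in correlation_matrix(data).items():
        print(name, row)
```

A heatmap is then a color rendering of this matrix, with the user's experiment as one row among the public datasets.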
Wei, Yulong; Silke, Jordan R; Xia, Xuhua
2017-12-15
Bacterial translation initiation is influenced by base pairing between the Shine-Dalgarno (SD) sequence in the 5' UTR of mRNA and the anti-SD (aSD) sequence at the free 3' end of the 16S rRNA (3' TAIL) due to: 1) the SD/aSD sequence binding location and 2) SD/aSD binding affinity. In order to understand what makes an SD/aSD interaction optimal, we must define: 1) terminus of the 3' TAIL and 2) extent of the core aSD sequence within the 3' TAIL. Our approach to characterize these components in Escherichia coli and Bacillus subtilis involves 1) mapping the 3' boundary of the mature 16S rRNA using high-throughput RNA sequencing (RNA-Seq), and 2) identifying the segment within the 3' TAIL that is strongly preferred in SD/aSD pairing. Using RNA-Seq data, we resolve previous discrepancies in the reported 3' TAIL in B. subtilis and recovered the established 3' TAIL in E. coli. Furthermore, we extend previous studies to suggest that both highly and lowly expressed genes favor SD sequences with intermediate binding affinity, but this trend is exclusive to SD sequences that complement the core aSD sequences defined herein.
Cabrera, Paula V.; Pang, Mabel; Marshall, Jamie L.; Kung, Raymond; Nelson, Stanley F.; Stalnaker, Stephanie H.; Wells, Lance; Crosbie-Watson, Rachelle H.; Baum, Linda G.
2012-01-01
Duchenne muscular dystrophy is an X-linked disorder characterized by loss of dystrophin, a cytoskeletal protein that connects the actin cytoskeleton in skeletal muscle cells to extracellular matrix. Dystrophin binds to the cytoplasmic domain of the transmembrane glycoprotein β-dystroglycan (β-DG), which associates with cell surface α-dystroglycan (α-DG) that binds laminin in the extracellular matrix. β-DG can also associate with utrophin, and this differential association correlates with specific glycosylation changes on α-DG. Genetic modification of α-DG glycosylation can promote utrophin binding and rescue dystrophic phenotypes in mouse dystrophy models. We used high throughput screening with the plant lectin Wisteria floribunda agglutinin (WFA) to identify compounds that altered muscle cell surface glycosylation, with the goal of finding compounds that increase abundance of α-DG and associated sarcolemmal glycoproteins, increase utrophin usage, and increase laminin binding. We identified one compound, lobeline, from the Prestwick library of Food and Drug Administration-approved compounds that fulfilled these criteria, increasing WFA binding to C2C12 cells and to primary muscle cells from wild type and mdx mice. WFA binding and enhancement by lobeline required complex N-glycans but not O-mannose glycans that bind laminin. However, inhibiting complex N-glycan processing reduced laminin binding to muscle cell glycoproteins, although O-mannosylation was intact. Glycan analysis demonstrated a general increase in N-glycans on lobeline-treated cells rather than specific alterations in cell surface glycosylation, consistent with increased abundance of multiple sarcolemmal glycoproteins. This demonstrates the feasibility of high throughput screening with plant lectins to identify compounds that alter muscle cell glycosylation and identifies a novel role for N-glycans in regulating muscle cell function. PMID:22570487
Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...
2015-09-21
The alternative sigma factor σE functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σE in Salmonella and E. coli has been examined using microarray; however, a genome-wide analysis of σE-binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σE-binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σE in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σE-binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σE modulates transcription. By dissecting direct and indirect modes of σE-mediated regulation, we found that σE activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σE-regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jie; Overall, Christopher C.; Johnson, Rudd C.
The alternative sigma factor σE functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σE in Salmonella and E. coli has been examined using microarray; however, a genome-wide analysis of σE-binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σE-binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σE in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σE-binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σE modulates transcription. By dissecting direct and indirect modes of σE-mediated regulation, we found that σE activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σE-regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.
Bansal, Vikas
2017-03-14
PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
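The idea of using heterozygous variants to separate PCR duplicates from natural duplicates can be illustrated with a deliberately simplified sketch. It assumes that two reads copied from the same fragment always carry the same allele, while two independent fragments that happen to share coordinates carry different alleles half of the time (no allelic bias, no sequencing errors). The function name and the input counts are hypothetical, and this is not the estimator implemented in the PCRduplicates tool.

```python
def estimate_pcr_fraction(concordant_pairs, discordant_pairs):
    """Estimate the fraction of duplicate read pairs that are PCR duplicates.

    concordant_pairs: duplicate pairs over heterozygous sites whose two reads
        show the same allele.
    discordant_pairs: duplicate pairs whose two reads show different alleles;
        under the stated assumptions these can only be natural duplicates.
    """
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("no duplicate pairs overlap a heterozygous site")
    # Natural duplicates are discordant with probability 1/2, so the observed
    # discordant fraction is about half of the natural-duplicate fraction.
    natural_fraction = min(1.0, 2.0 * discordant_pairs / total)
    return 1.0 - natural_fraction


if __name__ == "__main__":
    # Hypothetical counts: 90 same-allele pairs, 10 different-allele pairs.
    print(estimate_pcr_fraction(90, 10))
```

The point of the sketch is only that discordant alleles within a duplicate cluster are direct evidence of independent fragments, which is why counting every coordinate-identical read as a PCR duplicate overestimates the rate.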
Smaczniak, Cezary; Muiño, Jose M; Chen, Dijun; Angenent, Gerco C; Kaufmann, Kerstin
2017-08-01
Floral organ identities in plants are specified by the combinatorial action of homeotic master regulatory transcription factors. However, how these factors achieve their regulatory specificities is still largely unclear. Genome-wide in vivo DNA binding data show that homeotic MADS domain proteins recognize partly distinct genomic regions, suggesting that DNA binding specificity contributes to functional differences of homeotic protein complexes. We used in vitro systematic evolution of ligands by exponential enrichment followed by high-throughput DNA sequencing (SELEX-seq) on several floral MADS domain protein homo- and heterodimers to measure their DNA binding specificities. We show that specification of reproductive organs is associated with distinct binding preferences of a complex formed by SEPALLATA3 and AGAMOUS. Binding specificity is further modulated by different binding site spacing preferences. Combination of SELEX-seq and genome-wide DNA binding data allows differentiation between targets in specification of reproductive versus perianth organs in the flower. We validate the importance of DNA binding specificity for organ-specific gene regulation by modulating promoter activity through targeted mutagenesis. Our study shows that intrafamily protein interactions affect DNA binding specificity of floral MADS domain proteins. Differential DNA binding of MADS domain protein complexes plays a role in the specificity of target gene regulation. © 2017 American Society of Plant Biologists. All rights reserved.
Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai
2015-10-16
To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens, a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TruSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system. An average of 48.5 million reads for B cells, and 50.6 million reads for CLL specimens were obtained with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEG) between B cells and CLL specimens based on FPKM (fragments per kilobase of transcript per million reads) and false discovery rate (FDR q < 0.05, fold change >2) were identified. Expression of selected DEGs (n = 32) with up regulated and down regulated expression in CLL from RNA-seq data were also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was a variation in fold expression of DEG genes between RNA-seq and qRT-PCR, more than 90% of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). The RNA-seq analysis of CLL specimens identifies novel DEG and alternatively spliced genes that are potential prognostic markers and therapeutic targets. High level of validation by qRT-PCR for a number of DEG genes supports the accuracy of this analysis. Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis.
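The FPKM unit and the fold-change and FDR thresholds quoted in this abstract can be made concrete with a short sketch. The FPKM formula is the standard definition; the function names, the example numbers, and the simple threshold filter are illustrative assumptions and do not reproduce the statistical test that Cuffdiff performs.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def passes_thresholds(fpkm_a, fpkm_b, q_value, min_fold=2.0, max_q=0.05):
    """Apply the cut-offs quoted in the abstract (fold change > 2, q < 0.05).

    The q-value is assumed to come from an external differential-expression
    test; this helper only filters on the reported numbers.
    """
    low, high = sorted((fpkm_a, fpkm_b))
    if high == 0:
        return False  # not expressed in either condition
    fold = float("inf") if low == 0 else high / low
    return fold > min_fold and q_value < max_q


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2 kb transcript, 50 million fragments.
    print(fpkm(500, 2000, 50_000_000))
    print(passes_thresholds(5.0, 12.5, q_value=0.01))
```

Normalizing by both transcript length and library size is what allows expression values to be compared across genes and across the B cell and CLL libraries, which differ slightly in depth.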
Methods for processing high-throughput RNA sequencing data.
Ares, Manuel
2014-11-03
High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner. © 2014 Cold Spring Harbor Laboratory Press.
Evaluation of sequencing approaches for high-throughput ...
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq utilizing custom designed paired probes per gene; 2. Targeted sequencing (TSQ) utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ) using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopirox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with the TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i
Tijssen, Marloes R.; Cvejic, Ana; Joshi, Anagha; Hannah, Rebecca L.; Ferreira, Rita; Forrai, Ariel; Bellissimo, Dana C.; Oram, S. Helen; Smethurst, Peter A.; Wilson, Nicola K.; Wang, Xiaonan; Ottersbach, Katrin; Stemple, Derek L.; Green, Anthony R.; Ouwehand, Willem H.; Göttgens, Berthold
2011-01-01
Summary: Hematopoietic differentiation critically depends on combinations of transcriptional regulators controlling the development of individual lineages. Here, we report the genome-wide binding sites for the five key hematopoietic transcription factors (GATA1, GATA2, RUNX1, FLI1, and TAL1/SCL) in primary human megakaryocytes. Statistical analysis of the 17,263 regions bound by at least one factor demonstrated that simultaneous binding by all five factors was the most enriched pattern and often occurred near known hematopoietic regulators. Eight genes not previously appreciated to function in hematopoiesis that were bound by all five factors were shown to be essential for thrombocyte and/or erythroid development in zebrafish. Moreover, one of these genes encoding the PDZK1IP1 protein shared transcriptional enhancer elements with the blood stem cell regulator TAL1/SCL. Multifactor ChIP-Seq analysis in primary human cells coupled with a high-throughput in vivo perturbation screen therefore offers a powerful strategy to identify essential regulators of complex mammalian differentiation processes. PMID:21571218
Role of APOE Isoforms in the Pathogenesis of TBI Induced Alzheimer’s Disease
2015-10-01
global deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel ...ATP binding cassette transporter A1 (ABCA1) is a lipid transporter that controls the generation of HDL in plasma and ApoE-containing lipoproteins in... parallel sequencing, mRNA-seq, behavioral testing, memory impairment, recovery. 3 Overall Project Summary During the reported period, we have been able
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
Pervasive Targeting of Nascent Transcripts by Hfq.
Kambara, Tracy K; Ramsey, Kathryn M; Dove, Simon L
2018-05-01
Hfq is an RNA chaperone and an important post-transcriptional regulator in bacteria. Using chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq), we show that Hfq associates with hundreds of different regions of the Pseudomonas aeruginosa chromosome. These associations are abolished when transcription is inhibited, indicating that they reflect Hfq binding to transcripts during their synthesis. Analogous ChIP-seq analyses with the post-transcriptional regulator Crc reveal that it associates with many of the same nascent transcripts as Hfq, an activity we show is Hfq dependent. Our findings indicate that Hfq binds many transcripts co-transcriptionally in P. aeruginosa, often in concert with Crc, and uncover direct regulatory targets of these proteins. They also highlight a general approach for studying the interactions of RNA-binding proteins with nascent transcripts in bacteria. The binding of post-transcriptional regulators to nascent mRNAs may represent a prevalent means of controlling translation in bacteria where transcription and translation are coupled. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Transcriptome Dynamics during Maize Endosperm Development
Feng, Jiaojiao; Xu, Shutu; Wang, Lei; Li, Feifei; Li, Yibo; Zhang, Renhe; Zhang, Xinghua; Xue, Jiquan; Guo, Dongwei
2016-01-01
The endosperm is a major organ of the seed that plays vital roles in determining seed weight and quality. However, genome-wide transcriptome patterns throughout maize endosperm development have not been comprehensively investigated to date. Accordingly, we performed a high-throughput RNA sequencing (RNA-seq) analysis of the maize endosperm transcriptome at 5, 10, 15 and 20 days after pollination (DAP). We found that more than 11,000 protein-coding genes underwent alternative splicing (AS) events during the four developmental stages studied. These genes were mainly involved in intracellular protein transport, signal transmission, cellular carbohydrate metabolism, cellular lipid metabolism, lipid biosynthesis, protein modification, histone modification, cellular amino acid metabolism, and DNA repair. Additionally, 7,633 genes, including 473 transcription factors (TFs), were differentially expressed among the four developmental stages. The differentially expressed TFs were from 50 families, including the bZIP, WRKY, GeBP and ARF families. Further analysis of the stage-specific TFs showed that binding, nucleus and ligand-dependent nuclear receptor activities might be important at 5 DAP, that immune responses, signalling, binding and lumen development are involved at 10 DAP, that protein metabolic processes and the cytoplasm might be important at 15 DAP, and that the responses to various stimuli are different at 20 DAP compared with the other developmental stages. This RNA-seq analysis provides novel, comprehensive insights into the transcriptome dynamics during early endosperm development in maize. PMID:27695101
JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles
Portales-Casamar, Elodie; Thongjuea, Supat; Kwon, Andrew T.; Arenillas, David; Zhao, Xiaobei; Valen, Eivind; Yusuf, Dimas; Lenhard, Boris; Wasserman, Wyeth W.; Sandelin, Albin
2010-01-01
JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database set the system ready for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches. PMID:19906716
Dale, Ryan K; Matzat, Leah H; Lei, Elissa P
2014-08-01
Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
High-Throughput Single-Cell RNA Sequencing and Data Analysis.
Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic
2018-01-01
Understanding biological systems at a single cell resolution may reveal several novel insights which remain masked by the conventional population-based techniques providing an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA-sequencing (scRNA-seq) combining flow cytometry and a nanoliter-scale robotic system. Since scRNA-seq requires amplification of a low amount of endogenous cellular RNA, leading to substantial technical noise in the dataset, downstream data filtering and analysis require special care. Therefore, we also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations including rare cell types as well as to derive lineage trees by ordering the identified subpopulations of cells along the inferred differentiation trajectories.
TSSAR: TSS annotation regime for dRNA-seq data.
Amman, Fabian; Wolfinger, Michael T; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F; Findeiß, Sven
2014-03-27
Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, owing to its subjectivity, biased. Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptionally active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts. We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible. The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as an intuitive RESTful Web service (http://rna.tbi.univie.ac.at/TSSAR), together with a set of post-processing and analysis tools, and as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.
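The Poisson/Skellam premise described in this abstract can be illustrated with a minimal sketch. It assumes the read-start counts at one position in the two libraries are independent Poisson variables with known rates, and it computes the probability of seeing a count difference at least as large as the observed one. The function names and the rate values are hypothetical, and this is not TSSAR's own code, which estimates the rates from the local expression level.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), evaluated in log space for stability."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def _cutoff(mu1, mu2):
    """Count beyond which both Poisson tails are negligible (heuristic)."""
    m = max(mu1, mu2)
    return int(m + 12 * math.sqrt(m) + 60)


def skellam_pmf(k, mu1, mu2):
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2)."""
    start = max(0, -k)  # smallest X2 value that keeps X1 = X2 + k >= 0
    stop = start + _cutoff(mu1, mu2)
    return sum(poisson_pmf(j + k, mu1) * poisson_pmf(j, mu2)
               for j in range(start, stop))


def skellam_upper_tail(d, mu1, mu2):
    """P(X1 - X2 >= d): chance of a difference at least as large as d."""
    top = _cutoff(mu1, mu2)
    return sum(skellam_pmf(k, mu1, mu2) for k in range(d, top + 1))


# Hypothetical example: 19 read starts in the enriched library versus 4 in
# the control, with an assumed common background rate of 4 in both libraries.
p_value = skellam_upper_tail(19 - 4, 4.0, 4.0)
print(f"P(difference >= 15) = {p_value:.3g}")
```

The sums are truncated rather than evaluated with Bessel functions, which keeps the sketch dependency-free but makes it suitable only for modest rates.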
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659
Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.
Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin
2013-09-22
High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.
A maximum entropy model for chromatin structure
NASA Astrophysics Data System (ADS)
Farre, Pau; Emberly, Eldon; Emberly Group Team
The DNA inside the nucleus of eukaryotic cells shows a variety of conserved structures at different length scales. These structures are formed by interactions between protein complexes that bind to the DNA and regulate gene activity. Recent high-throughput sequencing techniques allow for the measurement both of the genome-wide contact map of the folded DNA within a cell (HiC) and of where various proteins are bound to the DNA (ChIP-seq). In this talk I will present a maximum-entropy method capable of both predicting HiC contact maps from binding data, and binding data from HiC contact maps. This method results in an intuitive Ising-type model that is able to predict how altering the presence of binding factors can modify chromosome conformation, without the need for polymer simulations.
Prasad, Kasavajhala V. S. K.; Abdel-Hameed, Amira A. E.; Xing, Denghui; Reddy, Anireddy S. N.
2016-01-01
Abiotic and biotic stresses cause significant yield losses in all crops. Acquisition of stress tolerance in plants requires rapid reprogramming of gene expression. SR1/CAMTA3, a member of signal responsive transcription factors (TFs), functions both as a positive and a negative regulator of biotic stress responses and as a positive regulator of cold stress-induced gene expression. Using high throughput RNA-seq, we identified ~3000 SR1-regulated genes. Promoters of about 60% of the differentially expressed genes have a known DNA binding site for SR1, suggesting that they are likely direct targets. Gene ontology analysis of SR1-regulated genes confirmed previously known functions of SR1 and uncovered a potential role for this TF in salt stress. Our results showed that SR1 mutant is more tolerant to salt stress than the wild type and complemented line. Improved tolerance of sr1 seedlings to salt is accompanied with the induction of salt-responsive genes. Furthermore, ChIP-PCR results showed that SR1 binds to promoters of several salt-responsive genes. These results suggest that SR1 acts as a negative regulator of salt tolerance by directly repressing the expression of salt-responsive genes. Overall, this study identified SR1-regulated genes globally and uncovered a previously uncharacterized role for SR1 in salt stress response. PMID:27251464
Mechanisms of Lin28-Mediated miRNA and mRNA Regulation—A Structural and Functional Perspective
Mayr, Florian; Heinemann, Udo
2013-01-01
Lin28 is an essential RNA-binding protein that is ubiquitously expressed in embryonic stem cells. Its physiological function has been linked to the regulation of differentiation, development, and oncogenesis as well as glucose metabolism. Lin28 mediates these pleiotropic functions by inhibiting let-7 miRNA biogenesis and by modulating the translation of target mRNAs. Both activities strongly depend on Lin28’s RNA-binding domains (RBDs), an N-terminal cold-shock domain (CSD) and a C-terminal Zn-knuckle domain (ZKD). Recent biochemical and structural studies revealed the mechanisms of how Lin28 controls let-7 biogenesis. Lin28 binds to the terminal loop of pri- and pre-let-7 miRNA and represses their processing by Drosha and Dicer. Several biochemical and structural studies showed that the specificity of this interaction is mainly mediated by the ZKD with a conserved GGAGA or GGAGA-like motif. Further RNA crosslinking and immunoprecipitation coupled to high-throughput sequencing (CLIP-seq) studies confirmed this binding motif and uncovered a large number of new mRNA binding sites. Here we review exciting recent progress in our understanding of how Lin28 binds structurally diverse RNAs and fulfills its pleiotropic functions. PMID:23939427
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei
2017-05-19
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu
2013-01-01
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.
methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.
Kishore, Kamal; de Pretis, Stefano; Lister, Ryan; Morelli, Marco J; Bianchi, Valerio; Amati, Bruno; Ecker, Joseph R; Pelizzola, Mattia
2015-09-29
Numerous methods are available to profile several epigenetic marks, providing data with different genome coverage and resolution. Large epigenomic datasets are then generated, and often combined with other high-throughput data, including RNA-seq, ChIP-seq for transcription factors (TFs) binding and DNase-seq experiments. Despite the numerous computational tools covering specific steps in the analysis of large-scale epigenomics data, comprehensive software solutions for their integrative analysis are still missing. Multiple tools must be identified and combined to jointly analyze histone marks, TFs binding and other -omics data together with DNA methylation data, complicating the analysis of these data and their integration with publicly available datasets. To overcome the burden of integrating various data types with multiple tools, we developed two companion R/Bioconductor packages. The former, methylPipe, is tailored to the analysis of high- or low-resolution DNA methylomes in several species, accommodating (hydroxy-)methyl-cytosines in both CpG and non-CpG sequence context. The analysis of multiple whole-genome bisulfite sequencing experiments is supported, while maintaining the ability of integrating targeted genomic data. The latter, compEpiTools, seamlessly incorporates the results obtained with methylPipe and supports their integration with other epigenomics data. It provides a number of methods to score these data in regions of interest, leading to the identification of enhancers, lncRNAs, and RNAPII stalling/elongation dynamics. Moreover, it allows a fast and comprehensive annotation of the resulting genomic regions, and the association of the corresponding genes with non-redundant GeneOntology terms. Finally, the package includes a flexible method based on heatmaps for the integration of various data types, combining annotation tracks with continuous or categorical data tracks. 
methylPipe and compEpiTools provide a comprehensive Bioconductor-compliant solution for the integrative analysis of heterogeneous epigenomics data. These packages are instrumental in providing biologists with minimal R skills a complete toolkit facilitating the analysis of their own data, or in accelerating the analyses performed by more experienced bioinformaticians.
ASPeak: an abundance sensitive peak detection algorithm for RIP-Seq.
Kucukural, Alper; Özadam, Hakan; Singh, Guramrit; Moore, Melissa J; Cenik, Can
2013-10-01
Unlike DNA, RNA abundances can vary over several orders of magnitude. Thus, identification of RNA-protein binding sites from high-throughput sequencing data presents unique challenges. Although peak identification in ChIP-Seq data has been extensively explored, there are few bioinformatics tools tailored for peak calling on analogous datasets for RNA-binding proteins. Here we describe ASPeak (abundance sensitive peak detection algorithm), an implementation of an algorithm that we previously applied to detect peaks in exon junction complex RNA immunoprecipitation in tandem experiments. Our peak detection algorithm yields stringent and robust target sets enabling sensitive motif finding and downstream functional analyses. ASPeak is implemented in Perl as a complete pipeline that takes bedGraph files as input. ASPeak implementation is freely available at https://sourceforge.net/projects/as-peak under the GNU General Public License. ASPeak can be run on a personal computer, yet is designed to be easily parallelizable. ASPeak can also run on high performance computing clusters providing efficient speedup. The documentation and user manual can be obtained from http://master.dl.sourceforge.net/project/as-peak/manual.pdf.
Gene expression profiling of human breast tissue samples using SAGE-Seq.
Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley
2010-12-01
We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.
Ambrosini, Giovanna; Dreos, René; Kumar, Sunil; Bucher, Philipp
2016-11-18
ChIP-seq and related high-throughput chromatin profiling assays generate ever increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data. Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory-efficiency and speed, thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade. The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/ .
Barshad, Gilad; Blumberg, Amit; Cohen, Tal; Mishmar, Dan
2018-06-14
Oxidative phosphorylation (OXPHOS), a fundamental energy source in all human tissues, requires interactions between mitochondrial (mtDNA)- and nuclear (nDNA)-encoded protein subunits. Although such interactions are fundamental to OXPHOS, bi-genomic coregulation is poorly understood. To address this question, we analyzed ∼8500 RNA-seq experiments from 48 human body sites. Despite well-known variation in mitochondrial activity, quantity, and morphology, we found overall positive mtDNA-nDNA OXPHOS genes' co-expression across human tissues. Nevertheless, negative mtDNA-nDNA gene expression correlation was identified in the hypothalamus, basal ganglia, and amygdala (subcortical brain regions, collectively termed the "primitive" brain). Single-cell RNA-seq analysis of mouse and human brains revealed that this phenomenon is evolutionarily conserved, and both are influenced by brain cell types (involving excitatory/inhibitory neurons and nonneuronal cells) and by their spatial brain location. As the "primitive" brain is highly oxidative, we hypothesized that such negative mtDNA-nDNA co-expression likely controls for the high mtDNA transcript levels, which enforce tight OXPHOS regulation, rather than rewiring toward glycolysis. Accordingly, we found "primitive" brain-specific up-regulation of lactate dehydrogenase B ( LDHB ), which associates with high OXPHOS activity, at the expense of LDHA , which promotes glycolysis. Analyses of co-expression, DNase-seq, and ChIP-seq experiments revealed candidate RNA-binding proteins and CEBPB as the best regulatory candidates to explain these phenomena. Finally, cross-tissue expression analysis unearthed tissue-dependent splice variants and OXPHOS subunit paralogs and allowed revising the list of canonical OXPHOS transcripts. Taken together, our analysis provides a comprehensive view of mito-nuclear gene co-expression across human tissues and provides overall insights into the bi-genomic regulation of mitochondrial activities. 
© 2018 Barshad et al.; Published by Cold Spring Harbor Laboratory Press.
GenomicTools: a computational platform for developing high-throughput analytics in genomics.
Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo
2012-01-15
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
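As an illustration of the kind of "operation between sets of genomic regions" the abstract above mentions, the following is a minimal Python sketch of region intersection. It is not the GenomicTools API (which is a C++ library and command-line suite); the function name `intersect_regions`, the tuple layout, and the half-open coordinate convention are assumptions made for this example, and each input set is assumed to contain no self-overlapping regions.

```python
from typing import List, Tuple

# Hypothetical region layout: (chromosome, start, end), half-open [start, end).
Region = Tuple[str, int, int]


def intersect_regions(a: List[Region], b: List[Region]) -> List[Region]:
    """Return the overlapping pieces of two region sets.

    Assumes the regions within each set do not overlap one another.
    Both sets are sorted and then swept once with two pointers.
    """
    a = sorted(a)
    b = sorted(b)
    out: List[Region] = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever pointer sits on the earlier chromosome.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start = max(start_a, start_b)
        end = min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Advance the region that ends first.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return out


peaks = [("chr1", 0, 10), ("chr1", 20, 30)]
genes = [("chr1", 5, 25)]
overlaps = intersect_regions(peaks, genes)
```

A single sorted sweep needs only the current region from each set at any time, which is one common way to keep memory use low on large inputs.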
Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho
2015-10-28
Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance estimation (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq reads of 35 and 76 nucleotides produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For 35-nucleotide reads, RPKM showed the highest correlation, but for 76-nucleotide reads it showed the lowest correlation of all the methods. ERPKM did not improve on RPKM. Of the two abundance estimation normalization methods, Sailfish gave higher correlation than RSEM for 35-nucleotide reads, and RSEM was in turn better than using no abundance estimation. However, for 76-nucleotide reads, the results achieved by RSEM were similar to those obtained without abundance estimation and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
ChIP-PaM: an algorithm to identify protein-DNA interaction using ChIP-Seq data.
Wu, Song; Wang, Jianmin; Zhao, Wei; Pounds, Stanley; Cheng, Cheng
2010-06-03
ChIP-Seq is a powerful tool for identifying the interaction between genomic regulators and their bound DNAs, especially for locating transcription factor binding sites. However, high cost and high rate of false discovery of transcription factor binding sites identified from ChIP-Seq data significantly limit its application. Here we report a new algorithm, ChIP-PaM, for identifying transcription factor target regions in ChIP-Seq datasets. This algorithm makes full use of a protein-DNA binding pattern by capitalizing on three lines of evidence: 1) the tag count modelling at the peak position, 2) pattern matching of a specific tag count distribution, and 3) motif searching along the genome. A novel data-based two-step eFDR procedure is proposed to integrate the three lines of evidence to determine significantly enriched regions. Our algorithm requires no technical controls and efficiently discriminates falsely enriched regions from regions enriched by true transcription factor (TF) binding on the basis of ChIP-Seq data only. An analysis of real genomic data is presented to demonstrate our method. In a comparison with other existing methods, we found that our algorithm provides more accurate binding site discovery while maintaining comparable statistical power.
Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.
Dorsett, Dale; Misulovin, Ziva
2017-01-01
This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.
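The abstract above describes occupancy as input-normalized fold-enrichment without giving the formula. One common way to compute such a quantity, sketched here under that assumption rather than as the chapter's actual procedure, is to divide the depth-scaled ChIP read count in a window by the depth-scaled input read count. The function name and the example numbers are hypothetical.

```python
def fold_enrichment(chip_count: int, chip_total: int,
                    input_count: int, input_total: int) -> float:
    """Depth-normalized ChIP signal divided by depth-normalized input signal.

    Equivalent to (chip_count / chip_total) / (input_count / input_total).
    """
    if input_count == 0 or chip_total == 0:
        raise ValueError("input_count and chip_total must be non-zero")
    return (chip_count * input_total) / (input_count * chip_total)


# Hypothetical window: 200 of 1,000,000 ChIP reads, 50 of 2,000,000 input reads.
enrichment = fold_enrichment(200, 1_000_000, 50, 2_000_000)
```

Windows with no input reads are rejected here rather than smoothed with a pseudocount; which choice is appropriate depends on the analysis.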
Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.
2014-01-01
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios—family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545
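The Mendelian inheritance checking mentioned above can be sketched for a single trio as follows. This is an illustrative simplification, not the authors' procedure: it assumes diploid autosomal genotypes given as unordered allele pairs, ignores de novo mutations and missing calls, and uses hypothetical function names.

```python
from typing import Iterable, Tuple

Genotype = Tuple[str, str]  # two alleles, order not meaningful


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_error_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) genotype calls that violate inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trio genotypes supplied")
    errors = sum(1 for c, m, f in trios if not is_mendelian_consistent(c, m, f))
    return errors / len(trios)


calls = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # violation: mother has no G
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("C", "T"), ("C", "C"), ("C", "T")),  # consistent
]
rate = mendelian_error_rate(calls)
```

A lower violation rate across many sites suggests more accurate calls, although a consistent trio can still be wrong, which is why the study pairs this check with SNP array data.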
Zhu, Lin; Guo, Wei-Li; Deng, Su-Ping; Huang, De-Shuang
2016-01-01
In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data has been accumulated. Instead of investigating these data independently, several recent studies have convincingly demonstrated that a wealth of scientific insights can be gained by integrative analysis of these ChIP-seq data. However, when used for the purpose of integrative analysis, a serious drawback of the current ChIP-seq technique is that it is still expensive and time-consuming to generate ChIP-seq datasets of high standard. Most researchers are therefore unable to obtain complete ChIP-seq data for several TFs in a wide variety of cell lines, which considerably limits the understanding of transcriptional regulation patterns. In this paper, we propose a novel method called ChIP-PIT to overcome the aforementioned limitation. In ChIP-PIT, ChIP-seq data corresponding to a diverse collection of cell types, TFs and genes are fused together using the three-mode pair-wise interaction tensor (PIT) model, and the prediction of unperformed ChIP-seq experimental results is formulated as a tensor completion problem. Computationally, we propose an efficient first-order method based on extensions of the coordinate descent method to learn the optimal solution of ChIP-PIT, which makes it particularly suitable for the analysis of massive scale ChIP-seq data. Experimental evaluation on the ENCODE data illustrates the usefulness of the proposed model.
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-03-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.
A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages
Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young
2017-01-01
Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945
Pagès, Hervé
2018-01-01
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188
Lun, Aaron T L; Pagès, Hervé; Smith, Mike L
2018-05-01
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
Analysis, annotation, and profiling of the oat seed transcriptome
USDA-ARS's Scientific Manuscript database
Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
2011-01-01
Background: The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results: We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions: The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing
Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro
2017-01-01
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
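The idea of turning relative read counts into absolute abundances with a spike-in can be sketched with simple proportional scaling. This is an assumption-laden simplification rather than the authors' method, which relies on staggered mixtures of several spike-ins: the sketch uses a single spike-in added in a known copy number and assumes that reads are proportional to template copies for both the spike-in and the taxon. The function name and the numbers are hypothetical.

```python
def absolute_abundance(taxon_reads: int, spike_reads: int,
                       spike_copies_added: float) -> float:
    """Estimate 16S rRNA gene copies of a taxon from its read count.

    Scales the taxon's reads by the copies-per-read ratio of the spike-in.
    """
    if spike_reads <= 0:
        raise ValueError("spike-in was not detected; cannot scale")
    return taxon_reads / spike_reads * spike_copies_added


# Hypothetical sample: 1e5 spike-in copies added, 150 spike-in reads recovered.
estimated_copies = absolute_abundance(3000, 150, 1e5)
```

With several spike-ins at staggered concentrations, the same scaling could instead be fitted as a regression of read count on known copy number.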
Deng, Xin; Liang, Haihua; Chen, Kai; He, Chuan; Lan, Lefu; Tang, Xiaoyan
2014-01-01
Pseudomonas syringae uses the two-component system RhpRS to regulate the expression of type III secretion system (T3SS) genes and bacterial virulence. However, the molecular mechanisms and the regulons of RhpRS have yet to be fully elucidated. Here, we show that RhpS functions as a kinase and a phosphatase on RhpR and as an autokinase upon itself. RhpR is phosphorylated by the small phosphodonor acetyl phosphate. A specific RhpR-binding site containing the inverted repeat (IR) motif GTATC-N6-GATAC, was mapped to its own promoter by a DNase I footprint analysis. Electrophoretic mobility shift assay indicated that P-RhpR has a higher binding affinity to the IR motif than RhpR. To identify additional RhpR targets in P. syringae, we performed chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) and detected 167 enriched loci including the hrpR promoter, suggesting the direct regulation of T3SS cascade genes by RhpR. A genome-wide microarray analysis showed that, in addition to the T3SS cascade genes, RhpR differentially regulates a large set of genes with various functions in response to different growth conditions. Together, these results suggested that RhpRS is a global regulator that allows P. syringae to sense and respond to environmental changes by coordinating T3SS expression and many other biological processes. PMID:25249629
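The inverted repeat motif reported above, GTATC-N6-GATAC, is straightforward to scan for in a DNA sequence. The sketch below uses a regular expression in Python; the function name and the test sequence are made up for illustration, and this pattern search is not the ChIP-seq or footprinting analysis used in the study. Because GATAC is the reverse complement of GTATC, the whole motif is its own reverse complement, so scanning one strand is sufficient.

```python
import re
from typing import List, Tuple

# GTATC, any six bases, then GATAC. The lookahead allows overlapping hits.
IR_MOTIF = re.compile(r"(?=(GTATC[ACGT]{6}GATAC))")


def find_ir_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) for every motif occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in IR_MOTIF.finditer(seq)]


# Made-up sequence containing one site starting at position 2.
hits = find_ir_motif("aaGTATCAAATTTGATACgg")
```

Candidate sites found this way are only sequence matches; binding still has to be confirmed experimentally, as the abstract does with footprinting and mobility shift assays.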
Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin
2018-01-01
Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139
Introduction to Single-Cell RNA Sequencing.
Olsen, Thale Kristin; Baryawno, Ninib
2018-04-01
During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc.
How B-Cell Receptor Repertoire Sequencing Can Be Enriched with Structural Antibody Data
Kovaltsuk, Aleksandr; Krawczyk, Konrad; Galson, Jacob D.; Kelly, Dominic F.; Deane, Charlotte M.; Trück, Johannes
2017-01-01
Next-generation sequencing of immunoglobulin gene repertoires (Ig-seq) allows the investigation of large-scale antibody dynamics at a sequence level. However, structural information, a crucial descriptor of antibody binding capability, is not collected in Ig-seq protocols. Developing systematic relationships between the antibody sequence information gathered from Ig-seq and low-throughput techniques such as X-ray crystallography could radically improve our understanding of antibodies. The mapping of Ig-seq datasets to known antibody structures can indicate structurally, and perhaps functionally, uncharted areas. Furthermore, contrasting naïve and antigenically challenged datasets using structural antibody descriptors should provide insights into antibody maturation. As the number of antibody structures steadily increases and more and more Ig-seq datasets become available, the opportunities that arise from combining the two types of information increase as well. Here, we review how these data types enrich one another and show potential for advancing our knowledge of the immune system and improving antibody engineering. PMID:29276518
Development of rapid and sensitive high throughput pharmacologic assays for marine phycotoxins.
Van Dolah, F M; Finley, E L; Haynes, B L; Doucette, G J; Moeller, P D; Ramsdell, J S
1994-01-01
The lack of rapid, high throughput assays is a major obstacle to many aspects of research on marine phycotoxins. Here we describe the application of microplate scintillation technology to develop high throughput assays for several classes of marine phycotoxin based on their differential pharmacologic actions. High throughput "drug discovery" format microplate receptor binding assays developed for brevetoxins/ciguatoxins and for domoic acid are described. Analysis for brevetoxins/ciguatoxins is carried out by binding competition with [3H] PbTx-3 for site 5 on the voltage dependent sodium channel in rat brain synaptosomes. Analysis of domoic acid is based on binding competition with [3H] kainic acid for the kainate/quisqualate glutamate receptor using frog brain synaptosomes. In addition, a high throughput microplate 45Ca flux assay for determination of maitotoxins is described. These microplate assays can be completed within 3 hours, have sensitivities of less than 1 ng, and can analyze dozens of samples simultaneously. The assays have been demonstrated to be useful for assessing algal toxicity and for assay-guided purification of toxins, and are applicable to the detection of biotoxins in seafood.
Global Analysis of Photosynthesis Transcriptional Regulatory Networks
Imam, Saheed; Noguera, Daniel R.; Donohue, Timothy J.
2014-01-01
Photosynthesis is a crucial biological process that depends on the interplay of many components. This work analyzed the gene targets for 4 transcription factors: FnrL, PrrA, CrpK and MppG (RSP_2888), which are known or predicted to control photosynthesis in Rhodobacter sphaeroides. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) identified 52 operons under direct control of FnrL, illustrating its regulatory role in photosynthesis, iron homeostasis, nitrogen metabolism and regulation of sRNA synthesis. Using global gene expression analysis combined with ChIP-seq, we mapped the regulons of PrrA, CrpK and MppG. PrrA regulates ∼34 operons encoding mainly photosynthesis and electron transport functions, while CrpK, a previously uncharacterized Crp-family protein, regulates genes involved in photosynthesis and maintenance of iron homeostasis. Furthermore, CrpK and FnrL share similar DNA binding determinants, possibly explaining our observation of the ability of CrpK to partially compensate for the growth defects of a ΔFnrL mutant. We show that the Rrf2 family protein, MppG, plays an important role in photopigment biosynthesis, as part of an incoherent feed-forward loop with PrrA. Our results reveal a previously unrealized, high degree of combinatorial regulation of photosynthetic genes and significant cross-talk between their transcriptional regulators, while illustrating previously unidentified links between photosynthesis and the maintenance of iron homeostasis. PMID:25503406
Global analysis of photosynthesis transcriptional regulatory networks.
Imam, Saheed; Noguera, Daniel R; Donohue, Timothy J
2014-12-01
Photosynthesis is a crucial biological process that depends on the interplay of many components. This work analyzed the gene targets for 4 transcription factors: FnrL, PrrA, CrpK and MppG (RSP_2888), which are known or predicted to control photosynthesis in Rhodobacter sphaeroides. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) identified 52 operons under direct control of FnrL, illustrating its regulatory role in photosynthesis, iron homeostasis, nitrogen metabolism and regulation of sRNA synthesis. Using global gene expression analysis combined with ChIP-seq, we mapped the regulons of PrrA, CrpK and MppG. PrrA regulates ∼34 operons encoding mainly photosynthesis and electron transport functions, while CrpK, a previously uncharacterized Crp-family protein, regulates genes involved in photosynthesis and maintenance of iron homeostasis. Furthermore, CrpK and FnrL share similar DNA binding determinants, possibly explaining our observation of the ability of CrpK to partially compensate for the growth defects of a ΔFnrL mutant. We show that the Rrf2 family protein, MppG, plays an important role in photopigment biosynthesis, as part of an incoherent feed-forward loop with PrrA. Our results reveal a previously unrealized, high degree of combinatorial regulation of photosynthetic genes and significant cross-talk between their transcriptional regulators, while illustrating previously unidentified links between photosynthesis and the maintenance of iron homeostasis.
Highly sensitive and unbiased approach for elucidating antibody repertoires
Lin, Sherry G.; Ba, Zhaoqing; Du, Zhou; Zhang, Yu; Hu, Jiazhi; Alt, Frederick W.
2016-01-01
Developing B lymphocytes undergo V(D)J recombination to assemble germ-line V, D, and J gene segments into exons that encode the antigen-binding variable region of Ig heavy (H) and light (L) chains. IgH and IgL chains associate to form the B-cell receptor (BCR), which, upon antigen binding, activates B cells to secrete BCR as an antibody. Each of the huge number of clonally independent B cells expresses a unique set of IgH and IgL variable regions. The ability of V(D)J recombination to generate vast primary B-cell repertoires results from a combinatorial assortment of large numbers of different V, D, and J segments, coupled with diversification of the junctions between them to generate the complementary determining region 3 (CDR3) for antigen contact. Approaches to evaluate in depth the content of primary antibody repertoires and, ultimately, to study how they are further molded by secondary mutation and affinity maturation processes are of great importance to the B-cell development, vaccine, and antibody fields. We now describe an unbiased, sensitive, and readily accessible assay, referred to as high-throughput genome-wide translocation sequencing-adapted repertoire sequencing (HTGTS-Rep-seq), to quantify antibody repertoires. HTGTS-Rep-seq quantitatively identifies the vast majority of IgH and IgL V(D)J exons, including their unique CDR3 sequences, from progenitor and mature mouse B lineage cells via the use of specific J primers. HTGTS-Rep-seq also accurately quantifies DJH intermediates and V(D)J exons in either productive or nonproductive configurations. HTGTS-Rep-seq should be useful for studies of human samples, including clonal B-cell expansions, and also for following antibody affinity maturation processes. PMID:27354528
RNA-Rocket: an RNA-Seq analysis resource for infectious disease research
Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno
2015-01-01
Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919
RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.
Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno
2015-05-01
RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu. Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.
Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew
2016-10-10
Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. © 2016 by John Wiley & Sons, Inc.
Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes.
Ackermann, Amanda M; Wang, Zhiping; Schug, Jonathan; Naji, Ali; Kaestner, Klaus H
2016-03-01
Although glucagon-secreting α-cells and insulin-secreting β-cells have opposing functions in regulating plasma glucose levels, the two cell types share a common developmental origin and exhibit overlapping transcriptomes and epigenomes. Notably, destruction of β-cells can stimulate repopulation via transdifferentiation of α-cells, at least in mice, suggesting plasticity between these cell fates. Furthermore, dysfunction of both α- and β-cells contributes to the pathophysiology of type 1 and type 2 diabetes, and β-cell de-differentiation has been proposed to contribute to type 2 diabetes. Our objective was to delineate the molecular properties that maintain islet cell type specification yet allow for cellular plasticity. We hypothesized that correlating cell type-specific transcriptomes with an atlas of open chromatin will identify novel genes and transcriptional regulatory elements such as enhancers involved in α- and β-cell specification and plasticity. We sorted human α- and β-cells and performed the "Assay for Transposase-Accessible Chromatin with high throughput sequencing" (ATAC-seq) and mRNA-seq, followed by integrative analysis to identify cell type-selective gene regulatory regions. We identified numerous transcripts with either α-cell- or β-cell-selective expression and discovered the cell type-selective open chromatin regions that correlate with these gene activation patterns. We confirmed cell type-selective expression on the protein level for two of the top hits from our screen. The "group specific protein" (GC; or vitamin D binding protein) was restricted to α-cells, while CHODL (chondrolectin) immunoreactivity was only present in β-cells. Furthermore, α-cell- and β-cell-selective ATAC-seq peaks were identified to overlap with known binding sites for islet transcription factors, as well as with single nucleotide polymorphisms (SNPs) previously identified as risk loci for type 2 diabetes. 
We have determined the genetic landscape of human α- and β-cells based on chromatin accessibility and transcript levels, which allowed for detection of novel α- and β-cell signature genes not previously known to be expressed in islets. Using fine-mapping of open chromatin, we have identified thousands of potential cis-regulatory elements that operate in an endocrine cell type-specific fashion.
Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.
Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K
2017-04-01
Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.
Analysis of ChIP-seq Data in R/Bioconductor.
de Santiago, Ines; Carroll, Thomas
2018-01-01
The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality-control and appropriate data analysis techniques are of critical importance to extract the most meaningful results from the data. Over the last years, an array of R/Bioconductor tools has been developed allowing researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process were demonstrated on publicly available data sets and will serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.
Faggionato, Davide; Serb, Jeanne M
2017-08-01
The rise of high-throughput RNA sequencing (RNA-seq) and de novo transcriptome assembly has had a transformative impact on how we identify and study genes in the phototransduction cascade of non-model organisms. But the advantage provided by the nearly automated annotation of RNA-seq transcriptomes may at the same time hinder the possibility for gene discovery and the discovery of new gene functions. For example, standard functional annotation based on domain homology to known protein families can only confirm group membership, not identify the emergence of new biochemical function. In this study, we show the importance of developing a strategy that circumvents the limitations of semiautomated annotation and apply this workflow to photosensitivity as a means to discover non-opsin photoreceptors. We hypothesize that non-opsin G-protein-coupled receptor (GPCR) proteins may have chromophore-binding lysines in locations that differ from opsin. Here, we provide the first case study describing non-opsin light-sensitive GPCRs based on tissue-specific RNA-seq data of the common bay scallop Argopecten irradians (Lamarck, 1819). Using a combination of sequence analysis and three-dimensional protein modeling, we identified two candidate proteins. We tested their photochemical properties and provide evidence showing that these two proteins incorporate 11-cis and/or all-trans retinal and react to light photochemically. Based on this case study, we demonstrate that there is potential for the discovery of new light-sensitive GPCRs, and we have developed a workflow that starts from RNA-seq assemblies to the discovery of new non-opsin, GPCR-based photopigments.
Wiles, Travis J.; Norton, J. Paul; Russell, Colin W.; Dalley, Brian K.; Fischer, Kael F.; Mulvey, Matthew A.
2013-01-01
Strains of Extraintestinal Pathogenic Escherichia coli (ExPEC) exhibit an array of virulence strategies and are a major cause of urinary tract infections, sepsis and meningitis. Efforts to understand ExPEC pathogenesis are challenged by the high degree of genetic and phenotypic variation that exists among isolates. Determining which virulence traits are widespread and which are strain-specific will greatly benefit the design of more effective therapies. Towards this goal, we utilized a quantitative genetic footprinting technique known as transposon insertion sequencing (Tn-seq) in conjunction with comparative pathogenomics to functionally dissect the genetic repertoire of a reference ExPEC isolate. Using Tn-seq and high-throughput zebrafish infection models, we tracked changes in the abundance of ExPEC variants within saturated transposon mutant libraries following selection within distinct host niches. Nine hundred and seventy bacterial genes (18% of the genome) were found to promote pathogen fitness in either a niche-dependent or independent manner. To identify genes with the highest therapeutic and diagnostic potential, a novel Trait Enrichment Analysis (TEA) algorithm was developed to ascertain the phylogenetic distribution of candidate genes. TEA revealed that a significant portion of the 970 genes identified by Tn-seq have homologues more often contained within the genomes of ExPEC and other known pathogens, which, as suggested by the first axiom of molecular Koch's postulates, is considered to be a key feature of true virulence determinants. Three of these Tn-seq-derived pathogen-associated genes (a transcriptional repressor, a putative metalloendopeptidase toxin and a hypothetical DNA binding protein) were deleted and shown to independently affect ExPEC fitness in zebrafish and mouse models of infection.
Together, the approaches and observations reported herein provide a resource for future pathogenomics-based research and highlight the diversity of factors required by a single ExPEC isolate to survive within varying host environments. PMID:23990803
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.
Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji
2017-02-28
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
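The abstract does not spell out how spike-in reads are converted into absolute abundances. A minimal sketch of one common approach follows, assuming a single calibration factor (copies per read) derived from the pooled spike-in reads. The function name, taxon labels, and all numbers are hypothetical and are not taken from the study, which used staggered spike-in mixtures and may calibrate differently.

```python
def absolute_abundances(taxon_reads, spike_reads, spike_copies_added):
    """Scale per-taxon read counts to estimated 16S rRNA gene copies.

    Assumption (not from the abstract): reads are proportional to template
    copies, so spike_copies_added / spike_reads is the number of copies
    represented by one read in this sample.
    """
    if spike_reads <= 0:
        raise ValueError("no spike-in reads recovered; cannot calibrate")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


# Hypothetical sample: 1,000,000 spike-in copies added at DNA extraction,
# 2,000 spike-in reads recovered after sequencing.
estimates = absolute_abundances(
    {"taxon_A": 5000, "taxon_B": 1000},
    spike_reads=2000,
    spike_copies_added=1_000_000,
)
```

Because the calibration factor is computed per sample, estimates from samples with different sequencing depths land on the same absolute scale, which is what makes them comparable across samples.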
Choi, Sun Young; Park, Byeonghyeok; Choi, In-Geol; Sim, Sang Jun; Lee, Sun-Mi; Um, Youngsoon; Woo, Han Min
2016-01-01
The development of high-throughput technology using RNA-seq has allowed understanding of cellular mechanisms and regulations of bacterial transcription. In addition, transcriptome analysis with RNA-seq has been used to accelerate strain improvement through systems metabolic engineering. Synechococcus elongatus PCC 7942, a photosynthetic bacterium, has remarkable potential for biochemical and biofuel production due to photoautotrophic cell growth and direct CO2 conversion. Here, we performed a transcriptome analysis of S. elongatus PCC 7942 using RNA-seq to understand the changes of cellular metabolism and regulation for nitrogen starvation responses. As a result, differentially expressed genes (DEGs) were identified and functionally categorized. With mapping onto metabolic pathways, we probed transcriptional perturbation and regulation of carbon and nitrogen metabolisms relating to nitrogen starvation responses. Experimental evidence such as chlorophyll a and phycobilisome content and the measurement of CO2 uptake rate validated the transcriptome analysis. The analysis suggests that S. elongatus PCC 7942 reacts to nitrogen starvation by not only rearranging the cellular transport capacity involved in carbon and nitrogen assimilation pathways but also by reducing protein synthesis and photosynthesis activities. PMID:27488818
Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel
2018-01-16
The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plant cells). Here we describe a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping of up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc.
Boyacı, Ezel; Bojko, Barbara; Reyes-Garcés, Nathaly; Poole, Justen J; Gómez-Ríos, Germán Augusto; Teixeira, Alexandre; Nicol, Beate; Pawliszyn, Janusz
2018-01-18
In vitro high-throughput non-depletive quantitation of chemicals in biofluids is of growing interest in many areas. Some of the challenges facing researchers include the limited volume of biofluids, rapid and high-throughput sampling requirements, and the lack of reliable methods. Coupled to the above, growing interest in the monitoring of kinetics and dynamics of miniaturized biosystems has spurred the demand for development of novel and revolutionary methodologies for analysis of biofluids. The applicability of solid-phase microextraction (SPME) is investigated as a potential technology to fulfill the aforementioned requirements. As analytes with sufficient diversity in their physicochemical features, nicotine, N,N-Diethyl-meta-toluamide, and diclofenac were selected as test compounds for the study. The objective was to develop methodologies that would allow repeated non-depletive sampling from 96-well plates, using 100 µL of sample. Initially, thin film-SPME was investigated. Results revealed substantial depletion and consequent disruption in the system. Therefore, new ultra-thin coated fibers were developed. The applicability of this device to the described sampling scenario was tested by determining the protein binding of the analytes. Results showed good agreement with rapid equilibrium dialysis. The presented method allows high-throughput analysis using small volumes, enabling fast reliable free and total concentration determinations without disruption of system equilibrium.
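The protein-binding determination mentioned above rests on comparing free and total analyte concentrations. As a hedged illustration of the standard relationship (percent bound = 100 × (total − free) / total), the sketch below uses a hypothetical function name and made-up concentrations; it is not the authors' data-processing procedure.

```python
def percent_protein_bound(free_conc, total_conc):
    """Return the percentage of analyte bound to protein.

    Both concentrations must be in the same units. The free concentration
    is what a non-depletive extraction would report; the total is the
    nominal concentration in the sample.
    """
    if total_conc <= 0:
        raise ValueError("total concentration must be positive")
    if not 0 <= free_conc <= total_conc:
        raise ValueError("free concentration must lie between 0 and total")
    return 100.0 * (total_conc - free_conc) / total_conc


# Hypothetical well: 10 ng/mL total, 2 ng/mL measured as free.
bound = percent_protein_bound(free_conc=2.0, total_conc=10.0)
```

The calculation is only meaningful when sampling is non-depletive, since removing a substantial share of the free analyte would shift the binding equilibrium being measured.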
Repliscan: a tool for classifying replication timing regions.
Zynda, Gregory J; Song, Jawon; Concia, Lorenzo; Wear, Emily E; Hanley-Bowdoin, Linda; Thompson, William F; Vaughn, Matthew W
2017-08-07
Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replication timing. To accurately detect and classify regions of replication across the genome, we present Repliscan. Repliscan robustly normalizes, automatically removes outlying and uninformative data points, and classifies Repli-seq signals into discrete combinations of replication signatures. The quality control steps and self-fitting methods make Repliscan generally applicable and more robust than previous methods that classify regions based on thresholds. Repliscan is simple and effective to use on organisms with different genome sizes. Even with analysis window sizes as small as 1 kilobase, reliable profiles can be generated with as little as 2.4x coverage.
SeqAPASS to evaluate conservation of high-throughput screening targets across non-mammalian species
Cell-based high-throughput screening (HTS) and computational technologies are being applied as tools for toxicity testing in the 21st century. The U.S. Environmental Protection Agency (EPA) embraced these technologies and created the ToxCast Program in 2007, which has served as a...
IAOseq: inferring abundance of overlapping genes using RNA-seq data.
Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue
2015-01-01
Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of overlapping transcription assays is the lack of high-throughput expression data. We developed a new tool (IAOseq) that uses read distributions along the transcribed regions to estimate the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq estimated overlapping transcription levels more accurately. For same-strand overlapping transcription, existing high-throughput methods can rarely distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods overestimated the expression levels of same-strand overlapping genes by an average of 1.6-fold. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could help us understand the complex regulatory mechanisms mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Multiplex single-molecule interaction profiling of DNA-barcoded proteins.
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M
2014-11-27
In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
Multiplex single-molecule interaction profiling of DNA barcoded proteins
Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.
2014-01-01
In contrast with advances in massively parallel DNA sequencing [1], high-throughput protein analyses [2-4] are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods [5] is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display [6] or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) [7] and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978
Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T
2014-06-01
RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daum, Christopher; Zane, Matthew; Han, James
2011-01-31
The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.
Schulz, Sebastian; Eckweiler, Denitsa; Bielecka, Agata; Nicolai, Tanja; Franke, Raimo; Dötsch, Andreas; Hornischer, Klaus; Bruchmann, Sebastian; Düvel, Juliane; Häussler, Susanne
2015-01-01
Sigma factors are essential global regulators of transcription initiation in bacteria which confer promoter recognition specificity to the RNA polymerase core enzyme. They provide effective mechanisms for simultaneously regulating expression of large numbers of genes in response to challenging conditions, and their presence has been linked to bacterial virulence and pathogenicity. In this study, we constructed nine his-tagged sigma factor expressing and/or deletion mutant strains in the opportunistic pathogen Pseudomonas aeruginosa. To uncover the direct and indirect sigma factor regulons, we performed mRNA profiling, as well as chromatin immunoprecipitation coupled to high-throughput sequencing. We furthermore elucidated the de novo binding motif of each sigma factor, and validated the RNA- and ChIP-seq results by global motif searches in the proximity of transcriptional start sites (TSS). Our integrated approach revealed a highly modular network architecture which is composed of insulated functional sigma factor modules. Analysis of the interconnectivity of the various sigma factor networks uncovered a limited, but highly function-specific, crosstalk which orchestrates complex cellular processes. Our data indicate that the modular structure of sigma factor networks enables P. aeruginosa to function adequately in its environment and at the same time is exploited to build up higher-level functions by specific interconnections that are dominated by a participation of RpoN. PMID:25780925
Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji
2018-01-01
Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carps were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carps. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. 
qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database under Registration No. SRP075165. PMID:29692738
High-throughput transcriptome analysis of barley (Hordeum vulgare) exposed to excessive boron.
Tombuloglu, Guzin; Tombuloglu, Huseyin; Sakcali, M Serdal; Unver, Turgay
2015-02-15
Boron (B) is an essential micronutrient for optimum plant growth. However, above a certain threshold, B is toxic and causes yield loss in agricultural lands. While a number of studies have been conducted to understand B tolerance mechanisms, a transcriptome-wide approach for B-tolerant barley is performed here for the first time. A high-throughput RNA-Seq (cDNA) sequencing technology (Illumina) was used with barley (Hordeum vulgare), yielding 208 million clean reads. In total, 256,874 unigenes were generated and assigned to known peptide databases: Gene Ontology (GO) (99,043), Swiss-Prot (38,266), Clusters of Orthologous Groups (COG) (26,250), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (36,860), as determined by BLASTx search. According to the digital gene expression (DGE) analyses, 16% and 17% of the transcripts were found to be differentially regulated in root and leaf tissues, respectively. Most of them were involved in cell wall, stress response, membrane, protein kinase and transporter mechanisms. Some of the genes detected as highly expressed in root tissue are phospholipases, predicted divalent heavy-metal cation transporters, formin-like proteins and calmodulin/Ca(2+)-binding proteins. In addition, chitin-binding lectin precursor, ubiquitin carboxyl-terminal hydrolase, and serine/threonine-protein kinase AFC2 genes were indicated to be highly regulated in leaf tissue upon excess B treatment. Some pathways, such as the Ca(2+)-calmodulin system, are activated in response to B toxicity. The differential regulation of 10 transcripts was confirmed by qRT-PCR, revealing the tissue-specific responses against B toxicity and their putative function in B-tolerance mechanisms. Copyright © 2014. Published by Elsevier B.V.
Li, Yang Eric; Xiao, Mu; Shi, Binbin; Yang, Yu-Cheng T; Wang, Dong; Wang, Fei; Marcia, Marco; Lu, Zhi John
2017-09-08
Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize transcriptome-wide binding sites of RNA-binding proteins (RBPs) with high resolution. We apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, we validate two RBP-RBP interactions in cell lines. Our approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.
Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H
2015-06-01
The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.
High-throughput illumina strand-specific RNA sequencing library preparation
USDA-ARS's Scientific Manuscript database
Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...
Recent advances in targeted RNA-Seq technology allow researchers to efficiently and cost-effectively obtain whole transcriptome profiles using picograms of mRNA from human cell lysates. Low mRNA input requirements and sample multiplexing capabilities have made time- and concentrat...
SPAR: small RNA-seq portal for analysis of sequencing experiments.
Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee
2018-05-04
The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.
USDA-ARS's Scientific Manuscript database
The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves exist...
Lai, Ghee Chuan; Cho, Hongbaek; Bernhardt, Thomas G
2017-07-01
Bacterial cells are typically surrounded by a net-like macromolecule called the cell wall constructed from the heteropolymer peptidoglycan (PG). Biogenesis of this matrix is the target of penicillin and related beta-lactams. These drugs inhibit the transpeptidase activity of PG synthases called penicillin-binding proteins (PBPs), preventing the crosslinking of nascent wall material into the existing network. The beta-lactam mecillinam specifically targets the PBP2 enzyme in the cell elongation machinery of Escherichia coli. Low-throughput selections for mecillinam resistance have historically been useful in defining mechanisms involved in cell wall biogenesis and the killing activity of beta-lactam antibiotics. Here, we used transposon-sequencing (Tn-Seq) as a high-throughput method to identify nearly all mecillinam resistance loci in the E. coli genome, providing a comprehensive resource for uncovering new mechanisms underlying PG assembly and drug resistance. Induction of the stringent response or the Rcs envelope stress response has been previously implicated in mecillinam resistance. We therefore also performed the Tn-Seq analysis in mutants defective for these responses in addition to wild-type cells. Thus, the utility of the dataset was greatly enhanced by determining the stress response dependence of each resistance locus in the resistome. Reasoning that stress response-independent resistance loci are those most likely to identify direct modulators of cell wall biogenesis, we focused our downstream analysis on this subset of the resistome. Characterization of one of these alleles led to the surprising discovery that the overproduction of endopeptidase enzymes that cleave crosslinks in the cell wall promotes mecillinam resistance by stimulating PG synthesis by a subset of PBPs.
Our analysis of this activation mechanism suggests that, contrary to the prevailing view in the field, PG synthases and PG cleaving enzymes need not function in multi-enzyme complexes to expand the cell wall matrix.
Mapping RNA-seq Reads with STAR
Dobin, Alexander; Gingeras, Thomas R.
2015-01-01
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920
Mapping RNA-seq Reads with STAR.
Dobin, Alexander; Gingeras, Thomas R
2015-09-03
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.
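As a rough illustration of the kind of mapping run the unit describes, the sketch below assembles a basic STAR alignment command in Python without executing it. The flags are standard STAR options; the index directory, FASTQ file names, output prefix, and the helper name `star_align_command` are hypothetical placeholders, and a real run would first need a genome index built with `--runMode genomeGenerate`.

```python
import shlex


def star_align_command(genome_dir, fastq_files, out_prefix, threads=4, gzipped=True):
    """Build (but do not run) a STAR alignment command as an argument list.

    All paths are placeholders supplied by the caller; nothing is executed here.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),                 # number of worker threads
        "--genomeDir", genome_dir,                    # directory holding the STAR index
        "--readFilesIn", *fastq_files,                # one file (single-end) or two (paired-end)
        "--outSAMtype", "BAM", "SortedByCoordinate",  # coordinate-sorted BAM output
        "--outFileNamePrefix", out_prefix,            # prefix for all output files
    ]
    if gzipped:
        # Tell STAR how to decompress gzipped FASTQ input on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical paired-end sample.
cmd = star_align_command(
    "star_index", ["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "sample_"
)
print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` if one later decides to execute it.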
MeRIP-PF: An Easy-to-use Pipeline for High-resolution Peak-finding in MeRIP-Seq Data
Li, Yuli; Song, Shuhui; Li, Cuiping; Yu, Jun
2013-01-01
RNA modifications, especially methylation of the N6 position of adenosine (m6A), represent an emerging research frontier in RNA biology. With the rapid development of high-throughput sequencing technology, in-depth study of m6A distribution and functional relevance becomes feasible. However, a robust method to effectively identify m6A-modified regions has not been available yet. Here, we present a novel high-efficiency and user-friendly analysis pipeline called MeRIP-PF for the signal identification of MeRIP-Seq data in reference to controls. MeRIP-PF provides a statistical P-value for each identified m6A region based on the difference of read distribution when compared to the controls and also calculates false discovery rate (FDR) as a cut off to differentiate reliable m6A regions from the background. Furthermore, MeRIP-PF also achieves gene annotation of m6A signals or peaks and produces outputs in both XLS and graphical format, which are useful for further study. MeRIP-PF is implemented in Perl and is freely available at http://software.big.ac.cn/MeRIP-PF.html. PMID:23434047
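The abstract does not say which FDR procedure MeRIP-PF applies, so the following is only a generic sketch of how per-region P-values can be turned into FDR-adjusted values, assuming the widely used Benjamini-Hochberg step-up procedure. The function name and the example P-values are illustrative and are not taken from the pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four candidate regions.
region_p = [0.01, 0.04, 0.03, 0.005]
region_fdr = benjamini_hochberg(region_p)
# Keep regions whose adjusted value passes an illustrative 5% cut-off.
reliable = [i for i, q in enumerate(region_fdr) if q <= 0.05]
```

Regions are then retained or discarded by comparing the adjusted value with the chosen cut-off, which is the role the abstract assigns to the FDR.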
USDA-ARS's Scientific Manuscript database
High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...
USDA-ARS's Scientific Manuscript database
We conducted genomic sequencing to identify viruses associated with mosaic disease of an apple tree using the high-throughput sequencing (HTS) Illumina RNA-seq platform. The objective was to examine if rapid identification and characterization of viruses could be effectively achieved by RNA-seq anal...
Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation.
Nakato, Ryuichiro; Shirahige, Katsuhiko
2017-03-01
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis can detect protein/DNA-binding and histone-modification sites across an entire genome. Recent advances in sequencing technologies and analyses enable us to compare hundreds of samples simultaneously; such large-scale analysis has potential to reveal the high-dimensional interrelationship level for regulatory elements and annotate novel functional genomic regions de novo. Because many experimental considerations are relevant to the choice of a method in a ChIP-seq analysis, the overall design and quality management of the experiment are of critical importance. This review offers guiding principles of computation and sample preparation for ChIP-seq analyses, highlighting the validity and limitations of the state-of-the-art procedures at each step. We also discuss the latest challenges of single-cell analysis that will encourage a new era in this field. © The Author 2016. Published by Oxford University Press.
Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio
2014-01-01
Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes. PMID:24781859
Analytical workflow profiling gene expression in murine macrophages
Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.
2015-01-01
Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
Advances in single-cell RNA sequencing and its applications in cancer research.
Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming
2017-08-08
Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise targeted therapy by identifying druggable subclones. Indeed, progress has been made in solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances in single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5.
Advances in single-cell RNA sequencing and its applications in cancer research
Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming
2017-01-01
Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years’ development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise targeted therapy by identifying druggable subclones. Indeed, progress has been made in solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances in single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
The Role of Genome Accessibility in Transcription Factor Binding in Bacteria.
Gomes, Antonio L C; Wang, Harris H
2016-04-01
ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often poorly correlate with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF)-binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF-binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF-binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility led to a model that explains 63% of the ChIP-seq profile variance, while a model based on motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions by reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive description of TF-binding in vivo from first principles towards a better representation of gene regulation in silico, with promising applications in systems biology.
RNA-Seq Technology and Its Application in Fish Transcriptomics
Ba, Yi; Zhuang, Qianfeng
2014-01-01
High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-the-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing methods. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptomes but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445
High-Throughput Lectin Microarray-Based Analysis of Live Cell Surface Glycosylation
Li, Yu; Tao, Sheng-ce; Zhu, Heng; Schneck, Jonathan P.
2011-01-01
Lectins, plant-derived glycan-binding proteins, have long been used to detect glycans on cell surfaces. However, the techniques used to characterize serum or cells have largely been limited to mass spectrometry, blots, flow cytometry, and immunohistochemistry. While these lectin-based approaches are well established and they can discriminate a limited number of sugar isomers by concurrently using a limited number of lectins, they are not amenable for adaptation to a high-throughput platform. Fortunately, given the commercial availability of lectins with a variety of glycan specificities, lectins can be printed on a glass substrate in a microarray format to profile accessible cell-surface glycans. This method is an inviting alternative for analysis of a broad range of glycans in a high-throughput fashion and has been demonstrated to be a feasible method of identifying binding-accessible cell surface glycosylation on living cells. The current unit presents a lectin-based microarray approach for analyzing cell surface glycosylation in a high-throughput fashion. PMID:21400689
A Comparison Study for DNA Motif Modeling on Protein Binding Microarray.
Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San
2016-01-01
Transcription factor binding sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8∼10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build TFBS (also known as DNA motif) models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement if choosing di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
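To give a sense of the search space behind "all possible DNA k-mers", the sketch below enumerates k-mers and collapses each with its reverse complement. Treating a k-mer and its reverse complement as the same binding site is an assumption made here for double-stranded probes, not something stated in the abstract, and the function names are illustrative.

```python
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Enumerate all 4**k k-mers, keeping one representative per reverse-complement pair."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        # The lexicographically smaller strand represents the pair.
        seen.add(min(kmer, reverse_complement(kmer)))
    return seen


total_8mers = 4 ** 8                      # 65536 ordered 8-mers
distinct_8mers = len(canonical_kmers(8))  # 32896 after collapsing strands
```

For even k the collapsed count is (4^k + 4^(k/2)) / 2, because palindromic k-mers are their own reverse complement; for k = 8 this gives 32,896 distinct sites out of 65,536 ordered sequences.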
Quantification of differential gene expression by multiplexed targeted resequencing of cDNA
Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.
2017-01-01
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
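The idea of counting individual molecules rather than reads can be sketched as follows, assuming each read carries a target identifier and a random molecular tag so that PCR duplicates share the same pair. The tuple layout, the target names, and the choice to ignore sequencing errors in the tags are simplifying assumptions for illustration and do not describe the authors' pipeline.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecular tags per target.

    reads: iterable of (target_id, molecular_tag) pairs, one pair per sequenced read.
    Reads sharing both target and tag are treated as copies of one original molecule.
    """
    tags_by_target = defaultdict(set)
    for target_id, tag in reads:
        tags_by_target[target_id].add(tag)
    return {target: len(tags) for target, tags in tags_by_target.items()}


# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("GAPDH_ex3", "ACGTAC"),
    ("GAPDH_ex3", "ACGTAC"),
    ("GAPDH_ex3", "TTGCAA"),
    ("TP53_ex5", "GGATCC"),
]
counts = count_molecules(reads)
```

Here four reads collapse to three molecules, which is why molecule counts are less sensitive to amplification bias than raw read counts.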
The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.
Perreault, Andrea A; Venters, Bryan J
2016-12-23
Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platform.
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...
2015-10-28
We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
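The RPKM values mentioned in the abstract above follow the standard definition: reads per kilobase of transcript per million mapped reads. The sketch below only illustrates that definition; the function name `rpkm` and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (length in kb) / (total mapped reads in millions)
         = read_count * 1e9 / (length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```

Because RPKM divides by both transcript length and library size, values can be compared across transcripts of different lengths and, with the usual caveats, across libraries of different depth.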
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai
We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
Barteneva, Natasha S; Vorobjev, Ivan A
2018-01-01
In this paper, we review some of the recent advances in cellular heterogeneity and single-cell analysis methods. In modern research of cellular heterogeneity, there are four major approaches: analysis of pooled samples, single-cell analysis, high-throughput single-cell analysis, and, lately, integrated analysis of a cellular population at the single-cell level. Recently developed high-throughput single-cell genetic analysis methods such as RNA-Seq require a purification step and destruction of the analyzed cell, and often provide only a snapshot of the investigated cell without spatiotemporal context. Correlative analysis of multiparameter morphological, functional, and molecular information is important for differentiation of more uniform groups in the spectrum of different cell types. Simplified distributions (histograms and 2D plots) can underrepresent biologically significant subpopulations. Future directions may include the development of nondestructive methods for dissecting molecular events in intact cells, simultaneous correlative cellular analysis of phenotypic and molecular features by hybrid technologies such as imaging flow cytometry, and further progress in supervised and non-supervised statistical analysis algorithms.
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts
Paraskevopoulou, Maria D.; Vlachos, Ioannis S.; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G.
2016-01-01
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied by detailed metadata and MRE conservation metrics. LncBase v2 provides information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. PMID:26612864
Issues with RNA-seq analysis in non-model organisms: A salmonid example.
Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni
2017-10-01
High throughput sequencing (HTS) is useful for many purposes, as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms, where missing or limited resources such as genomes and functional genome annotations, as well as genome complexity, pose obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline distinguishes between duplicates despite high sequence identity. However, annotation issues make it difficult to link differentially expressed genes to pathways. Also, comparing results between approaches and species is complicated by the lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz
2016-02-24
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.
2016-01-01
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030
Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth
2015-01-01
Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
MeRIP-PF: an easy-to-use pipeline for high-resolution peak-finding in MeRIP-Seq data.
Li, Yuli; Song, Shuhui; Li, Cuiping; Yu, Jun
2013-02-01
RNA modifications, especially methylation of the N(6) position of adenosine (A)-m(6)A, represent an emerging research frontier in RNA biology. With the rapid development of high-throughput sequencing technology, in-depth study of m(6)A distribution and functional relevance has become feasible. However, a robust method to effectively identify m(6)A-modified regions is not yet available. Here, we present a novel high-efficiency and user-friendly analysis pipeline called MeRIP-PF for the signal identification of MeRIP-Seq data in reference to controls. MeRIP-PF provides a statistical P-value for each identified m(6)A region based on the difference of read distribution when compared to the controls, and also calculates the false discovery rate (FDR) as a cutoff to differentiate reliable m(6)A regions from the background. Furthermore, MeRIP-PF also annotates m(6)A signals or peaks with gene information and produces outputs in both XLS and graphical format, which are useful for further study. MeRIP-PF is implemented in Perl and is freely available at http://software.big.ac.cn/MeRIP-PF.html. Copyright © 2013. Production and hosting by Elsevier Ltd.
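The abstract above describes assigning a P-value to each candidate region and then applying an FDR cutoff, but it does not name the FDR procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg adjustment (an assumption, not a statement about MeRIP-PF's Perl implementation), the step could look like this in Python. The function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used purely for illustration; the abstract does not
    say which FDR procedure the pipeline applies.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-region p-values; keep regions with adjusted p <= 0.05.
region_p = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(region_p)
kept = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, kept)
```

Filtering on the adjusted value rather than the raw P-value controls the expected proportion of false positives among the reported regions.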
Systems biology of cancer biomarker detection.
Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas
2013-01-01
Cancer systems biology is an ever-growing area of research due to the explosion of data; the problem is how to mine these data and extract useful information. To gain insight into carcinogenesis, one needs to systematically mine several resources, such as databases, microarray data and next-generation sequencing data. This review encompasses management and analysis of cancer data, database construction and data deposition, whole transcriptome and genome comparison, analysis of results from high throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq have transformed the acquisition of epigenetic information into a high-throughput endeavour, into which systems biology and bioinformatics are making significant inroads. The data from the ENCODE and GENCODE projects, available through the UCSC genome browser, can be considered a benchmark for comparison and meta-analysis. A pipeline for integrating next generation sequencing data and microarray data with existing databases is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To illustrate how the available resources can be used, we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomics primer that provides a bird's eye view of the computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.
Tn5Prime, a Tn5 based 5' capture method for single cell RNA-seq.
Cole, Charles; Byrne, Ashley; Beaudin, Anna E; Forsberg, E Camilla; Vollmers, Christopher
2018-06-01
RNA-sequencing (RNA-seq) is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase-based Smart-seq2 protocol to create RNA-seq libraries that capture the 5' end of transcripts. The Tn5Prime method dramatically streamlines the 5' capture process and is both cost effective and reliable. By applying Tn5Prime to bulk RNA and single cell samples, we were able to define transcription start sites as well as quantify transcriptomes at high accuracy and reproducibility. Additionally, similar to 3' end-based high-throughput methods like Drop-seq and 10× Genomics Chromium, the 5' capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3' end-based methods, Tn5Prime also enables the assembly of the variable 5' ends of the antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.
Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui
2015-07-01
High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package, with its source code and documentation, is freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. Contact: jianghui@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
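To illustrate the general idea of assessing significance by permutation, here is a minimal Python sketch of an exact two-group permutation test on the difference in mean expression. It is not rSeqNP's implementation (rSeqNP is an R package, and the abstract does not specify its test statistic). The function name, the choice of statistic and the example values are all assumptions.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of relabelling the pooled values into groups of the
    original sizes is enumerated, so this is only practical for small
    sample sizes.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance so ties with the observed value are counted.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Hypothetical expression values for one gene in two conditions.
control = [1, 2, 3, 4]
treated = [10, 11, 12, 13]
print(exact_permutation_pvalue(control, treated))
```

Because the null distribution is built from the data by relabelling, no distributional assumption about the counts is needed, which is the appeal of non-parametric testing for RNA-Seq.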
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease
2016-10-01
deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval
2012-01-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval
2012-05-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph
2016-01-01
Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.
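The duplicate-removal step described above can be sketched as follows. This is a simplified illustration and not the authors' software tool: it assumes that reads sharing the same sample, mapping position and degenerate barcode (mDBR) sequence derive from one template molecule, and it keeps the first such read. The tuple layout and all example values are hypothetical.

```python
def remove_pcr_duplicates(reads):
    """Keep one read per (sample, position, barcode) combination.

    `reads` is an iterable of (sample, position, barcode, sequence)
    tuples. Reads that agree on the first three fields are treated as
    PCR duplicates of a single template molecule (an assumption made
    for this sketch).
    """
    seen = set()
    unique = []
    for sample, position, barcode, sequence in reads:
        key = (sample, position, barcode)
        if key not in seen:
            seen.add(key)
            unique.append((sample, position, barcode, sequence))
    return unique


# Hypothetical reads: the second is a PCR duplicate of the first;
# the third maps to the same position but carries a different barcode.
reads = [
    ("S1", 1042, "ACGT", "TTGACC"),
    ("S1", 1042, "ACGT", "TTGACC"),
    ("S1", 1042, "GGCA", "TTGACC"),
    ("S2", 1042, "ACGT", "TTGACC"),
]
print(len(remove_pcr_duplicates(reads)))
```

Keying on the barcode as well as the position is what lets independent molecules from the same transcript survive deduplication, so highly expressed genes are not flattened.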
Microfluidic single-cell whole-transcriptome sequencing.
Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi
2014-05-13
Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.
Teplitsky, Ella; Joshi, Karan; Ericson, Daniel L.; ...
2015-07-01
We describe a high throughput method for screening up to 1728 distinct chemicals with protein crystals on a single microplate. Acoustic droplet ejection (ADE) was used to co-position 2.5 nL of protein, precipitant, and chemicals on a MiTeGen in situ-1 crystallization plate™ for screening by co-crystallization or soaking. ADE-transferred droplets follow a precise trajectory which allows all components to be transferred through small apertures in the microplate lid. The apertures were large enough for 2.5 nL droplets to pass through them, but small enough so that they did not disrupt the internal environment created by the mother liquor. Using this system, thermolysin and trypsin crystals were efficiently screened for binding to a heavy-metal mini-library. Fluorescence and X-ray diffraction were used to confirm that each chemical in the heavy-metal library was correctly paired with the intended protein crystal. A fragment mini-library was screened to observe two known lysozyme ligands using both co-crystallization and soaking. A similar approach was used to identify multiple, novel thaumatin binding sites for ascorbic acid. This technology pushes towards a faster, automated, and more flexible strategy for high throughput screening of chemical libraries (such as fragment libraries) using as little as 2.5 nL of each component.
Classifying next-generation sequencing data using a zero-inflated Poisson model.
Zhou, Yan; Wan, Xiang; Zhang, Baoxue; Tong, Tiejun
2018-04-15
With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which type of disease a new patient has from RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (e.g. when the sequencing depth is insufficient, or for small RNAs with a length of 18-30 nucleotides). Therefore, it is desirable to develop a new model to analyze RNA-seq data with an excess of zeros. In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets, a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations in showing that the proposed method outperforms the existing competitors. The software is available at http://www.math.hkbu.edu.hk/∼tongt. Contact: xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary data are available at Bioinformatics online.
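The mixture described above, a point mass at zero combined with a Poisson distribution, can be written down directly. The sketch below shows only the zero-inflated Poisson probability mass function. It does not implement the ZIPLDA classifier, and it treats the zero-inflation probability as a fixed number rather than modelling it through the logistic relation mentioned in the abstract. The names `zip_pmf`, `pi_zero` and `lam` are hypothetical.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson probability of observing count k.

    With probability pi_zero the count is a structural zero; otherwise
    it is drawn from a Poisson distribution with mean lam:
        P(0) = pi_zero + (1 - pi_zero) * exp(-lam)
        P(k) = (1 - pi_zero) * exp(-lam) * lam**k / k!   for k >= 1
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


# Hypothetical gene: 30% structural zeros, Poisson mean of 2 reads.
print(zip_pmf(0, 0.3, 2.0))  # probability of a zero count
print(zip_pmf(3, 0.3, 2.0))  # probability of exactly three reads
```

Setting `pi_zero` to 0 recovers the ordinary Poisson model, which makes the extra zero mass easy to see by comparison.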
Bayesian Correlation Analysis for Sequence Count Data
Lau, Nelson; Perkins, Theodore J.
2016-01-01
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.
Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong
2009-03-01
Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar
2018-01-01
The use of high-throughput next-generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections in Egyptian patients using the Illumina MiSeq sequencing platform. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile #15 K-files and paper points. DNA was extracted using the Mo Bio PowerSoil DNA isolation kit, followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable regions of the 16S rRNA gene by paired-end sequencing on an Illumina MiSeq device. MOTHUR software was used for sequence filtering and analysis of the sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla, Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes, were predominant in all samples. At the genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate the oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646
Chang, Yi-Wen; Su, Ying-Jhen; Hsiao, Michael; Wei, Kuo-Chen; Lin, Wei-Hsin; Liang, Chi-Lung; Chen, Shin-Cheh; Lee, Jia-Lin
2015-08-15
Wnt signaling contributes to the reprogramming and maintenance of cancer stem cell (CSC) states that are activated by epithelial-mesenchymal transition (EMT). However, the mechanistic relationship between EMT and the Wnt pathway in CSC is not entirely clear. Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) indicated that EMT induces a switch from the β-catenin/E-cadherin/Sox15 complex to the β-catenin/Twist1/TCF4 complex, the latter of which then binds to CSC-related gene promoters. Tandem coimmunoprecipitation and re-ChIP experiments with epithelial-type cells further revealed that Sox15 associates with the β-catenin/E-cadherin complex, which then binds to the proximal promoter region of CASP3. Through this mechanism, Twist1 cleavage is triggered to regulate a β-catenin-elicited promotion of the CSC phenotype. During EMT, we documented that Twist1 binding to β-catenin enhanced the transcriptional activity of the β-catenin/TCF4 complex, including by binding to the proximal promoter region of ABCG2, a CSC marker. In terms of clinical application, our definition of a five-gene CSC signature (nuclear β-catenin(High)/nuclear Twist1(High)/E-cadherin(Low)/Sox15(Low)/CD133(High)) may provide a useful prognostic marker for human lung cancer. ©2015 American Association for Cancer Research.
Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John
2010-07-14
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services. PMID:20630057
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Ching, Travers; Zhu, Xun
2018-01-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
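For orientation, Cox proportional hazards models such as those named above are usually fit by minimizing a negative log partial likelihood. The sketch below computes that quantity in plain Python. It is not taken from the Cox-nnet source code: the function name, the toy inputs and the Breslow-style treatment of tied event times are assumptions made for illustration only.

```python
import math

def neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox log partial likelihood (Breslow-style risk sets).

    risk_scores: one linear predictor (log hazard ratio) per patient
    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if the patient was censored
    (All three names are hypothetical, chosen for this sketch.)
    """
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i
        log_denominator = math.log(
            sum(math.exp(s) for s, t in zip(risk_scores, times) if t >= t_i)
        )
        total += risk_scores[i] - log_denominator
    return -total
```

A model whose risk scores rank patients in the order in which events occur yields a smaller value than one that ranks them in reverse, which is what makes the quantity usable as a training loss.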
Huang, Shao-shan Carol; Clarke, David C.; Gosline, Sara J. C.; Labadorf, Adam; Chouinard, Candace R.; Gordon, William; Lauffenburger, Douglas A.; Fraenkel, Ernest
2013-01-01
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets. PMID:23408876
MOCCS: Clarifying DNA-binding motif ambiguity using ChIP-Seq data.
Ozaki, Haruka; Iwasaki, Wataru
2016-08-01
As a key mechanism of gene regulation, transcription factors (TFs) bind to DNA by recognizing specific short sequence patterns that are called DNA-binding motifs. A single TF can accept ambiguity within its DNA-binding motifs, which comprise both canonical (typical) and non-canonical motifs. Clarification of such DNA-binding motif ambiguity is crucial for revealing gene regulatory networks and evaluating mutations in cis-regulatory elements. Although chromatin immunoprecipitation sequencing (ChIP-seq) now provides abundant data on the genomic sequences to which a given TF binds, existing motif discovery methods are unable to directly answer whether a given TF can bind to a specific DNA-binding motif. Here, we report a method for clarifying the DNA-binding motif ambiguity, MOCCS. Given ChIP-Seq data of any TF, MOCCS comprehensively analyzes and describes every k-mer to which that TF binds. Analysis of simulated datasets revealed that MOCCS is applicable to various ChIP-Seq datasets, requiring only a few minutes per dataset. Application to the ENCODE ChIP-Seq datasets proved that MOCCS directly evaluates whether a given TF binds to each DNA-binding motif, even if known position weight matrix models do not provide sufficient information on DNA-binding motif ambiguity. Furthermore, users are not required to provide numerous parameters or background genomic sequence models that are typically unavailable. MOCCS is implemented in Perl and R and is freely available via https://github.com/yuifu/moccs. By complementing existing motif-discovery software, MOCCS will contribute to the basic understanding of how the genome controls diverse cellular processes via DNA-protein interactions. Copyright © 2016 Elsevier Ltd. All rights reserved.
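As a rough illustration of what "every k-mer" means in practice, the following Python sketch enumerates and counts all k-mers in a set of sequences, such as windows around ChIP-Seq peak centers. It is not the MOCCS algorithm, whose scoring is not described in the abstract; the function name and the choice to skip k-mers containing ambiguous bases are assumptions.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every unambiguous k-mer across a collection of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT character
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts
```

For example, `count_kmers(["ACGTAC", "nnACG"], 3)` counts `ACG` twice and ignores the two k-mers that overlap the `n` bases.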
Wang, Haoran; Wang, Mingxiu; Cheng, Qiang
2018-03-08
Detection of complex splice sites (SSs) and polyadenylation sites (PASs) of eukaryotic genes is essential for the elucidation of gene regulatory mechanisms. Transcriptome-wide studies using high-throughput sequencing (HTS) have revealed prevalent alternative splicing (AS) and alternative polyadenylation (APA) in plants. However, small-scale and high-depth HTS aimed at detecting genes or gene families are very few and limited. We explored a convenient and flexible method for profiling SSs and PASs, which combines rapid amplification of 3'-cDNA ends (3'-RACE) and HTS. Fourteen NAC (NAM, ATAF1/2, CUC2) transcription factor genes of Populus trichocarpa were analyzed by 3'-RACE-seq. Based on experimental reproducibility, boundary sequence analysis and reverse transcription PCR (RT-PCR) verification, only canonical SSs were considered to be authentic. Based on stringent criteria, candidate PASs without any internal priming features were chosen as authentic PASs and assumed to be PAS-rich markers. Thirty-four novel canonical SSs, six intronic/internal exons and thirty 3'-UTR PAS-rich markers were revealed by 3'-RACE-seq. Using 3'-RACE and real-time PCR, we confirmed that three APA transcripts ending in/around PAS-rich markers were differentially regulated in response to plant hormones. Our results indicate that 3'-RACE-seq is a robust and cost-effective method to discover SSs and label active regions subjected to APA for genes or gene families. The method is suitable for small-scale AS and APA research in the initial stage.
Gao, Meiping; Zhang, Shangwen; Luo, Cong; He, Xinhua; Wei, Shaolong; Jiang, Wen; He, Fanglian; Lin, Zhicheng; Yan, Meixin; Dong, Weiqong
2018-04-05
Sagittaria sagittifolia L. is an important bulb vegetable that has high nutritional and medical value. Bulb formation and development are crucial to Sagittaria sagittifolia; however, its sucrose metabolism is poorly understood and there is a lack of sufficient transcriptomic and genomic data available to fully understand the molecular mechanisms underlying bulb formation and development as well as the bulb transcriptome. Five cDNA libraries were constructed at different developmental stages and sequenced using high-throughput Illumina RNA sequencing. From approximately 63.53 Gb clean reads, a total of 60,884 unigenes, with an average length of 897.34 bp and N50 of 1.368 kb, were obtained. A total of 36,590 unigenes were successfully annotated using five public databases. Across different developmental stages, 4195, 827, 832, 851, and 1494 unigenes were differentially expressed in the T02, T03, T04, T05, and T06 libraries, respectively. Gene ontology (GO) analysis revealed several differentially-expressed genes (DEGs) associated with catalytic activity, binding, and transporter activity. The Kyoto encyclopedia of genes and genomes (KEGG) revealed that these DEGs are involved in physiological and biochemical processes. RT-qPCR was used to profile the expression of these unigenes and revealed that the expression patterns of the DEGs were consistent with the transcriptome data. In this study, we conducted a comparative gene expression analysis at the transcriptional level using RNA-seq across the different developmental stages of Sagittaria sagittifolia. We identified a set of genes that might contribute to starch and sucrose metabolism, and the genetic mechanisms related to bulblet development were also explored. This study provides important data for future studies of the genetic and molecular mechanisms underlying bulb formation and development in Sagittaria sagittifolia. Copyright © 2018. Published by Elsevier B.V.
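The assembly statistics quoted above (average unigene length and N50) follow standard definitions. A minimal Python sketch of the N50 calculation is given here for readers unfamiliar with it; the function name and the toy lengths in the usage note are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together account for at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]` the total is 54 bases, and the three longest sequences (10 + 9 + 8 = 27) reach half of it, so the N50 is 8. The N50 exceeds the mean (6 here) whenever long sequences hold a disproportionate share of the bases, consistent with the reported 1.368 kb N50 against an 897.34 bp average.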
Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H
2015-02-24
Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data
Lun, Aaron T. L.; Smyth, Gordon K.
2016-01-01
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for a protein of interest. Most conventional approaches to ChIP-seq data analysis involve the detection of the absolute presence (or absence) of a binding site. However, an alternative strategy is to identify changes in the binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses, by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, detection of DB regions will be conducted using the counts for sliding windows from the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets. This will provide readers with practical usage examples that can be applied in their own studies. PMID:26834993
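The idea of counting reads in sliding windows can be illustrated with a few lines of Python. This is only a conceptual sketch and does not use or reproduce the csaw or edgeR APIs, which are R packages; the function name, the default window width and spacing, and the use of read start positions are all assumptions.

```python
def window_counts(read_starts, chrom_length, width=150, spacing=50):
    """Count reads whose start position falls in each sliding window.

    Windows begin at 0, spacing, 2*spacing, ... and cover [start, start + width).
    """
    counts = []
    for start in range(0, chrom_length, spacing):
        end = start + width
        counts.append(sum(1 for pos in read_starts if start <= pos < end))
    return counts
```

Because the spacing is smaller than the width, neighbouring windows overlap and a single read can be counted in several of them. In a differential binding analysis, a count vector like this would be built per sample and then compared between conditions with a statistical model.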
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings in the context of RNA-seq. PMID:26342128
The Impact of Normalization Methods on RNA-Seq Data Analysis
Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.
2015-01-01
High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of data. Data normalization is one of the most crucial steps of data processing, and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
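To make "normalization related to sequencing depth" concrete, the sketch below shows the simplest such scheme, total-count scaling to counts per million (CPM), in Python. The abstract does not name the five methods compared, so this is offered only as a generic example and not as one of the methods evaluated in the study; the function and gene names are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw gene counts so that they sum to one million."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {gene: 1e6 * n / library_size for gene, n in counts.items()}
```

Two samples with the same relative expression but a tenfold difference in depth, for instance `{"g1": 10, "g2": 30}` and `{"g1": 100, "g2": 300}`, map to identical CPM values, which is the effect depth normalization is meant to achieve.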
Rai, Muhammad Farooq; Tycksen, Eric D; Sandell, Linda J; Brophy, Robert H
2018-01-01
Microarrays and RNA-seq are at the forefront of high throughput transcriptome analyses. Since these methodologies are based on different principles, there are concerns about the concordance of data between the two techniques. The concordance of RNA-seq and microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed in clinically derived ligament tissues. To demonstrate the concordance between RNA-seq and microarrays and to assess potential benefits of RNA-seq over microarrays, we assessed differences in transcript expression in anterior cruciate ligament (ACL) tissues based on time-from-injury. ACL remnants were collected from patients with an ACL tear at the time of ACL reconstruction. RNA prepared from torn ACL remnants was subjected to Agilent microarrays (N = 24) and RNA-seq (N = 8). The correlation of biological replicates in RNA-seq and microarrays data was similar (0.98 vs. 0.97), demonstrating that each platform has high internal reproducibility. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarrays values were moderate. The cross-platform concordance for differentially expressed transcripts or enriched pathways was linearly correlated (r = 0.64). RNA-Seq was superior in detecting low abundance transcripts and differentiating biologically critical isoforms. Additional independent validation of transcript expression was undertaken using microfluidic PCR for selected genes. PCR data showed 100% concordance (in expression pattern) with RNA-seq and microarrays data. These findings demonstrate that RNA-seq has advantages over microarrays for transcriptome profiling of ligament tissues when available and affordable. Furthermore, these findings are likely transferable to other musculoskeletal tissues where tissue collection is challenging and cells are in low abundance. © 2017 Orthopaedic Research Society. 
Published by Wiley Periodicals, Inc. J Orthop Res 36:484-497, 2018.
CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.
Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E
2018-05-03
Nucleoside-containing metabolites such as NAD+ can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD+ capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD+ capping and define a consensus promoter sequence for NAD+ capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD+-capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD+ and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.
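The consensus HRRASWW is written in standard IUPAC nucleotide ambiguity codes (H = A, C or T; R = A or G; S = C or G; W = A or T). A short Python sketch of how such a consensus can be expanded into a regular expression and searched for in a DNA sequence follows. The helper names are hypothetical, and because the underlining that marks the TSS position is lost in this text, the sketch reports only where the motif starts.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular-expression pattern string."""
    return "".join(IUPAC[base] for base in consensus)

def find_matches(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead makes finditer report overlapping occurrences as well
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence)]
```

For example, `find_matches("HRRASWW", "GGCAGACATCC")` returns `[2]`, because `CAGACAT` satisfies every position of the consensus.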
Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike
2016-07-01
High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
Protein-RNA specificity by high-throughput principal component analysis of NMR spectra.
Collins, Katherine M; Oregioni, Alain; Robertson, Laura E; Kelly, Geoff; Ramos, Andres
2015-03-31
Defining the RNA target selectivity of the proteins regulating mRNA metabolism is a key issue in RNA biology. Here we present a novel use of principal component analysis (PCA) to extract the RNA sequence preference of RNA binding proteins. We show that PCA can be used to compare the changes in the nuclear magnetic resonance (NMR) spectrum of a protein upon binding a set of quasi-degenerate RNAs and define the nucleobase specificity. We couple this application of PCA to an automated NMR spectra recording and processing protocol and obtain an unbiased and high-throughput NMR method for the analysis of nucleobase preference in protein-RNA interactions. We test the method on the RNA binding domains of three important regulators of RNA metabolism. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona
2014-01-01
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478
Du, Wenxiao; Zeng, Fanrong
2016-12-14
Adults of the lady beetle species Harmonia axyridis (Pallas) are bred artificially en masse for classic biological control, which requires egg-laying by the H. axyridis ovary. Development-related genes may impact the growth of the H. axyridis adult ovary but have not been reported. Here, we used integrative time-series RNA-seq analysis of the ovary in H. axyridis adults to detect development-related genes. A total of 28,558 unigenes were functionally annotated using seven types of databases to obtain an annotated unigene database for ovaries in H. axyridis adults. We also analysed differentially expressed genes (DEGs) between samples. Based on a combination of the results of this bioinformatics analysis with literature reports and gene expression level changes in four different stages, we focused on the development of oocyte reproductive stem cell and yolk formation process and identified 26 genes with high similarity to development-related genes. 20 DEGs were randomly chosen for quantitative real-time PCR (qRT-PCR) to validate the accuracy of the RNA-seq results. This study establishes a robust pipeline for the discovery of key genes using high-throughput sequencing and the identification of a class of development-related genes for characterization.
Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji
2015-09-01
The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10^7 from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequency of each sequence before and after biopanning, we found the sequences whose frequencies were increased by biopanning. The top 100 of these sequences were subjected to phylogenetic tree analysis. In total, 75% of them belonged to N-group VHHs, but the others were phylogenetically distant from N-group VHHs (non N-group). Two of three VHHs selected from non N-group VHHs showed sufficient antigen binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
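The comparison of sequence frequencies before and after biopanning can be sketched as a simple enrichment ranking. The Python example below is an illustration only and not the authors' pipeline: the function name, the input format (one list entry per sequenced read) and the pseudocount used for sequences absent from the pre-panning library are all assumptions.

```python
from collections import Counter

def enrichment(before_reads, after_reads, pseudocount=1):
    """Rank sequences by the ratio of their read frequency after vs. before panning."""
    before, after = Counter(before_reads), Counter(after_reads)
    n_before, n_after = len(before_reads), len(after_reads)
    ratios = {}
    for seq in after:
        freq_after = after[seq] / n_after
        # The pseudocount avoids division by zero for sequences unseen before panning
        freq_before = (before[seq] + pseudocount) / (n_before + pseudocount)
        ratios[seq] = freq_after / freq_before
    # Most enriched sequences first
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)
```

Taking the first 100 entries of the returned list would correspond to selecting the top 100 enriched sequences for downstream analysis. A clone that is rare before panning but common afterwards ranks above one that was already abundant, which is how minor clones become visible.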
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo
2018-01-01
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
Yu, Jing; Hirose-Yotsuya, Lisa; Take, Kazumi; Sun, Wei; Iwabu, Masato; Okada-Iwabu, Miki; Fujita, Takanori; Aoyama, Tomohisa; Tsutsumi, Shuichi; Ueki, Kohjiro; Kodama, Tatsuhiko; Sakai, Juro; Aburatani, Hiroyuki; Kadowaki, Takashi
2011-01-01
Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear family I (NFI) transcription factors. Indeed, ChIP assay showed that NFI occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program. 
Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation. PMID:22028663
An interactive environment for agile analysis and visualization of ChIP-sequencing data.
Lerdrup, Mads; Johansen, Jens Vilstrup; Agrawal-Singh, Shuchi; Hansen, Klaus
2016-04-01
To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.
dCLIP: a computational approach for comparative CLIP-seq analyses
2014-01-01
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/. PMID:24398258
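As a rough illustration of the MA transform that underlies MA normalization, the sketch below computes M (log ratio) and A (mean log intensity) for paired bin counts from two conditions and then median-centers M. This is the textbook MA transform only; the specific modification used by dCLIP is not described in the abstract, and the function names and pseudocount are assumptions.

```python
import math
from statistics import median

def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return (M, A) pairs for paired read counts from two conditions.

    M = log2(a) - log2(b); A = mean of the two log2 values.
    The pseudocount (an assumption) keeps zero counts finite.
    """
    pairs = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        pairs.append((log_a - log_b, 0.5 * (log_a + log_b)))
    return pairs

def median_center(pairs):
    """Shift M values so their median is zero (a simple global correction)."""
    shift = median(m for m, _ in pairs)
    return [(m - shift, a) for m, a in pairs]

if __name__ == "__main__":
    centered = median_center(ma_values([3, 7, 15], [1, 3, 7]))
    print(centered)
```

Bins whose centered M values remain far from zero would be candidates for differential binding, which a downstream model such as a hidden Markov model could then segment into regions.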
GC-Content Normalization for RNA-Seq Data
2011-01-01
Background: Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results: We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions: Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
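To make the idea of within-lane, gene-level GC-content normalization concrete, here is a minimal sketch that bins genes by GC fraction and rescales each bin so its median count matches the lane-wide median. It is an illustrative stand-in, not the EDASeq implementation; the function name, the equal-width binning, and the choice of median scaling are assumptions.

```python
from statistics import median

def gc_bin_median_normalize(counts, gc, n_bins=10):
    """Rescale gene counts so every GC bin shares the lane-wide median.

    counts: per-gene read counts for one lane.
    gc: per-gene GC fraction in [0, 1], same order as counts.
    Bins with a zero median are left unscaled (an arbitrary choice here).
    """
    if len(counts) != len(gc):
        raise ValueError("counts and gc must have the same length")
    overall = median(counts)
    bins = {}
    for i, g in enumerate(gc):
        b = min(int(g * n_bins), n_bins - 1)  # equal-width GC bins
        bins.setdefault(b, []).append(i)
    normalized = [float(c) for c in counts]
    for idx in bins.values():
        bin_median = median(counts[i] for i in idx)
        if bin_median > 0:
            factor = overall / bin_median
            for i in idx:
                normalized[i] = counts[i] * factor
    return normalized

if __name__ == "__main__":
    counts = [10, 20, 30, 100, 200, 300]
    gc = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7]
    print(gc_bin_median_normalize(counts, gc, n_bins=2))
```

In this toy input the high-GC genes have tenfold higher counts; after scaling, both bins have the same median, which removes the GC-associated shift while preserving within-bin ratios.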
Mapping specificity landscapes of RNA-protein interactions by high throughput sequencing.
Jankowsky, Eckhard; Harris, Michael E
2017-04-15
To function in a biological setting, RNA binding proteins (RBPs) have to discriminate between alternative binding sites in RNAs. This discrimination can occur in the ground state of an RNA-protein binding reaction, in its transition state, or in both. The extent by which RBPs discriminate at these reaction states defines RBP specificity landscapes. Here, we describe the HiTS-Kin and HiTS-EQ techniques, which combine kinetic and equilibrium binding experiments with high throughput sequencing to quantitatively assess substrate discrimination for large numbers of substrate variants at ground and transition states of RNA-protein binding reactions. We discuss experimental design, practical considerations and data analysis and outline how a combination of HiTS-Kin and HiTS-EQ allows the mapping of RBP specificity landscapes. Copyright © 2017 Elsevier Inc. All rights reserved.
Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E
2015-01-01
Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.
Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John
2017-10-03
Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and the need to account for sparsity due to the heterogeneity intrinsic in multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), that includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.
2012-04-25
Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify location of interest within the genome and evaluate potential mis-annotations.
Yang, Chia-Chun; Andrews, Erik H; Chen, Min-Hsuan; Wang, Wan-Yu; Chen, Jeremy J W; Gerstein, Mark; Liu, Chun-Chi; Cheng, Chao
2016-08-12
Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) or microarray hybridization (ChIP-chip) has been widely used to determine the genomic occupation of transcription factors (TFs). We have previously developed a probabilistic method, called TIP (Target Identification from Profiles), to identify TF target genes using ChIP-seq/ChIP-chip data. To achieve high specificity, TIP applies a conservative method to estimate significance of target genes, with the trade-off being a relatively low sensitivity of target gene identification compared to other methods. Additionally, TIP's output does not render binding-peak locations or intensity, information highly useful for visualization and general experimental biological use, while the variability of ChIP-seq/ChIP-chip file formats has made input into TIP more difficult than desired. To improve upon these facets, here we present a refined TIP with key extensions. First, it implements a Gaussian mixture model for p-value estimation, increasing target gene identification sensitivity and more accurately capturing the shape of TF binding profile distributions. Second, it enables the incorporation of TF binding-peak data by identifying their locations in significant target gene promoter regions and quantifies their strengths. Finally, for full ease of implementation we have incorporated it into a web server ( http://syslab3.nchu.edu.tw/iTAR/ ) that enables flexibility of input file format, can be used across multiple species and genome assembly versions, and is freely available for public use. The web server additionally performs GO enrichment analysis for the identified target genes to reveal the potential function of the corresponding TF. The iTAR web server provides a user-friendly interface and supports target gene identification in seven species, ranging from yeast to human. 
To facilitate investigating the quality of ChIP-seq/ChIP-chip data, the web server generates the chart of the characteristic binding profiles and the density plot of normalized regulatory scores. The iTAR web server is a useful tool in identifying TF target genes from ChIP-seq/ChIP-chip data and discovering biological insights.
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts.
Paraskevopoulou, Maria D; Vlachos, Ioannis S; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G
2016-01-04
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Guanylate-binding protein-1 is a potential new therapeutic target for triple-negative breast cancer.
Quintero, Melissa; Adamoski, Douglas; Reis, Larissa Menezes Dos; Ascenção, Carolline Fernanda Rodrigues; Oliveira, Krishina Ratna Sousa de; Gonçalves, Kaliandra de Almeida; Dias, Marília Meira; Carazzolle, Marcelo Falsarella; Dias, Sandra Martha Gomes
2017-11-07
Triple-negative breast cancer (TNBC) is characterized by a lack of estrogen and progesterone receptor expression (ESR and PGR, respectively) and an absence of human epidermal growth factor receptor 2 (ERBB2) amplification. Approximately 15-20% of breast malignancies are TNBC. Patients with TNBC often have an unfavorable prognosis. In addition, TNBC represents an important clinical challenge since it does not respond to hormone therapy. In this work, we integrated high-throughput mRNA sequencing (RNA-Seq) data from normal and tumor tissues (obtained from The Cancer Genome Atlas, TCGA) and cell lines obtained through in-house sequencing or available from the Gene Expression Omnibus (GEO) to generate a unified list of differentially expressed (DE) genes. Methylome and proteomic data were integrated into our analysis to give further support to our findings. Genes that were overexpressed in TNBC were then curated to retain new potentially druggable targets based on in silico analysis. Gene knockdown was used to assess the importance of each gene for TNBC cell proliferation. Our pipeline analysis generated a list of 243 potential new targets for treating TNBC. We finally demonstrated that knock-down of Guanylate-Binding Protein 1 (GBP1), one of the candidate genes, selectively affected the growth of TNBC cell lines. Moreover, we showed that GBP1 expression was controlled by epidermal growth factor receptor (EGFR) in breast cancer cell lines. We propose that GBP1 is a new potential druggable therapeutic target for treating TNBC with enhanced EGFR expression.
Qing, Xiaodan; Zeng, Dong; Wang, Hesong; Ni, Xueqin; Lai, Jing; Liu, Lei; Khalique, Abdul; Pan, Kangcheng; Jing, Bo
2018-04-20
Subclinical necrotic enteritis (SNE) is widespread in chickens and slows their growth, causing enormous social and economic burdens. To better understand the molecular underpinnings of SNE on lipid metabolism and explore novel preventative strategies against SNE, we studied the regulatory mechanism of a potential probiotic, Lactobacillus johnsonii BS15, on the lipid metabolism pathways involved in chickens with SNE. One hundred eighty one-day-old chickens were randomly divided into three groups, including control and SNE groups, and fed a basal diet supplemented with BS15 (1 × 10^6 cfu/g) or de Man, Rogosa and Sharpe (MRS) liquid medium for 28 days. The hepatic gene expression of each group was then measured using high-throughput analysis methods (RNA-Seq). Quantitative real-time PCR (qRT-PCR) was used to detect the expression changes of the related genes. RNA-Seq identified eleven lipid metabolic pathways affected by BS15 pretreatment in SNE chickens, including the peroxisome proliferator-activated receptor (PPAR) signaling pathway and arachidonic acid metabolism. BS15 notably increased the expression of fatty acid binding protein 2 (FABP2), acyl-CoA synthetase bubblegum family member 1 (ACSBG1), perilipin 1 (PLIN1) and perilipin 2 (PLIN2), which are involved in the PPAR signaling pathway of SNE chickens. In addition, suppression of phospholipase A2 group IVA (PLA2G4A) in arachidonic acid metabolism was observed in SNE chickens after BS15 pretreatment. The expression patterns of FABP2, ACSBG1, PLIN1, PLIN2 and PLA2G4A in qRT-PCR validation were consistent with the RNA-Seq results. These findings indicate that SNE may affect the hepatic lipid metabolism of chickens. Meanwhile, BS15 pretreatment may provide a prospective natural prophylaxis strategy against SNE through improving the PPAR signaling pathway and arachidonic acid metabolism.
Missing data and technical variability in single-cell RNA-sequencing experiments.
Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A
2017-11-06
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. 
Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
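The per-cell proportion of zero counts discussed in the abstract above is straightforward to compute. The sketch below assumes a hypothetical input format (one list of per-gene counts per cell) and is meant only to illustrate the quantity, not to reproduce the authors' analysis.

```python
def zero_fraction_per_cell(matrix):
    """Return, for each cell, the fraction of genes with a count of zero.

    matrix: list of cells, each a non-empty list of per-gene counts
    (a hypothetical layout chosen for this example).
    """
    fractions = []
    for cell in matrix:
        if not cell:
            raise ValueError("each cell must contain at least one gene")
        zeros = sum(1 for count in cell if count == 0)
        fractions.append(zeros / len(cell))
    return fractions

if __name__ == "__main__":
    cells = [
        [0, 0, 5, 1],  # half of the genes undetected
        [3, 0, 2, 7],  # one quarter undetected
    ]
    print(zero_fraction_per_cell(cells))
```

Comparing these fractions across batches is one simple way to check whether the variation in zeros tracks a technical factor rather than biology.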
Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R
2014-08-16
Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent from the reference genome. However, no study has investigated the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from a large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. 
In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.
Dotsey, Emmanuel Y.; Gorlani, Andrea; Ingale, Sampat; Achenbach, Chad J.; Forthal, Donald N.; Felgner, Philip L.; Gach, Johannes S.
2015-01-01
In recent years, high throughput discovery of human recombinant monoclonal antibodies (mAbs) has been applied to greatly advance our understanding of the specificity and functional activity of antibodies against HIV. Thousands of antibodies have been generated and screened in functional neutralization assays, and antibodies associated with cross-strain neutralization and passive protection in primates have been identified. To facilitate this type of discovery, a high throughput-screening tool is needed to accurately classify mAbs and their antigen targets. In this study, we analyzed and evaluated a prototype microarray chip comprised of the HIV-1 recombinant proteins gp140, gp120, gp41, and several membrane proximal external region peptides. The protein microarray analysis of 11 HIV-1 envelope-specific mAbs revealed diverse binding affinities and specificities across clades. Half maximal effective concentrations, generated by our chip analysis, correlated significantly (P<0.0001) with concentrations from ELISA binding measurements. Polyclonal immune responses in plasma samples from HIV-1 infected subjects exhibited different binding patterns and reactivity against printed proteins. Examining the totality of the specificity of the humoral response in this way reveals the exquisite diversity and specificity of the humoral response to HIV. PMID:25938510
The PhoP-Dependent ncRNA Mcr7 Modulates the TAT Secretion System in Mycobacterium tuberculosis
Benjak, Andrej; Uplekar, Swapna; Rougemont, Jacques; Guilhot, Christophe; Malaga, Wladimir; Martín, Carlos; Cole, Stewart T.
2014-01-01
The PhoPR two-component system is essential for virulence in Mycobacterium tuberculosis where it controls expression of approximately 2% of the genes, including those for the ESX-1 secretion apparatus, a major virulence determinant. Mutations in phoP lead to compromised production of pathogen-specific cell wall components and attenuation both ex vivo and in vivo. Using antibodies against the native protein in ChIP-seq experiments (chromatin immunoprecipitation followed by high-throughput sequencing) we demonstrated that PhoP binds to at least 35 loci on the M. tuberculosis genome. The PhoP regulon comprises several transcriptional regulators as well as genes for polyketide synthases and PE/PPE proteins. Integration of ChIP-seq results with high-resolution transcriptomic analysis (RNA-seq) revealed that PhoP controls 30 genes directly, whilst regulatory cascades are responsible for signal amplification and downstream effects through proteins like EspR, which controls Esx1 function, via regulation of the espACD operon. The most prominent site of PhoP regulation was located in the intergenic region between rv2395 and PE_PGRS41, where the mcr7 gene codes for a small non-coding RNA (ncRNA). Northern blot experiments confirmed the absence of Mcr7 in an M. tuberculosis phoP mutant as well as low-level expression of the ncRNA in M. tuberculosis complex members other than M. tuberculosis. By means of genetic and proteomic analyses we demonstrated that Mcr7 modulates translation of the tatC mRNA thereby impacting the activity of the Twin Arginine Translocation (Tat) protein secretion apparatus. As a result, secretion of the immunodominant Ag85 complex and the beta-lactamase BlaC is affected, among others. Mcr7, the first ncRNA of M. tuberculosis whose function has been established, therefore represents a missing link between the PhoPR two-component system and the downstream functions necessary for successful infection of the host. PMID:24874799
YM500: a small RNA sequencing (smRNA-seq) database for microRNA research
Cheng, Wei-Chung; Chung, I-Fang; Huang, Tse-Shun; Chang, Shih-Ting; Sun, Hsing-Jen; Tsai, Cheng-Fong; Liang, Muh-Lii; Wong, Tai-Tong; Wang, Hsei-Wei
2013-01-01
MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, the YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mouse smRNA-seq datasets, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, for isomiR identification (incl. RNA editing), for arm switching discovery, and, more importantly, for novel miRNA predictions. Wet-lab validation on >100 miRNAs confirmed high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria of isomiRs, and also integrates the information of dbSNP to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold the potential for both basic research and biotech applications. PMID:23203880
HSA: a heuristic splice alignment tool.
Bu, Jingde; Chi, Xuebin; Jin, Zhong
2013-01-01
RNA-Seq is a revolutionary transcriptome sequencing technology and a representative application of next-generation sequencing (NGS). High-throughput RNA-Seq yields much more information, such as differential expression and novel splice variants, through deep sequence analysis and data mining. However, the short read length poses a great challenge for alignment, especially when reads span two or more exons. This study presents HSA, a two-step heuristic splice alignment tool. First, raw reads are mapped to the reference with an unspliced aligner, BWA. Second, each initially unmapped read is split into three equal short reads (seeds); each seed is aligned to the reference, hits are filtered, the possible split position of the read is searched, and hits are extended to a complete match. Compared with other splice alignment tools such as SOAPsplice and TopHat2, HSA performs better in call rate and efficiency, but its results are to some extent less accurate. HSA is an effective spliced aligner for RNA-Seq read mapping and is available at https://github.com/vlcc/HSA.
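The second step of the HSA description (split an unmapped read into three seeds, anchor the seeds, then search for the split position) can be illustrated with a toy sketch. This is not the HSA implementation: the function names, the exact-match seed search, and the tiny reference below are all assumptions made for illustration, and a real aligner would use an index and tolerate mismatches.

```python
def split_into_seeds(read, n_seeds=3):
    """Cut a read into n_seeds consecutive pieces; return (offset, seed) pairs."""
    size = len(read) // n_seeds
    seeds = []
    for i in range(n_seeds):
        start = i * size
        end = (i + 1) * size if i < n_seeds - 1 else len(read)
        seeds.append((start, read[start:end]))
    return seeds


def find_exact_hits(seed, reference):
    """Return every reference position where the seed matches exactly."""
    hits = []
    pos = reference.find(seed)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(seed, pos + 1)
    return hits


def find_split(read, reference):
    """Anchor the first and last seeds, then scan for a split position k such
    that read[:k] matches at the left anchor and read[k:] at the right anchor.
    Returns (left_start, k, right_part_start) or None."""
    seeds = split_into_seeds(read)
    _, first_seed = seeds[0]
    last_off, last_seed = seeds[-1]
    for left_hit in find_exact_hits(first_seed, reference):
        for right_hit in find_exact_hits(last_seed, reference):
            # Reference coordinate read position 0 would have if the right
            # part were extended leftwards without a gap.
            right_start = right_hit - last_off
            if right_start <= left_hit:
                continue  # no gap between the anchors, so not a spliced hit
            for k in range(len(first_seed), last_off + 1):
                left_ok = reference[left_hit:left_hit + k] == read[:k]
                right_ok = (reference[right_start + k:right_start + len(read)]
                            == read[k:])
                if left_ok and right_ok:
                    return left_hit, k, right_start + k
    return None


# Hypothetical toy reference: exon, intron, exon.
exon1 = "TTGACCATGCGATCGGAT"
intron = "GTAAG" + "C" * 10 + "AG"
exon2 = "CTTAGGCAATCGTACCGA"
reference = exon1 + intron + exon2
read = exon1[-9:] + exon2[:9]  # a read spanning the exon-exon junction

result = find_split(read, reference)
print(result)
```

In this toy case the read is split after its ninth base, with the left part ending at the end of the first exon and the right part starting at the first base of the second exon.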
EMQIT: a machine learning approach for energy based PWM matrix quality improvement.
Smolinska, Karolina; Pacholczyk, Marcin
2017-08-01
Transcription factor binding affinities to DNA play a key role in gene regulation. Learning the specificity of the mechanisms by which TFs bind DNA is important to both experimentalists and theoreticians. With the development of high-throughput methods such as ChIP-seq, the need for unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server. We observed that tuning of the Boltzmann factor weights, used for conversion of calculated energies to nucleotide probabilities, has a significant impact on the quality of the associated PWM matrix. Consequently, we proposed to use receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights using experimentally verified data from the TRANSFAC database. We applied our method to data available for various TFs. We verified the efficiency of detecting TF binding sites by the 3DTF matrices improved with our technique using experimental data from the TRANSFAC database. The comparison showed a significant similarity and comparable performance between the improved and the experimental matrices (TRANSFAC). Improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (at least by 0.1) and, at the same time, detected notably more experimentally verified TFBSs. The resulting new improved PWM matrices for analyzed factors show similarity to TRANSFAC matrices. Matrices had comparable predictive capabilities. Moreover, improved PWMs achieve better results than matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit. This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.
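The conversion of calculated energies to nucleotide probabilities through a weighted Boltzmann factor can be sketched as follows. The abstract does not give the exact formula used by EMQIT or 3DTF, so this is the textbook Boltzmann distribution with a single tunable weight; the function name, the per-position dictionary layout, and the energy values are assumptions for illustration.

```python
import math


def energies_to_probabilities(energies, weight):
    """Convert per-base energies at one PWM position into probabilities.

    energies: dict mapping base -> calculated binding energy (lower = better)
    weight:   Boltzmann factor weight, playing the role of 1/(kT)
    """
    factors = {base: math.exp(-weight * e) for base, e in energies.items()}
    total = sum(factors.values())
    return {base: f / total for base, f in factors.items()}


# Hypothetical energies for one motif position (arbitrary units).
position_energies = {"A": -1.2, "C": 0.3, "G": 0.1, "T": 0.8}

for weight in (0.0, 1.0, 3.0):
    probs = energies_to_probabilities(position_energies, weight)
    print(weight, {b: round(p, 3) for b, p in probs.items()})
```

A weight of zero gives a uniform column, and larger weights concentrate probability on the lowest-energy base, which is why the choice of weight changes the resulting PWM so strongly.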
Lapierre, Pascal; Mir, Mushtaq; Chase, Michael R.; Pyle, Margaret M.; Gawande, Richa; Ahmad, Rushdy; Sarracino, David A.; Ioerger, Thomas R.; Fortune, Sarah M.; Derbyshire, Keith M.; Wade, Joseph T.; Gray, Todd A.
2015-01-01
RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5’ untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5’ end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5’ ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine Dalgarno site in the 5’ UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression. PMID:26536359
Shell, Scarlet S; Wang, Jing; Lapierre, Pascal; Mir, Mushtaq; Chase, Michael R; Pyle, Margaret M; Gawande, Richa; Ahmad, Rushdy; Sarracino, David A; Ioerger, Thomas R; Fortune, Sarah M; Derbyshire, Keith M; Wade, Joseph T; Gray, Todd A
2015-11-01
RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5' untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5' end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5' ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine Dalgarno site in the 5' UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression.
Kondrashova, Olga; Love, Clare J.; Lunke, Sebastian; Hsu, Arthur L.; Waring, Paul M.; Taylor, Graham R.
2015-01-01
Whilst next generation sequencing can report point mutations in fixed tissue tumour samples reliably, the accurate determination of copy number is more challenging. The conventional Multiplex Ligation-dependent Probe Amplification (MLPA) assay is an effective tool for measurement of gene dosage, but is restricted to around 50 targets due to size resolution of the MLPA probes. By switching from a size-resolved format, to a sequence-resolved format we developed a scalable, high-throughput, quantitative assay. MLPA-seq is capable of detecting deletions, duplications, and amplifications in as little as 5ng of genomic DNA, including from formalin-fixed paraffin-embedded (FFPE) tumour samples. We show that this method can detect BRCA1, BRCA2, ERBB2 and CCNE1 copy number changes in DNA extracted from snap-frozen and FFPE tumour tissue, with 100% sensitivity and >99.5% specificity. PMID:26569395
Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.
Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian
2016-01-01
B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombination, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and JavaScript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.
Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing
Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian
2016-01-01
Background B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombination, is specific to each lymphocyte and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Methods and Results Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and JavaScript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications. PMID:27835690
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Cooper, James; Ding, Yi; Song, Jiuzhou; Zhao, Keji
2017-11-01
Increased chromatin accessibility is a feature of cell-type-specific cis-regulatory elements; therefore, mapping of DNase I hypersensitive sites (DHSs) enables the detection of active regulatory elements of transcription, including promoters, enhancers, insulators and locus-control regions. Single-cell DNase sequencing (scDNase-seq) is a method of detecting genome-wide DHSs when starting with either single cells or <1,000 cells from primary cell sources. This technique enables genome-wide mapping of hypersensitive sites in a wide range of cell populations that cannot be analyzed using conventional DNase I sequencing because of the requirement for millions of starting cells. Fresh cells, formaldehyde-cross-linked cells or cells recovered from formalin-fixed paraffin-embedded (FFPE) tissue slides are suitable for scDNase-seq assays. To generate scDNase-seq libraries, cells are lysed and then digested with DNase I. Circular carrier plasmid DNA is included during subsequent DNA purification and library preparation steps to prevent loss of the small quantity of DHS DNA. Libraries are generated for high-throughput sequencing on the Illumina platform using standard methods. Preparation of scDNase-seq libraries requires only 2 d. The materials and molecular biology techniques described in this protocol should be accessible to any general molecular biology laboratory. Processing of high-throughput sequencing data requires basic bioinformatics skills and uses publicly available bioinformatics software.
SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F
2010-12-01
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
Wu, Yang; Stauffer, Shaun R; Stanfield, Robyn L; Tapia, Phillip H; Ursu, Oleg; Fisher, Gregory W; Szent-Gyorgyi, Christopher; Evangelisti, Annette; Waller, Anna; Strouse, J Jacob; Carter, Mark B; Bologa, Cristian; Gouveia, Kristine; Poslusney, Mike; Waggoner, Alan S; Lindsley, Craig W; Jarvik, Jonathan W; Sklar, Larry A
2016-01-01
A new class of biosensors, fluorogen activating proteins (FAPs), has been successfully used to track receptor trafficking in live cells. Unlike the traditional fluorescent proteins (FPs), FAPs do not fluoresce unless bound to their specific small-molecule fluorogens, and thus FAP-based assays are highly sensitive. Application of the FAP-based assay for protein trafficking in high-throughput flow cytometry resulted in the discovery of a new class of compounds that interferes with the binding between fluorogens and FAP, thus blocking the fluorescence signal. These compounds are high-affinity, nonfluorescent analogs of fluorogens with little or no toxicity to the tested cells and no apparent interference with the normal function of FAP-tagged receptors. The most potent compound among these, N,4-dimethyl-N-(2-oxo-2-(4-(pyridin-2-yl)piperazin-1-yl)ethyl)benzenesulfonamide (ML342), has been investigated in detail. X-ray crystallographic analysis revealed that ML342 competes with the fluorogen, sulfonated thiazole orange coupled to diethylene glycol diamine (TO1-2p), for the same binding site on a FAP, AM2.2. Kinetic analysis shows that the FAP-fluorogen interaction is more complex than a homogeneous one-site binding process, with multiple conformational states of the fluorogen and/or the FAP, and possible dimerization of the FAP moiety involved in the process. © 2015 Society for Laboratory Automation and Screening.
Library construction for next-generation sequencing: Overviews and challenges
Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip
2014-01-01
High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796
Limitations and possibilities of low cell number ChIP-seq
2012-01-01
Background Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) offers high resolution, genome-wide analysis of DNA-protein interactions. However, current standard methods require abundant starting material in the range of 1–20 million cells per immunoprecipitation, and remain a bottleneck to the acquisition of biologically relevant epigenetic data. Using a ChIP-seq protocol optimised for low cell numbers (down to 100,000 cells / IP), we examined the performance of the ChIP-seq technique on a series of decreasing cell numbers. Results We present an enhanced native ChIP-seq method tailored to low cell numbers that represents a 200-fold reduction in input requirements over existing protocols. The protocol was tested over a range of starting cell numbers covering three orders of magnitude, enabling determination of the lower limit of the technique. At low input cell numbers, increased levels of unmapped and duplicate reads reduce the number of unique reads generated, and can drive up sequencing costs and affect sensitivity if ChIP is attempted from too few cells. Conclusions The optimised method presented here considerably reduces the input requirements for performing native ChIP-seq. It extends the applicability of the technique to isolated primary cells and rare cell populations (e.g. biobank samples, stem cells), and in many cases will alleviate the need for cell culture and any associated alteration of epigenetic marks. However, this study highlights a challenge inherent to ChIP-seq from low cell numbers: as cell input numbers fall, levels of unmapped sequence reads and PCR-generated duplicate reads rise. We discuss a number of solutions to overcome the effects of reducing cell number that may aid further improvements to ChIP performance. PMID:23171294
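The effect described above, where duplicate reads reduce the number of unique reads, can be made concrete with a small calculation. The abstract does not say how duplicates were identified, so this sketch assumes the common convention that mapped reads sharing chromosome, start position and strand are duplicates; the function name and the example alignments are hypothetical.

```python
from collections import Counter


def duplicate_fraction(alignments):
    """Fraction of mapped reads that are redundant copies of another read.

    alignments: iterable of (chrom, start, strand) tuples, one per mapped read.
    Reads with identical tuples are treated as duplicates (an assumption).
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    unique = len(counts)  # one representative kept per distinct position
    return (total - unique) / total


# Hypothetical mapped reads: three copies at one position plus two singletons.
reads = [("chr1", 100, "+")] * 3 + [("chr1", 200, "-"), ("chr2", 50, "+")]
print(duplicate_fraction(reads))
```

Here five mapped reads collapse to three unique positions, so two of five reads contribute no new information.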
Xu, Xiaohui Sophia; Rose, Anne; Demers, Roger; Eley, Timothy; Ryan, John; Stouffer, Bruce; Cojocaru, Laura; Arnold, Mark
2014-01-01
The determination of drug-protein binding is important in the pharmaceutical development process because of the impact of protein binding on both the pharmacokinetics and pharmacodynamics of drugs. Equilibrium dialysis is the preferred method to measure the free drug fraction because it is considered to be more accurate. The throughput of equilibrium dialysis has recently been improved by implementing a 96-well format plate. Results/methodology: This manuscript illustrates the successful application of a 96-well rapid equilibrium dialysis (RED) device in the determination of atazanavir plasma-protein binding. This RED method of measuring free fraction was successfully validated and then applied to the analysis of clinical plasma samples taken from HIV-infected pregnant women administered atazanavir. Combined with LC-MS/MS detection, the 96-well format equilibrium dialysis device was suitable for measuring the free and bound concentration of pharmaceutical molecules in a high-throughput mode.
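As a worked illustration of the free fraction measured by equilibrium dialysis, the sketch below uses the standard definition: at equilibrium the buffer-chamber concentration equals the free drug concentration, so the free fraction is the buffer concentration divided by the plasma-chamber concentration. The abstract gives no formulas or values, so the function names and the concentrations are assumptions and not data from the study.

```python
def free_fraction(buffer_conc, plasma_conc):
    """Free (unbound) fraction from post-dialysis chamber concentrations.

    buffer_conc: drug concentration in the buffer chamber (free drug)
    plasma_conc: total drug concentration in the plasma chamber
    Both must use the same units, e.g. ng/mL.
    """
    if plasma_conc <= 0:
        raise ValueError("plasma concentration must be positive")
    return buffer_conc / plasma_conc


def percent_bound(buffer_conc, plasma_conc):
    """Percentage of drug bound to plasma proteins."""
    return 100.0 * (1.0 - free_fraction(buffer_conc, plasma_conc))


# Hypothetical LC-MS/MS readings after dialysis (ng/mL).
fu = free_fraction(14.0, 100.0)
bound = percent_bound(14.0, 100.0)
print(fu, bound)
```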
Gao, Xiating; Liu, Yang; Liu, Huan; Yang, Zhen; Liu, Qin; Zhang, Yuanxing; Wang, Qiyao
2017-10-15
In Vibrio species, AphB is essential to activate virulence cascades by sensing low-pH and anaerobiosis signals; however, its regulon remains largely unknown. Here, AphB is found to be a key virulence regulator in Vibrio alginolyticus, a pathogen for marine animals and humans. Chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) enabled the detection of 20 loci in the V. alginolyticus genome that contained AphB-binding peaks. An AphB-specific binding consensus was confirmed by electrophoretic mobility shift assays (EMSAs), and the regulation of genes flanking such binding sites was demonstrated using quantitative real-time PCR analysis. AphB binds directly to its own promoter and positively controls its own expression in later growth stages. AphB also activates the expression of the exotoxin Asp by binding directly to the promoter regions of asp and the master quorum-sensing (QS) regulator luxR. DNase I footprinting analysis uncovered distinct AphB-binding sites (BBS) in these promoters. Furthermore, a BBS in the luxR promoter region overlaps that of LuxR-binding site I, which mediates the positive control of luxR promoter activity by AphB. This study provides new insights into the AphB regulon and reveals the mechanisms underlying AphB regulation of physiological adaptation and QS-controlled virulence in V. alginolyticus. IMPORTANCE In this work, AphB is determined to play essential roles in the expression of genes associated with QS, physiology, and virulence in V. alginolyticus, a pathogen for marine animals and humans. AphB was found to bind directly to 20 genes and control their expression by a 17-bp consensus binding sequence. Among the 20 genes, the aphB gene itself was identified to be positively autoregulated, and AphB also positively controlled asp and luxR expression. Taken together, these findings improve our understanding of the roles of AphB in controlling physiological adaptation and QS-controlled virulence gene expression. Copyright © 2017 American Society for Microbiology.
Hüser, Daniela; Gogol-Döring, Andreas; Chen, Wei
2014-01-01
ABSTRACT Genome-wide analysis of adeno-associated virus (AAV) type 2 integration in HeLa cells has shown that wild-type AAV integrates at numerous genomic sites, including AAVS1 on chromosome 19q13.42. Multiple GAGY/C repeats, resembling consensus AAV Rep-binding sites are preferred, whereas rep-deficient AAV vectors (rAAV) regularly show a random integration profile. This study is the first study to analyze wild-type AAV integration in diploid human fibroblasts. Applying high-throughput third-generation PacBio-based DNA sequencing, integration profiles of wild-type AAV and rAAV are compared side by side. Bioinformatic analysis reveals that both wild-type AAV and rAAV prefer open chromatin regions. Although genomic features of AAV integration largely reproduce previous findings, the pattern of integration hot spots differs from that described in HeLa cells before. DNase-Seq data for human fibroblasts and for HeLa cells reveal variant chromatin accessibility at preferred AAV integration hot spots that correlates with variant hot spot preferences. DNase-Seq patterns of these sites in human tissues, including liver, muscle, heart, brain, skin, and embryonic stem cells further underline variant chromatin accessibility. In summary, AAV integration is dependent on cell-type-specific, variant chromatin accessibility leading to random integration profiles for rAAV, whereas wild-type AAV integration sites cluster near GAGY/C repeats. IMPORTANCE Adeno-associated virus type 2 (AAV) is assumed to establish latency by chromosomal integration of its DNA. This is the first genome-wide analysis of wild-type AAV2 integration in diploid human cells and the first to compare wild-type to recombinant AAV vector integration side by side under identical experimental conditions. Major determinants of wild-type AAV integration represent open chromatin regions with accessible consensus AAV Rep-binding sites. The variant chromatin accessibility of different human tissues or cell types will have impact on vector targeting to be considered during gene therapy. PMID:25031342
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S
2012-01-01
RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.
Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.
2013-01-01
Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
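The sensitivity and positive predictive value (PPV) figures quoted in this abstract follow the standard definitions, which can be sketched as below. The function names are illustrative, and the counts in the usage note are assumptions for demonstration only; the abstract reports percentages and the total of 107 Sanger-confirmed variants, not the underlying true/false positive counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known (e.g. Sanger-confirmed) variants that the platform called."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of the platform's variant calls that are real variants."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts: 106 of 107 reference variants detected.
    print(f"sensitivity = {sensitivity(106, 1):.1%}")
    # Hypothetical counts: 95 correct calls and 5 false calls.
    print(f"PPV = {positive_predictive_value(95, 5):.1%}")
```

For instance, if exactly one of the 107 reference variants were missed (an assumption, not a reported count), the sensitivity would round to 99.1%.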
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
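The core idea of barcode sequencing, counting strain-specific barcodes in sequencing reads and comparing their abundance between conditions, can be illustrated with a minimal sketch. This is not the authors' pipeline: the barcode position, barcode length, exact-match lookup, pseudocount, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def count_barcodes(reads, barcode_to_strain, start=0, length=20):
    """Count reads per strain by exact lookup of the barcode found at a fixed position."""
    counts = Counter()
    for read in reads:
        tag = read[start:start + length]
        strain = barcode_to_strain.get(tag)
        if strain is not None:  # reads with unknown barcodes are ignored
            counts[strain] += 1
    return counts


def log2_ratios(control, treatment, pseudocount=1.0):
    """Per-strain log2(treatment / control) on library-size-normalised counts."""
    control_total = sum(control.values()) or 1
    treatment_total = sum(treatment.values()) or 1
    ratios = {}
    for strain in set(control) | set(treatment):
        c = (control.get(strain, 0) + pseudocount) / control_total
        t = (treatment.get(strain, 0) + pseudocount) / treatment_total
        ratios[strain] = math.log2(t / c)
    return ratios
```

A strongly negative ratio for a deletion strain would suggest reduced fitness under treatment; a revised barcode list, such as the one the abstract describes, would simply replace the `barcode_to_strain` mapping.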
Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study
Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger
2017-01-01
Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300
Ayyappan, Vasudevan; Kalavacharla, Venu; Thimmapuram, Jyothi; Bhide, Ketaki P; Sripathi, Venkateswara R; Smolinski, Tomasz G; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce
2015-01-01
Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress. PMID:26167691
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
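The "bases scoring Q30 and above" metric refers to Phred quality scores, where Q = -10·log10(p) and p is the estimated base-calling error probability, so Q30 corresponds to an error rate of 1 in 1,000. A minimal sketch of how such a fraction could be computed from FASTQ quality strings follows; the Phred+33 ASCII offset is an assumption about the file encoding, and the function names are illustrative rather than part of the quality control method described in the abstract.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list:
    """Decode one FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: float) -> float:
    """Convert a Phred score to the estimated base-calling error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_lines, threshold: int = 30) -> float:
    """Fraction of all bases whose Phred score is at least `threshold`."""
    total = 0
    passed = 0
    for line in quality_lines:
        for q in phred_scores(line):
            total += 1
            passed += q >= threshold
    return passed / total if total else 0.0
```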
Macrophage Responses to Epithelial Dysfunction Promote Lung Fibrosis in Aging
2017-10-01
and Christman, 2016, AJRCMB) and at the time of this report listed among highly accessed on AJRCMB website. Importantly, our protocol and findings...seq were prepared using a high-throughput automated robotic platform (Agilent Bravo) to minimize a batch effect, all libraries have passed the QC
Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.
Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A
2018-02-01
The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes them challenging to find. To circumvent these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the size of the query set increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs that correspond to known transcription factor binding sites.
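To make the notion of a degenerate motif in the 15-letter IUPAC code concrete, the sketch below checks where a fixed-length IUPAC motif matches a DNA sequence. It is a plain CPU scan for a single given motif, not Argo_CUDA's exhaustive GPU search over all candidate motifs; the function names are illustrative, and uppercase input is assumed.

```python
# The 15 IUPAC nucleotide symbols and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif: str, window: str) -> bool:
    """True if every base in `window` is allowed by the motif symbol at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, window)
    )


def find_motif(motif: str, sequence: str) -> list:
    """Return the 0-based start positions at which the motif matches the sequence."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1) if matches(motif, sequence[i:i + k])]


def sequences_with_motif(motif: str, sequences) -> int:
    """Number of sequences containing at least one occurrence of the motif."""
    return sum(1 for seq in sequences if find_motif(motif, seq))
```

An exhaustive approach would, in principle, evaluate a score such as `sequences_with_motif` for every possible motif of the chosen length, which is why the search space grows so quickly and parallel hardware becomes attractive.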
A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.
Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex
2018-04-26
The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of the spermatozoa RNA is challenging as these cells harbor very low amounts of highly fragmented RNA, and the ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for a successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure™; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits; and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent on the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared an RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amount and quality of input RNA and found that for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described for the RNA-seq screening of the boar spermatozoa transcriptome.
FPKM: fragments per kilobase of transcript per million mapped reads; KRT1: keratin 1; miRNA: micro-RNA; miscRNA: miscellaneous RNA; Mt rRNA: mitochondrial ribosomal RNA; Mt tRNA: mitochondrial transfer RNA; OAZ3: ornithine decarboxylase antizyme 3; ORT: osmotic resistance test; piRNA: Piwi-interacting RNA; PRM1: protamine 1; PTPRC: protein tyrosine phosphatase receptor type C; rRNA: ribosomal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA; SRR: sperm recovery rate; tRNA: transfer RNA.
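The FPKM unit defined in the abbreviation list can be written out as a short calculation. This is a generic sketch of the definition only, with illustrative parameter names; it does not reproduce the quantification software used in the study, and the numbers in the usage note are made up.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments / (length in kb) / (library size in millions)
    = fragments * 1e9 / (length in bp * library size)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

For example, 500 fragments on a 2,000 bp transcript in a library of 10 million mapped fragments gives `fpkm(500, 2000, 10_000_000) == 25.0`.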
Razali, Haslina; O'Connor, Emily; Drews, Anna; Burke, Terry; Westerdahl, Helena
2017-07-28
High-throughput sequencing enables high-resolution genotyping of extremely duplicated genes. 454 amplicon sequencing (454) has become the standard technique for genotyping the major histocompatibility complex (MHC) genes in non-model organisms. However, Illumina MiSeq amplicon sequencing (MiSeq), which offers a much higher read depth, is now superseding 454. The aim of this study was to quantitatively and qualitatively evaluate the performance of MiSeq in relation to 454 for genotyping MHC class I alleles using a house sparrow (Passer domesticus) dataset with pedigree information. House sparrows provide a good study system for this comparison as their MHC class I genes have been studied previously and, consequently, we had prior expectations concerning the number of alleles per individual. We found that 454 and MiSeq performed equally well in genotyping amplicons with low diversity, i.e. amplicons from individuals that had fewer than 6 alleles. Although there was a higher rate of failure in the 454 dataset in resolving amplicons with higher diversity (6-9 alleles), the same genotypes were identified by both 454 and MiSeq in 98% of cases. We conclude that low diversity amplicons are equally well genotyped using either 454 or MiSeq, but the higher coverage afforded by MiSeq can lead to this approach outperforming 454 in amplicons with higher diversity.
A hierarchical model for clustering m(6)A methylation peaks in MeRIP-seq data.
Cui, Xiaodong; Meng, Jia; Zhang, Shaowu; Rao, Manjeet K; Chen, Yidong; Huang, Yufei
2016-08-22
The recent advent of state-of-the-art high-throughput sequencing technology, known as methylated RNA immunoprecipitation combined with RNA sequencing (MeRIP-seq), has revolutionized the area of mRNA epigenetics and enables biologists and biomedical researchers to obtain a global view of N(6)-methyladenosine (m(6)A) on the transcriptome. Yet there is a significant need for new computational tools for processing and analysing MeRIP-seq data to gain further insight into the function of m(6)A mRNA methylation. We developed a novel algorithm and an open source R package ( http://compgenomics.utsa.edu/metcluster ) for uncovering the potential types of m(6)A methylation by clustering the degree of m(6)A methylation peaks in MeRIP-seq data. This algorithm utilizes a hierarchical graphical model to model the read count variance and the underlying clusters of the methylation peaks. Rigorous statistical inference is performed to estimate the model parameters and detect the number of clusters. MeTCluster is evaluated on both simulated and real MeRIP-seq datasets and the results demonstrate its high accuracy in characterizing the clusters of methylation peaks. Our algorithm was applied to two different sets of real MeRIP-seq datasets and revealed a novel pattern: methylation peaks with lower peak enrichment tend to cluster at the 5' end of both mRNAs and lncRNAs, whereas those with higher peak enrichment are more likely to be distributed in the CDS and towards the 3' end of mRNAs and lncRNAs. This result might suggest that m(6)A's functions could be location specific. In this paper, a novel hierarchical graphical model-based algorithm was developed for clustering the enrichment of methylation peaks in MeRIP-seq data. MeTCluster is written in R and is publicly available.
Wang, WeiBo; Sun, Wei; Wang, Wei; Szatkiewicz, Jin
2018-03-01
The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome from which genomic signals are detected (e.g. copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution by leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate here the utility of our approach in the application of detecting copy number variants (CNVs) using a real example. We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Our results suggest that the RGE strategy developed here could be applied to other GLM+NB based read-count analyses, e.g. ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power.
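The two data-handling steps this abstract relies on, binning reads into tiling windows and then working on a random subset of the windows, can be sketched as follows. The abstract does not specify RGE's sampling scheme, so the uniform sampling without replacement shown here is an assumption, as are the function names; the GLM+NB fit itself is omitted.

```python
import random


def windowed_counts(read_starts, genome_length: int, window_size: int) -> list:
    """Count reads per fixed-size tiling window using each read's 0-based start position."""
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < genome_length:  # ignore positions outside the genome
            counts[pos // window_size] += 1
    return counts


def sample_windows(counts, fraction: float, seed: int = 0) -> list:
    """Uniformly sample a fraction of windows; returns (window_index, count) pairs."""
    if not counts:
        return []
    rng = random.Random(seed)
    k = min(len(counts), max(1, round(fraction * len(counts))))
    chosen = sorted(rng.sample(range(len(counts)), k))
    return [(i, counts[i]) for i in chosen]
```

Model coefficients would then be estimated on the sampled windows only, which is where the reduction in running time would come from.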
HTSeq--a Python framework to work with high-throughput sequencing data.
Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang
2015-01-15
A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
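The counting task that htseq-count addresses, assigning each aligned read to the gene it overlaps, can be illustrated without the library. The sketch below deliberately does not use the HTSeq API; it is a simplified, pure-Python stand-in in which genes are single half-open intervals, strand is ignored, and a read overlapping more than one gene is treated as ambiguous. The function name and data layout are assumptions for illustration.

```python
from collections import Counter


def count_reads(genes: dict, reads) -> Counter:
    """Assign reads to genes by interval overlap.

    genes: name -> (chrom, start, end), half-open coordinates.
    reads: iterable of (chrom, start, end), half-open coordinates.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, r_start, r_end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start < g_end and g_start < r_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1          # unambiguous assignment
        elif not hits:
            counts["__no_feature"] += 1   # read overlaps no gene
        else:
            counts["__ambiguous"] += 1    # read overlaps several genes
    return counts
```

This linear scan over all genes per read is only suitable for small examples; the coordinate-queryable data structures that the abstract mentions exist precisely to avoid it.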
Distinct p53 genomic binding patterns in normal and cancer-derived human cells
McCorkle, Sean R; McCombie, WR; Dunn, John J
2011-01-01
Here, we report genome-wide analysis of the tumor suppressor p53 binding sites in normal human cells. 743 high-confidence ChIP-seq peaks representing putative genomic binding sites were identified in normal IMR90 fibroblasts using a reference chromatin sample. More than 40% were located within 2 kb of a transcription start site (TSS), a distribution similar to that documented for individually studied, functional p53 binding sites and, to date, not observed by previous p53 genome-wide studies. Nearly half of the high-confidence binding sites in the IMR90 cells reside in CpG islands in marked contrast to sites reported in cancer-derived cells. The distinct genomic features of the IMR90 binding sites do not reflect a distinct preference for specific sequences, since the de novo developed p53 motif based on our study is similar to those reported by genome-wide studies of cancer cells. More likely, the different chromatin landscape in normal, compared with cancer-derived cells, influences p53 binding via modulating availability of the sites. We compared the IMR90 ChIP-seq peaks to the recently published IMR90 methylome1 and demonstrated that they are enriched at hypomethylated DNA. Our study represents the first genome-wide, de novo mapping of p53 binding sites in normal human cells and reveals that p53 binding sites reside in distinct genomic landscapes in normal and cancer-derived human cells. PMID:22127205
Kim, Taemook; Seo, Hogyu David; Hennighausen, Lothar; Lee, Daeyoup
2018-01-01
Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All the applications are installed on the user's computer when the program starts. Upon installation, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the Gene Expression Omnibus data repository. The downloaded files can then be sequentially processed to generate BAM and BigWig files, which are used for advanced analyses and visualization. Currently, it can process NGS data from popular model genomes such as human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), plant (Arabidopsis thaliana), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and budding yeast (Saccharomyces cerevisiae) genomes. With the processed files from Octopus-toolkit, the meta-analysis of various data sets, motif searches for DNA-binding proteins, and the identification of differentially expressed genes and/or protein-binding sites can be easily conducted by users with a few commands. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data. PMID:29420797
Reid-Bayliss, Kate S; Loeb, Lawrence A
2017-08-29
Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
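The general principle behind barcode-based consensus sequencing, grouping reads that derive from the same tagged molecule and keeping only bases on which the copies agree, can be sketched as below. This is not the ARC-seq protocol or its software: the function name, the thresholds `min_copies` and `min_agreement`, and the use of "N" for unresolved positions are hypothetical choices that merely illustrate how stringency could be scaled.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(tagged_reads, min_copies: int = 3, min_agreement: float = 0.9) -> dict:
    """Build one consensus sequence per barcode family.

    tagged_reads: iterable of (barcode, sequence) pairs.
    Families with fewer than `min_copies` reads, or with reads of unequal
    length, are skipped. A position is called only if the most common base
    reaches `min_agreement`; otherwise it is reported as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in tagged_reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_copies:
            continue
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue
        bases = []
        for column in zip(*seqs):
            base, n = Counter(column).most_common(1)[0]
            bases.append(base if n / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus
```

An error introduced in a single copy during cDNA synthesis, PCR, or sequencing is outvoted by the other copies, whereas a change present in the original RNA molecule appears in every copy and survives into the consensus.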
de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M
2017-07-06
Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date.
Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
ChIP-seq: advantages and challenges of a maturing technology.
Park, Peter J
2009-10-01
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.
RNA-Seq Analysis Reveals a Positive Role of HTR2A in Adipogenesis in Yan Yellow Cattle.
Yun, Jinyan; Jin, Haiguo; Cao, Yang; Zhang, Lichun; Zhao, Yumin; Jin, Xin; Yu, Yongsheng
2018-06-13
In this study, we performed high-throughput RNA sequencing at the primary bovine preadipocyte (Day-0), mid-differentiation (Day-4), and differentiated adipocyte (Day-9) stages in order to characterize the transcriptional events regulating differentiation and function. The preadipocytes were isolated from subcutaneous fetal bovine adipose tissues and were differentiated into mature adipocytes. The adipogenic characteristics of the adipocytes were detected during various stages of adipogenesis (Day-0, Day-4, and Day-9). We used RNA sequencing (RNA-seq) to obtain comprehensive transcriptome information on adipocyte differentiation. Compared to the pre-differentiation stage (Day-0), 2510 genes were identified as differentially expressed genes (DEGs) at the mid-differentiation stage (Day-4). We found 2446 DEGs in the mature adipocyte stage relative to the mid-differentiation stage. Some adipogenesis-related transcription factors, including CCAAT-enhancer-binding protein α (C/EBPα) and peroxisome proliferator-activated receptor γ (PPARγ), were differentially expressed at Day-0, Day-4, and Day-9. We further investigated the function of 5-hydroxytryptamine receptor 2A (HTR2A) in adipogenesis. Overexpression of HTR2A stimulated the differentiation of preadipocytes, and knockdown of HTR2A had the opposite effect. Furthermore, functional enrichment analysis of DEGs revealed that the PI3K-Akt signaling pathway was significantly enriched, and HTR2A regulated adipogenesis by activating or inhibiting phosphorylation of AKT (Ser473). In summary, the present study provides the first comparative transcriptome analysis of adipocytes at various stages in cattle, which presents a solid foundation for further study into the molecular mechanism of fat deposition and the improvement of beef quality in cattle.
Nobrega, R Paul; Brown, Michael; Williams, Cody; Sumner, Chris; Estep, Patricia; Caffry, Isabelle; Yu, Yao; Lynaugh, Heather; Burnina, Irina; Lilov, Asparouh; Desroches, Jordan; Bukowski, John; Sun, Tingwan; Belk, Jonathan P; Johnson, Kirt; Xu, Yingda
2017-10-01
The state-of-the-art industrial drug discovery approach is the empirical interrogation of a library of drug candidates against a target molecule. The advantage of high-throughput kinetic measurements over equilibrium assessments is the ability to measure each of the kinetic components of binding affinity. Although high-throughput capabilities have improved with advances in instrument hardware, three bottlenecks in data processing remain: (1) intrinsic molecular properties that lead to poor biophysical quality in vitro are not accounted for in commercially available analysis models, (2) processing data through a user interface is time-consuming and not amenable to parallelized data collection, and (3) a commercial solution that includes historical kinetic data in the analysis of kinetic competition data does not exist. Herein, we describe a generally applicable method for the automated analysis, storage, and retrieval of kinetic binding data. This analysis can deconvolve poor quality data on-the-fly and store and organize historical data in a queryable format for use in future analyses. Such database-centric strategies afford greater insight into the molecular mechanisms of kinetic competition, allowing for the rapid identification of allosteric effectors and the presentation of kinetic competition data in absolute terms of percent bound to antigen on the biosensor.
Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.
2016-01-01
High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274
Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian
2014-09-01
RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments between the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200?4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.
Yap, Hui-Yeng Y.; Chooi, Yit-Heng; Fung, Shin-Yee; Ng, Szu-Ting; Tan, Chon-Seng; Tan, Nget-Hong
2015-01-01
Lignosus rhinocerotis (Cooke) Ryvarden (tiger milk mushroom) has long been known for its nutritional and medicinal benefits among the local communities in Southeast Asia. However, the molecular and genetic basis of its medicinal and nutraceutical properties at the transcriptional level has not been investigated. In this study, the transcriptome of L. rhinocerotis sclerotium, the part with medicinal value, was analyzed using the high-throughput Illumina HiSeq™ platform with good sequencing quality and alignment results. A total of 3,673, 117, and 59,649 events of alternative splicing, novel transcripts, and SNP variation were found to enrich its current genome database. A large number of transcripts were expressed and involved in the processing of gene information and carbohydrate metabolism. A few highly expressed genes encoding the cysteine-rich cerato-platanin, hydrophobins, and sugar-binding lectins were identified and their possible roles in L. rhinocerotis were discussed. Genes encoding enzymes involved in the biosynthesis of glucans, six gene clusters encoding four terpene synthases and one each of non-ribosomal peptide synthetase and polyketide synthase, and 109 transcribed cytochrome P450 sequences were also identified in the transcriptome. The data from this study form a valuable foundation for future research in the exploitation of this mushroom in pharmacological and industrial applications. PMID:26606395
Huang, Daosheng; Guo, Guoji; Yuan, Ping; Ralston, Amy; Sun, Lingang; Huss, Mikael; Mistri, Tapan; Pinello, Luca; Ng, Huck Hui; Yuan, Guocheng; Ji, Junfeng; Rossant, Janet; Robson, Paul; Han, Xiaoping
2017-12-07
The first cellular differentiation event in mouse development leads to the formation of the blastocyst consisting of the inner cell mass (ICM) and trophectoderm (TE). The transcription factor CDX2 is required for proper TE specification, where it promotes expression of TE genes, and represses expression of Pou5f1 (OCT4). However its downstream network in the developing embryo is not fully characterized. Here, we performed high-throughput single embryo qPCR analysis in Cdx2 null embryos to identify CDX2-regulated targets in vivo. To identify genes likely to be regulated by CDX2 directly, we performed CDX2 ChIP-Seq on trophoblast stem (TS) cells. In addition, we examined the dynamics of gene expression changes using inducible CDX2 embryonic stem (ES) cells, so that we could predict which CDX2-bound genes are activated or repressed by CDX2 binding. By integrating these data with observations of chromatin modifications, we identify putative novel regulatory elements that repress gene expression in a lineage-specific manner. Interestingly, we found CDX2 binding sites within regulatory elements of key pluripotent genes such as Pou5f1 and Nanog, pointing to the existence of a novel mechanism by which CDX2 maintains repression of OCT4 in trophoblast. Our study proposes a general mechanism in regulating lineage segregation during mammalian development.
Goldman, Johnathan M; Zhang, Li Ang; Manna, Arunava; Armitage, Bruce A; Ly, Danith H; Schneider, James W
2013-07-08
Hybridization analysis of short DNA and RNA targets presents many challenges for detection. The commonly employed sandwich hybridization approach cannot be implemented for these short targets due to insufficient probe-target binding strengths for unmodified DNA probes. Here, we present a method capable of rapid and stable sandwich hybridization detection for 22 nucleotide DNA and RNA targets. Stable hybridization is achieved using an n-alkylated, polyethylene glycol γ-carbon modified peptide nucleic acid (γPNA) amphiphile. The γPNA's exceptionally high affinity enables stable hybridization of a second DNA-based probe to the remaining bases of the short target. Upon hybridization of both probes, an electrophoretic mobility shift is measured via interaction of the n-alkane modification on the γPNA with capillary electrophoresis running buffer containing nonionic surfactant micelles. We find that sandwich hybridization of both probes is stable under multiple binding configurations and demonstrate single base mismatch discrimination. The binding strength of both probes is also stabilized via coaxial stacking on adjacent hybridization to targets. We conclude with a discussion on the implementation of the proposed sandwich hybridization assay as a high-throughput microRNA detection method.
2014-01-01
Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...
2015-05-12
Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes.
However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.
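The idea of monitoring mutant abundance by bar code counts can be sketched as a log2 ratio of counts after versus before growth, averaged over all bar-coded insertions in a gene. This is a minimal illustration, not the published RB-TnSeq fitness calculation: the function names, the pseudocount, the plain averaging, and the example counts are all assumptions.

```python
import math


def barcode_log2_ratio(count_before, count_after, pseudocount=1.0):
    """log2 change in abundance of one bar-coded mutant across an experiment."""
    return math.log2((count_after + pseudocount) / (count_before + pseudocount))


def gene_fitness(strain_counts, pseudocount=1.0):
    """Average the per-strain log2 ratios for all insertions in one gene.

    strain_counts: list of (count_before, count_after) tuples, one per bar code.
    """
    ratios = [barcode_log2_ratio(b, a, pseudocount) for b, a in strain_counts]
    return sum(ratios) / len(ratios)


# Hypothetical counts for three bar-coded insertions in the same gene.
example = [(99, 24), (199, 49), (399, 99)]
print(gene_fitness(example))  # each strain dropped 4-fold, so fitness is -2.0
```

A strongly negative value suggests that disrupting the gene is detrimental in the tested condition; a value near zero suggests no measurable effect.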
Michel, Audrey M.; Mullan, James P. A.; Velayudhan, Vimalkumar; O'Connor, Patrick B. F.; Donohue, Claire A.; Baranov, Pavel V.
2016-01-01
ABSTRACT Ribosome profiling (ribo-seq) is a technique that uses high-throughput sequencing to reveal the exact locations and densities of translating ribosomes at the entire transcriptome level. The technique has become very popular since its inception in 2009. Yet experimentalists who generate ribo-seq data often have to rely on bioinformaticians to process and analyze their data. We present RiboGalaxy (http://ribogalaxy.ucc.ie), a freely available Galaxy-based web server for processing and analyzing ribosome profiling data with the visualization functionality provided by GWIPS-viz (http://gwips.ucc.ie). RiboGalaxy offers researchers a suite of tools specifically tailored for processing ribo-seq and corresponding mRNA-seq data. Researchers can take advantage of the published workflows which reduce the multi-step alignment process to a minimum of inputs from the user. Users can then explore their own aligned data as custom tracks in GWIPS-viz and compare their ribosome profiles to existing ribo-seq tracks from published studies. In addition, users can assess the quality of their ribo-seq data, determine the strength of the triplet periodicity signal, generate meta-gene ribosome profiles as well as analyze the relative impact of mRNA sequence features on local read density. RiboGalaxy is accompanied by extensive documentation and tips for helping users. In addition we provide a forum (http://gwips.ucc.ie/Forum) where we encourage users to post their questions and feedback to improve the overall RiboGalaxy service. PMID:26821742
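The triplet periodicity signal mentioned above can be illustrated by counting how many read 5' ends fall in each of the three codon frames of a coding sequence. This is a rough sketch and not RiboGalaxy's implementation: the function name, the use of raw 5' positions without an offset to the ribosomal P-site, and the example coordinates are assumptions.

```python
from collections import Counter


def frame_fractions(read_starts, cds_start):
    """Fraction of reads whose 5' end falls in codon frame 0, 1 and 2."""
    if not read_starts:
        raise ValueError("no reads supplied")
    counts = Counter((pos - cds_start) % 3 for pos in read_starts)
    total = sum(counts.values())
    return [counts.get(frame, 0) / total for frame in range(3)]


# Hypothetical 5' end positions on a transcript whose CDS starts at position 100.
starts = [100, 103, 106, 109, 112, 115, 101, 104, 110, 102]
print(frame_fractions(starts, cds_start=100))
```

A strong signal shows most reads concentrated in a single frame, whereas mRNA-seq reads would be expected to spread roughly evenly over the three frames.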
A normalization strategy for comparing tag count data
2012-01-01
Background High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. Results We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. Conclusion Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data. PMID:22475125
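The key concept, removing potential DEGs before calculating the normalization factor, can be sketched for two samples as follows. This is an assumption-laden illustration rather than the published procedure: the paper identifies potential DEGs with R packages such as edgeR or baySeq, whereas this sketch substitutes a crude fold-change cutoff, and all names and counts are hypothetical.

```python
def scaling_factor(sample_a, sample_b, genes):
    """Ratio of total tag counts in sample_b to sample_a over the given genes."""
    total_a = sum(sample_a[g] for g in genes)
    total_b = sum(sample_b[g] for g in genes)
    return total_b / total_a


def deg_trimmed_factor(sample_a, sample_b, fold_cutoff=2.0):
    """Recompute the scaling factor after dropping genes that look differential."""
    genes = list(sample_a)
    first_pass = scaling_factor(sample_a, sample_b, genes)
    kept = []
    for g in genes:
        # Fold change of gene g after the first-pass normalization (+1 avoids zeros).
        ratio = (sample_b[g] + 1) / ((sample_a[g] + 1) * first_pass)
        if 1 / fold_cutoff <= ratio <= fold_cutoff:
            kept.append(g)
    if not kept:  # everything looked differential: fall back to the first pass
        return first_pass
    return scaling_factor(sample_a, sample_b, kept)


# Hypothetical counts: g5 is strongly up-regulated in sample b.
a = {"g1": 100, "g2": 200, "g3": 300, "g4": 400, "g5": 50}
b = {"g1": 200, "g2": 400, "g3": 600, "g4": 800, "g5": 500}
print(scaling_factor(a, b, list(a)), deg_trimmed_factor(a, b))
```

In this toy case the naive factor is inflated by the single up-regulated gene, while the trimmed factor recovers the two-fold depth difference shared by the remaining genes.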
Research Associate | Center for Cancer Research
The Basic Science Program (BSP) at the Frederick National Laboratory for Cancer Research (FNLCR) pursues independent, multidisciplinary research programs in basic or applied molecular biology, immunology, retrovirology, cancer biology or human genetics. As part of the BSP, the Microbiome and Genetics Core (the Core) characterizes microbiomes by next-generation sequencing to determine their composition and variation, as influenced by immune, genetic, and host health factors. The Core provides support across a spectrum of processes, from nucleic acid isolation through bioinformatics and statistical analysis. KEY ROLES/RESPONSIBILITIES The Research Associate II will provide support in the areas of automated isolation, preparation, PCR and sequencing of DNA on next generation platforms (Illumina MiSeq and NextSeq). An opportunity exists to join the Core’s team of highly trained experimentalists and bioinformaticians working to characterize microbiome samples. The following represent requirements of the position: A minimum of five (5) years of related biomedical experience. Experience with high-throughput nucleic acid (DNA/RNA) extraction. Experience in performing PCR amplification (including quantitative real-time PCR). Experience or familiarity with robotic liquid handling protocols (especially on the Eppendorf epMotion 5073 or 5075 platforms). Experience in operating and maintaining benchtop Illumina sequencers (MiSeq and NextSeq). Ability to evaluate experimental quality and to troubleshoot molecular biology protocols. Experience with sample tracking, inventory management and biobanking. Ability to operate and communicate effectively in a team-oriented work environment.
Zhang, Xirui; Daaboul, George G; Spuhler, Philipp S; Dröge, Peter; Ünlü, M Selim
2016-03-14
DNA-binding proteins play crucial roles in the maintenance and functions of the genome and yet, their specific binding mechanisms are not fully understood. Recently, it was discovered that DNA-binding proteins recognize specific binding sites to carry out their functions through an indirect readout mechanism by recognizing and capturing DNA conformational flexibility and deformation. High-throughput DNA microarray-based methods that provide large-scale protein-DNA binding information have shown effective and comprehensive analysis of protein-DNA binding affinities, but do not provide information of DNA conformational changes in specific protein-DNA complexes. Building on the high-throughput capability of DNA microarrays, we demonstrate a quantitative approach that simultaneously measures the amount of protein binding to DNA and nanometer-scale DNA conformational change induced by protein binding in a microarray format. Both measurements rely on spectral interferometry on a layered substrate using a single optical instrument in two distinct modalities. In the first modality, we quantitate the amount of binding of protein to surface-immobilized DNA in each DNA spot using a label-free spectral reflectivity technique that accurately measures the surface densities of protein and DNA accumulated on the substrate. In the second modality, for each DNA spot, we simultaneously measure DNA conformational change using a fluorescence vertical sectioning technique that determines average axial height of fluorophores tagged to specific nucleotides of the surface-immobilized DNA. The approach presented in this paper, when combined with current high-throughput DNA microarray-based technologies, has the potential to serve as a rapid and simple method for quantitative and large-scale characterization of conformational specific protein-DNA interactions.
Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.
Raghav, Sunil Kumar; Deplancke, Bart
2012-01-01
Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
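Analyzing several samples in one sequencing run requires assigning each read back to its sample by barcode. The sketch below shows one simple way to do this and is not the chapter's protocol: it assumes the barcode sits at the 5' start of each read, that all barcodes have the same length, and that only exact matches are accepted; the barcodes and sample names are hypothetical.

```python
def demultiplex(reads, barcode_to_sample):
    """Group reads by a leading sample barcode and trim the barcode off."""
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    by_sample["unassigned"] = []
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_len])
        if sample is None:
            by_sample["unassigned"].append(read)  # keep the read untouched
        else:
            by_sample[sample].append(read[barcode_len:])
    return by_sample


# Hypothetical 4-nt barcodes for two ChIP samples.
barcodes = {"ACGT": "PolII_day0", "TGCA": "PolII_day2"}
reads = ["ACGTTTGACC", "TGCAGGATCC", "GGGGAAAATT", "ACGTCCCTAA"]
print(demultiplex(reads, barcodes))
```

Real pipelines usually tolerate one mismatch in the barcode; exact matching is used here only to keep the example short.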
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. 
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
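The differential expression criterion quoted above (fold change ≥1.5 and p value <0.05) can be written as a small filter. This is a sketch only: the function name and example values are hypothetical, and treating the fold change symmetrically (up or down) is an assumption the abstract does not spell out.

```python
def is_differential(mean_wt, mean_ko, p_value, min_fold=1.5, alpha=0.05):
    """True if expression differs by at least min_fold in either direction and p < alpha."""
    if mean_wt <= 0 or mean_ko <= 0:
        raise ValueError("expression values must be positive")
    fold = max(mean_wt, mean_ko) / min(mean_wt, mean_ko)
    return fold >= min_fold and p_value < alpha


# Hypothetical (WT mean, Nrl-/- mean, p value) per transcript.
transcripts = {"t1": (10.0, 20.0, 0.01), "t2": (10.0, 12.0, 0.01), "t3": (10.0, 20.0, 0.20)}
print([name for name, values in transcripts.items() if is_differential(*values)])
```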
Hsu, Han-Hsiu; Araki, Michihiro; Mochizuki, Masao; Hori, Yoshimi; Murata, Masahiro; Kahar, Prihardi; Yoshida, Takanobu; Hasunuma, Tomohisa; Kondo, Akihiko
2017-03-02
Chinese hamster ovary (CHO) cells are the primary host used for biopharmaceutical protein production. The engineering of CHO cells to produce higher amounts of biopharmaceuticals has been highly dependent on empirical approaches, but recent high-throughput "omics" methods are changing the situation in a rational manner. Omics data analyses using gene expression or metabolite profiling make it possible to identify key genes and metabolites in antibody production. Systematic omics approaches using different types of time-series data are expected to further enhance understanding of cellular behaviours and molecular networks for rational design of CHO cells. This study developed a systematic method for obtaining and analysing time-dependent intracellular and extracellular metabolite profiles, RNA-seq data (enzymatic mRNA levels) and cell counts from CHO cell cultures to capture an overall view of the CHO central metabolic pathway (CMP). We then calculated correlation coefficients among all the profiles and visualised the whole CMP by heatmap analysis and metabolic pathway mapping, to classify genes and metabolites together. This approach provides an efficient platform to identify key genes and metabolites in CHO cell culture.
NASA Astrophysics Data System (ADS)
Abuzahra, M. A. M.; Jakaria; Listyarini, K.; Furqon, A.; Sumantri, C.; Uddin, M. J.; Gunawan, A.
2018-05-01
High-throughput RNA sequencing (RNA-Seq) reveals new challenges for the detection of transcriptome variants (SNPs) in different tissues and species. The aim of this study was to characterize a SNP discovery analysis in the sheep meat odour and flavour transcriptome using RNA-Seq. Six liver samples from sheep with divergent meat odour and flavour were analyzed using the Illumina Genome Hiseq 2500 Analyzer. The SNP detection analysis revealed 142 SNPs in sheep meat samples, and a large number of those corresponded to differences between high and low sheep meat odour and flavour ovis genome assembly OAR v4.0. Among them, about 90.4% of genes had multiple polymorphisms within 12 genes (JAML, ANGPTL8, LOC101103463, SEPW1, SCN5A, LOC101113036, DOCK6, GTSE1, KIF12, KCTD17, KANK2, CYP2A6). Several of the SNPs (JAML, CYP2A6, SEPW1, and KIF12) found in this study could be included as suitable markers in genotyping platforms to perform association analyses in commercial populations and apply genomic selection protocols in sheep meat production.
Gong, Wuming; Koyano-Nakagawa, Naoko; Li, Tongbin; Garry, Daniel J
2015-03-07
Decoding the temporal control of gene expression patterns is key to the understanding of the complex mechanisms that govern developmental decisions during heart development. High-throughput methods have been employed to systematically study the dynamic and coordinated nature of cardiac differentiation at the global level with multiple dimensions. Therefore, there is a pressing need to develop a systems approach to integrate these data from individual studies and infer the dynamic regulatory networks in an unbiased fashion. We developed a two-step strategy to integrate data from (1) temporal RNA-seq, (2) temporal histone modification ChIP-seq, (3) transcription factor (TF) ChIP-seq and (4) gene perturbation experiments to reconstruct the dynamic network during heart development. First, we trained a logistic regression model to predict the probability (LR score) of any base being bound by 543 TFs with known positional weight matrices. Second, four dimensions of data were combined using a time-varying dynamic Bayesian network model to infer the dynamic networks at four developmental stages in the mouse [mouse embryonic stem cells (ESCs), mesoderm (MES), cardiac progenitors (CP) and cardiomyocytes (CM)]. Our method not only infers the time-varying networks between different stages of heart development, but it also identifies the TF binding sites associated with promoter or enhancers of downstream genes. The LR scores of experimentally verified ESCs and heart enhancers were significantly higher than random regions (p < 10^-100), suggesting that a high LR score is a reliable indicator for functional TF binding sites. Our network inference model identified a region with an elevated LR score approximately -9400 bp upstream of the transcriptional start site of Nkx2-5, which overlapped with a previously reported enhancer region (-9435 to -8922 bp).
TFs such as Tead1, Gata4, Msx2, and Tgif1 were predicted to bind to this region and participate in the regulation of Nkx2-5 gene expression. Our model also predicted the key regulatory networks for the ESC-MES, MES-CP and CP-CM transitions. We report a novel method to systematically integrate multi-dimensional -omics data and reconstruct the gene regulatory networks. This method will allow one to rapidly determine the cis-modules that regulate key genes during cardiac differentiation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teplitsky, Ella; Joshi, Karan; Ericson, Daniel L.
We describe a high throughput method for screening up to 1728 distinct chemicals with protein crystals on a single microplate. Acoustic droplet ejection (ADE) was used to co-position 2.5 nL of protein, precipitant, and chemicals on a MiTeGen in situ-1 crystallization plate™ for screening by co-crystallization or soaking. ADE-transferred droplets follow a precise trajectory which allows all components to be transferred through small apertures in the microplate lid. The apertures were large enough for 2.5 nL droplets to pass through them, but small enough so that they did not disrupt the internal environment created by the mother liquor. Using this system, thermolysin and trypsin crystals were efficiently screened for binding to a heavy-metal mini-library. Fluorescence and X-ray diffraction were used to confirm that each chemical in the heavy-metal library was correctly paired with the intended protein crystal. A fragment mini-library was screened to observe two known lysozyme ligands using both co-crystallization and soaking. A similar approach was used to identify multiple, novel thaumatin binding sites for ascorbic acid. This technology pushes towards a faster, automated, and more flexible strategy for high throughput screening of chemical libraries (such as fragment libraries) using as little as 2.5 nL of each component.
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.
Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J
2015-11-15
High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
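As a rough illustration of the mean-variance relation mentioned in this abstract: under a negative binomial model with dispersion φ, a count with mean μ has variance μ + φμ². The sketch below is not taken from the paper; the function name is hypothetical, and the default dispersion of 0.01 is simply the approximate value quoted above.

```python
def nb_variance(mean, dispersion=0.01):
    """Variance of a negative binomial count with the given mean and dispersion."""
    return mean + dispersion * mean ** 2


for mu in (10, 100, 1000):
    var = nb_variance(mu)
    # The squared coefficient of variation (var / mu^2) approaches the
    # dispersion as the mean grows, because the Poisson term mu becomes negligible.
    print(mu, var, var / mu ** 2)
```

At low counts the Poisson term dominates, while at high counts the variance is governed almost entirely by the dispersion term.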
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment
Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.
2015-01-01
Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307
Synthesis and SAR of piperazine amides as novel c-jun N-terminal kinase (JNK) inhibitors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, Youseung; Chen, Weiming; Habel, Jeff
2009-09-14
A novel series of c-jun N-terminal kinase (JNK) inhibitors were designed and developed from a high-throughput-screening hit. Through the optimization of the piperazine amide 1, several potent compounds were discovered. The X-ray crystal structure of 4g showed a unique binding mode different from other well known JNK3 inhibitors.
Pan, Fei; Zhong, Xiaohan; Xia, Dongsheng; Yin, Xianze; Li, Fan; Zhao, Dongye; Ji, Haodong; Liu, Wen
2017-01-01
This study investigated the efficiency of nanoscale zero-valent iron combined with persulfate (NZVI/PS) for enhanced degradation of brilliant red X-3B in an upflow anaerobic sludge blanket (UASB) reactor, and examined the effects of NZVI/PS on anaerobic microbial communities during the treatment process. The addition of NZVI (0.5 g/L) greatly enhanced the decolourization rate of X-3B from 63.8% to 98.4%. The Biolog EcoPlate™ technique was utilized to examine microbial metabolism in the reactor, and Illumina MiSeq high-throughput sequencing revealed 22 phyla and 88 genera of bacteria. The most abundant genus (Lactococcus) decreased in relative abundance from 33.03% to 7.94%, while the genus Akkermansia increased from 1.69% to 20.23% in the presence of 0.2 g/L NZVI during the biological treatment process. Meanwhile, three strains were isolated from the sludge in the UASB reactors and identified by 16S rRNA analysis. The distribution of the three strains was consistent with the results from the Illumina MiSeq high-throughput sequencing. The X-ray photoelectron spectroscopy results indicated that Fe(0) was transformed into Fe(II)/Fe(III) during the treatment process; these species are beneficial for microbial growth and thus promote metabolic processes and the microbial community. PMID:28300176
NASA Astrophysics Data System (ADS)
Pan, Fei; Zhong, Xiaohan; Xia, Dongsheng; Yin, Xianze; Li, Fan; Zhao, Dongye; Ji, Haodong; Liu, Wen
2017-03-01
This study investigated the efficiency of nanoscale zero-valent iron combined with persulfate (NZVI/PS) for enhanced degradation of brilliant red X-3B in an upflow anaerobic sludge blanket (UASB) reactor, and examined the effects of NZVI/PS on anaerobic microbial communities during the treatment process. The addition of NZVI (0.5 g/L) greatly enhanced the decolourization rate of X-3B from 63.8% to 98.4%. The Biolog EcoPlate™ technique was utilized to examine microbial metabolism in the reactor, and Illumina MiSeq high-throughput sequencing revealed 22 phyla and 88 genera of bacteria. The most abundant genus (Lactococcus) decreased in relative abundance from 33.03% to 7.94%, while the genus Akkermansia increased from 1.69% to 20.23% in the presence of 0.2 g/L NZVI during the biological treatment process. Meanwhile, three strains were isolated from the sludge in the UASB reactors and identified by 16S rRNA analysis. The distribution of the three strains was consistent with the results from the Illumina MiSeq high-throughput sequencing. The X-ray photoelectron spectroscopy results indicated that Fe(0) was transformed into Fe(II)/Fe(III) during the treatment process; these species are beneficial for microbial growth and thus promote metabolic processes and the microbial community.
Gong, Ting; Szustakowski, Joseph D
2013-04-15
For heterogeneous tissues, measurements of gene expression through mRNA-Seq data are confounded by relative proportions of cell types involved. In this note, we introduce an efficient pipeline: DeconRNASeq, an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It adopts a globally optimized non-negative decomposition algorithm through quadratic programming for estimating the mixing proportions of distinctive tissue types in next-generation sequencing data. We demonstrated the feasibility and validity of DeconRNASeq across a range of mixing levels and sources using mRNA-Seq data mixed in silico at known concentrations. We validated our computational approach for various benchmark data, with high correlation between our predicted cell proportions and the real fractions of tissues. Our study provides a rigorous, quantitative and high-resolution tool as a prerequisite to use mRNA-Seq data. The modularity of package design allows an easy deployment of custom analytical pipelines for data from other high-throughput platforms. DeconRNASeq is written in R, and is freely available at http://bioconductor.org/packages. Supplementary data are available at Bioinformatics online.
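To make the idea of estimating mixing proportions concrete, here is a minimal sketch for the special case of only two tissue types. It is not DeconRNASeq's implementation (the package is written in R and uses quadratic programming for the general case). With two components whose fractions sum to one, the constrained least-squares problem has a closed-form solution that is clipped to the interval [0, 1]. The function name, signature vectors, and expression values are all invented for illustration.

```python
def two_type_proportion(mixture, sig_a, sig_b):
    """Least-squares fraction of type A in a two-type mixture, clipped to [0, 1].

    Models mixture ~ p * sig_a + (1 - p) * sig_b and solves for p.
    """
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    resid = [m - b for m, b in zip(mixture, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; proportion is not identifiable")
    p = sum(r * d for r, d in zip(resid, diff)) / denom
    # Clipping enforces the non-negativity constraints on both fractions.
    return min(1.0, max(0.0, p))


# Hypothetical per-gene expression signatures for two pure tissues.
tissue_a = [100.0, 5.0, 20.0]
tissue_b = [10.0, 80.0, 20.0]
# In silico mixture at a known ratio of 30% A and 70% B.
mixed = [0.3 * a + 0.7 * b for a, b in zip(tissue_a, tissue_b)]
print(two_type_proportion(mixed, tissue_a, tissue_b))  # approximately 0.3
```

With more than two tissue types there is no such closed form, which is where a quadratic programming solver becomes necessary.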
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/∼sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun
2015-01-01
Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.
Detection and Analysis of Circular RNAs by RT-PCR.
Panda, Amaresh C; Gorospe, Myriam
2018-03-20
Gene expression in eukaryotic cells is tightly regulated at the transcriptional and posttranscriptional levels. Posttranscriptional processes, including pre-mRNA splicing, mRNA export, mRNA turnover, and mRNA translation, are controlled by RNA-binding proteins (RBPs) and noncoding (nc)RNAs. The vast family of ncRNAs comprises diverse regulatory RNAs, such as microRNAs and long noncoding (lnc)RNAs, but also the poorly explored class of circular (circ)RNAs. Although first discovered more than three decades ago by electron microscopy, only the advent of high-throughput RNA-sequencing (RNA-seq) and the development of innovative bioinformatic pipelines have begun to allow the systematic identification of circRNAs (Szabo and Salzman, 2016; Panda et al., 2017b; Panda et al., 2017c). However, the validation of true circRNAs identified by RNA sequencing requires other molecular biology techniques including reverse transcription (RT) followed by conventional or quantitative (q) polymerase chain reaction (PCR), and Northern blot analysis (Jeck and Sharpless, 2014). RT-qPCR analysis of circular RNAs using divergent primers has been widely used for the detection, validation, and sometimes quantification of circRNAs (Abdelmohsen et al., 2015 and 2017; Panda et al., 2017b). As detailed here, divergent primers designed to span the circRNA backsplice junction sequence can specifically amplify the circRNAs and not the counterpart linear RNA. In sum, RT-PCR analysis using divergent primers allows direct detection and quantification of circRNAs.
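The reason a junction-spanning amplicon is specific to the circular form can be shown with a small sketch. It assumes the circRNA sequence is written 5'→3' in linear order, so that in the circle the 3' end is joined to the 5' end. The function name and the example sequence are invented, and real primer design involves further considerations (melting temperature, amplicon length, off-target checks) that are not covered here.

```python
def backsplice_junction(circ_seq, flank=10):
    """Return the sequence spanning the backsplice junction of a circRNA.

    The junction joins the last `flank` bases of the linear sequence to its
    first `flank` bases.
    """
    if flank <= 0 or 2 * flank > len(circ_seq):
        raise ValueError("flank must be positive and at most half the sequence length")
    return circ_seq[-flank:] + circ_seq[:flank]


# Invented example sequence, written in linear 5'->3' order.
seq = "ATGGCCTTAGGCATCGATTACGGATCCAAGT"
junction = backsplice_junction(seq, flank=5)
print(junction)
# The junction sequence does not occur in the linear form.
print(junction in seq)
```

Because the junction sequence is absent from the linear transcript, an amplicon that covers it can only arise from the circular molecule.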
Zhao, Meng-Meng; Du, Shan-Shan; Li, Qiu-Hong; Chen, Tao; Qiu, Hui; Wu, Qin; Chen, Shan-Shan; Zhou, Ying; Zhang, Yuan; Hu, Yang; Su, Yi-Liang; Shen, Li; Zhang, Fen; Weng, Dong; Li, Hui-Ping
2017-02-01
This study aims to use high-throughput 16S rRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 of tuberculosis (TB group), and 11 of non-infectious lung diseases (control group). The V4 region of the bacterial 16S rRNA gene in the specimens was amplified and sequenced using the high-throughput sequencing platform MiSeq, and a bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16S rRNA was found in all 17 samples of the sarcoidosis group, whereas it was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0.11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0.0044% in the sarcoidosis group. P. granulosum 16S rRNA was not detected in the other two groups. High-throughput 16S rRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development.
Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq
Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim
2014-01-01
The advent of both microarray and massively parallel sequencing technologies has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
NASA Astrophysics Data System (ADS)
Zhang, Xirui; Daaboul, George G.; Spuhler, Philipp S.; Dröge, Peter; Ünlü, M. Selim
2016-03-01
DNA-binding proteins play crucial roles in the maintenance and functions of the genome and yet, their specific binding mechanisms are not fully understood. Recently, it was discovered that DNA-binding proteins recognize specific binding sites to carry out their functions through an indirect readout mechanism by recognizing and capturing DNA conformational flexibility and deformation. High-throughput DNA microarray-based methods that provide large-scale protein-DNA binding information have shown effective and comprehensive analysis of protein-DNA binding affinities, but do not provide information of DNA conformational changes in specific protein-DNA complexes. Building on the high-throughput capability of DNA microarrays, we demonstrate a quantitative approach that simultaneously measures the amount of protein binding to DNA and nanometer-scale DNA conformational change induced by protein binding in a microarray format. Both measurements rely on spectral interferometry on a layered substrate using a single optical instrument in two distinct modalities. In the first modality, we quantitate the amount of binding of protein to surface-immobilized DNA in each DNA spot using a label-free spectral reflectivity technique that accurately measures the surface densities of protein and DNA accumulated on the substrate. In the second modality, for each DNA spot, we simultaneously measure DNA conformational change using a fluorescence vertical sectioning technique that determines average axial height of fluorophores tagged to specific nucleotides of the surface-immobilized DNA. 
The approach presented in this paper, when combined with current high-throughput DNA microarray-based technologies, has the potential to serve as a rapid and simple method for quantitative and large-scale characterization of conformational specific protein-DNA interactions. Electronic supplementary information (ESI) available: DNA sequences and nomenclature (Table 1S); SDS-PAGE assay of IHF stock solution (Fig. 1S); determination of the concentration of IHF stock solution by Bradford assay (Fig. 2S); equilibrium binding isotherm fitting results of other DNA sequences (Table 2S); calculation of dissociation constants (Fig. 3S, 4S; Table 2S); geometric model for quantitation of DNA bending angle induced by specific IHF binding (Fig. 4S); customized flow cell assembly (Fig. 5S); real-time measurement of average fluorophore height change by SSFM (Fig. 6S); summary of binding parameters obtained from additive isotherm model fitting (Table 3S); average surface densities of 10 dsDNA spots and bound IHF at equilibrium (Table 4S); effects of surface densities on the binding and bending of dsDNA (Tables 5S, 6S and Fig. 7S-10S). See DOI: 10.1039/c5nr06785e
Transcriptome analysis by strand-specific sequencing of complementary DNA
Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey
2009-01-01
High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212
Transcriptome analysis by strand-specific sequencing of complementary DNA.
Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey
2009-10-01
High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.
2017-01-01
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623
SeqAPASS: Sequence alignment to predict across-species susceptibility
Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high-throughput screening (HTS) programs, s...
Transcription profile of boar spermatozoa as revealed by RNA-sequencing
USDA-ARS's Scientific Manuscript database
High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...
Bondt, Albert; Rombouts, Yoann; Selman, Maurice H J; Hensbergen, Paul J; Reiding, Karli R; Hazes, Johanna M W; Dolhain, Radboud J E M; Wuhrer, Manfred
2014-11-01
The N-linked glycosylation of the constant fragment (Fc) of immunoglobulin G has been shown to change during pathological and physiological events and to strongly influence antibody inflammatory properties. In contrast, little is known about Fab-linked N-glycosylation, carried by ∼ 20% of IgG. Here we present a high-throughput workflow to analyze Fab and Fc glycosylation of polyclonal IgG purified from 5 μl of serum. We were able to detect and quantify 37 different N-glycans by means of MALDI-TOF-MS analysis in reflectron positive mode using a novel linkage-specific derivatization of sialic acid. This method was applied to 174 samples of a pregnancy cohort to reveal Fab glycosylation features and their change with pregnancy. Data analysis revealed marked differences between Fab and Fc glycosylation, especially in the levels of galactosylation and sialylation, incidence of bisecting GlcNAc, and presence of high mannose structures, which were all higher in the Fab portion than the Fc, whereas Fc showed higher levels of fucosylation. Additionally, we observed several changes during pregnancy and after delivery. Fab N-glycan sialylation was increased and bisection was decreased relative to postpartum time points, and nearly complete galactosylation of Fab glycans was observed throughout. Fc glycosylation changes were similar to results described before, with increased galactosylation and sialylation and decreased bisection during pregnancy. We expect that the parallel analysis of IgG Fab and Fc, as set up in this paper, will be important for unraveling roles of these glycans in (auto)immunity, which may be mediated via recognition by human lectins or modulation of antigen binding. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Bondt, Albert; Rombouts, Yoann; Selman, Maurice H. J.; Hensbergen, Paul J.; Reiding, Karli R.; Hazes, Johanna M. W.; Dolhain, Radboud J. E. M.; Wuhrer, Manfred
2014-01-01
The N-linked glycosylation of the constant fragment (Fc) of immunoglobulin G has been shown to change during pathological and physiological events and to strongly influence antibody inflammatory properties. In contrast, little is known about Fab-linked N-glycosylation, carried by ∼20% of IgG. Here we present a high-throughput workflow to analyze Fab and Fc glycosylation of polyclonal IgG purified from 5 μl of serum. We were able to detect and quantify 37 different N-glycans by means of MALDI-TOF-MS analysis in reflectron positive mode using a novel linkage-specific derivatization of sialic acid. This method was applied to 174 samples of a pregnancy cohort to reveal Fab glycosylation features and their change with pregnancy. Data analysis revealed marked differences between Fab and Fc glycosylation, especially in the levels of galactosylation and sialylation, incidence of bisecting GlcNAc, and presence of high mannose structures, which were all higher in the Fab portion than the Fc, whereas Fc showed higher levels of fucosylation. Additionally, we observed several changes during pregnancy and after delivery. Fab N-glycan sialylation was increased and bisection was decreased relative to postpartum time points, and nearly complete galactosylation of Fab glycans was observed throughout. Fc glycosylation changes were similar to results described before, with increased galactosylation and sialylation and decreased bisection during pregnancy. We expect that the parallel analysis of IgG Fab and Fc, as set up in this paper, will be important for unraveling roles of these glycans in (auto)immunity, which may be mediated via recognition by human lectins or modulation of antigen binding. PMID:25004930
Ioannidis, Vassilios; van Nimwegen, Erik; Stockinger, Heinz
2016-01-01
ISMARA (ismara.unibas.ch) automatically infers the key regulators and regulatory interactions from high-throughput gene expression or chromatin state data. However, given the large sizes of current next generation sequencing (NGS) datasets, data uploading times are a major bottleneck. Additionally, for proprietary data, users may be uncomfortable with uploading entire raw datasets to an external server. Both these problems could be alleviated by providing a means by which users could pre-process their raw data locally, transferring only a small summary file to the ISMARA server. We developed a stand-alone client application that pre-processes large input files (RNA-seq or ChIP-seq data) on the user's computer for performing ISMARA analysis in a completely automated manner, including uploading of small processed summary files to the ISMARA server. This reduces file sizes by up to a factor of 1000, and upload times from many hours to mere seconds. The client application is available from ismara.unibas.ch/ISMARA/client. PMID:28232860
Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H
2015-08-19
Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
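Two of the hallmarks listed in this abstract (reproducibility across biological replicates, and filtering against an unselected, amplified library) can be expressed as a simple filter. The sketch below is a loose illustration only and is not the authors' method: the function name, the fold-enrichment threshold, the pseudo-frequency, and the peptide labels and frequencies are all invented.

```python
def candidate_binders(replicates, unselected, min_fold=10.0, pseudo=1e-6):
    """Return peptides seen in every replicate and enriched over the control.

    replicates: list of dicts mapping peptide -> read frequency after selection.
    unselected: dict mapping peptide -> frequency in the amplified,
                unselected library (used to flag fast-amplifying sequences).
    """
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)
    keep = []
    for pep in sorted(shared):
        lowest = min(rep[pep] for rep in replicates)
        # The pseudo-frequency avoids division by zero for unseen peptides.
        background = unselected.get(pep, 0.0) + pseudo
        if lowest / background >= min_fold:
            keep.append(pep)
    return keep


# Invented frequencies: pepB is abundant even without selection (likely
# parasitic), and pepC appears in only one replicate.
rep1 = {"pepA": 0.02, "pepB": 0.05, "pepC": 0.01}
rep2 = {"pepA": 0.03, "pepB": 0.04}
control = {"pepB": 0.03}
print(candidate_binders([rep1, rep2], control))
```

Motif discovery across the surviving sequences, the third hallmark, would be a separate step and is not attempted here.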
Saturation analysis of ChIP-seq data for reproducible identification of binding peaks
Hansen, Peter; Hecht, Jochen; Ibrahim, Daniel M.; Krannich, Alexander; Truss, Matthias; Robinson, Peter N.
2015-01-01
Chromatin immunoprecipitation coupled with next-generation sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of transcription factors and other DNA binding proteins. Computational ChIP-seq peak calling infers the location of protein–DNA interactions based on various measures of enrichment of sequence reads. In this work, we introduce an algorithm, Q, that uses an assessment of the quadratic enrichment of reads to center candidate peaks followed by statistical analysis of saturation of candidate peaks by 5′ ends of reads. We show that our method not only is substantially faster than several competing methods but also demonstrates statistically significant advantages with respect to reproducibility of results and in its ability to identify peaks with reproducible binding site motifs. We show that Q has superior performance in the delineation of double RNAPII and H3K4me3 peaks surrounding transcription start sites related to a better ability to resolve individual peaks. The method is implemented in C++ and is freely available under an open source license. PMID:26163319
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa
2017-01-01
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on GitHub and as a Docker image. PMID:28977546
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and varied phenotypes. The recent availability of high-throughput RNA-seq has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building a genome-scale, high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN in rice, a monocot species. The quality of the network obtained by our method was verified and evaluated using curated gene functional association data sets, on which it clearly outperformed each single method. In addition, the powerful capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source for selecting key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo
2014-01-01
Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice are still limited or unknown. In this study, RNA-seq analysis was performed using a de novo transcriptome assembly as a reference to evidence transcriptional changes across six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were yielded, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I–II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I–II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provide the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions. PMID:24691066
Markel, Eric; Maciak, Charlene; Butcher, Bronwyn G.; Myers, Christopher R.; Stodghill, Paul; Bao, Zhongmeng; Cartinhour, Sam; Swingle, Bryan
2011-01-01
The diversity of regulatory systems encoded by bacteria provides an indication of the variety of stresses and interactions that these organisms encounter in nature. We have been investigating how the plant pathogen Pseudomonas syringae pv. tomato DC3000 responds to iron limitation and have focused on the iron starvation (IS) sigma factors to identify regulon members and to explore the mechanistic details of genetic control for this class of regulators. In the study described in this report, we used chromatin immunoprecipitation paired with high-throughput sequencing (ChIP-Seq) to screen the genome for locations associated with binding of the P. syringae IS sigma factor PSPTO_1203. We used multiple methods to demonstrate differential regulation of two genes identified in the ChIP-Seq screen and characterize the promoter elements that facilitate PSPTO_1203-dependent regulation. The genes regulated by PSPTO_1203 encode a TonB-dependent transducer (PSPTO_1206) and a cytoplasmic membrane protein (PSPTO_2145), which is located in the P. syringae pyoverdine cluster. Additionally, we identified siderophores that induce the activity of PSPTO_1203 and used this information to investigate the functional components of the signal transduction cascade. PMID:21840980
Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).
Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E
2017-01-01
Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.
We demonstrate a computational network model that integrates 18 in vitro, high-throughput screening assays measuring estrogen receptor (ER) binding, dimerization, chromatin binding, transcriptional activation and ER-dependent cell proliferation. The network model uses activity pa...
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
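The counting step that underlies a barcode-sequencing assay of this kind can be sketched roughly as follows. This is a minimal illustration, not the published Bar-seq pipeline: the barcode sequences, strain names, pseudocount, and the assumption that the barcode occupies the first ten bases of each read are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical barcode-to-strain lookup table (illustrative values only).
BARCODES = {"ACGTACGTAC": "strainA", "TTGCATGCAA": "strainB"}
BARCODE_LEN = 10  # assumed: the barcode is the first 10 bases of each read


def count_barcodes(reads, barcodes=BARCODES, length=BARCODE_LEN):
    """Tally reads per strain; reads with an unknown barcode are skipped."""
    counts = Counter()
    for read in reads:
        strain = barcodes.get(read[:length])
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_ratio(treated, control, strain, pseudocount=1):
    """Log2 fold change of a strain's count, with a pseudocount to avoid log(0)."""
    return log2((treated[strain] + pseudocount) / (control[strain] + pseudocount))


# Toy reads: barcode followed by two arbitrary bases.
control_reads = ["ACGTACGTACGG", "ACGTACGTACTT", "TTGCATGCAAGG",
                 "TTGCATGCAACC", "NNNNNNNNNNNN"]
treated_reads = ["ACGTACGTACGG", "TTGCATGCAAGG", "TTGCATGCAACC",
                 "TTGCATGCAATT"]

control = count_barcodes(control_reads)
treated = count_barcodes(treated_reads)
```

A negative `log2_ratio(treated, control, "strainA")` would indicate that the strain is depleted under treatment in this toy data. A real analysis would also normalize for sequencing depth and tolerate sequencing errors in the barcode.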
Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E
2013-04-08
Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 119 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. 
This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.
High-throughput analysis of peptide binding modules
Liu, Bernard A.; Engelmann, Brett; Nash, Piers D.
2014-01-01
Modular protein interaction domains that recognize linear peptide motifs are found in hundreds of proteins within the human genome. Some protein interaction domains such as SH2, 14-3-3, Chromo and Bromo domains serve to recognize post-translational modification of amino acids (such as phosphorylation, acetylation, methylation etc.) and translate these into discrete cellular responses. Other modules such as SH3 and PDZ domains recognize linear peptide epitopes and serve to organize protein complexes based on localization and regions of elevated concentration. In both cases, the ability to nucleate specific signaling complexes is in large part dependent on the selectivity of a given protein module for its cognate peptide ligand. High throughput analysis of peptide-binding domains by peptide or protein arrays, phage display, mass spectrometry or other HTP techniques provides new insight into the potential protein-protein interactions prescribed by individual or even whole families of modules. Systems level analyses have also promoted a deeper understanding of the underlying principles that govern selective protein-protein interactions and how selectivity evolves. Lastly, there is a growing appreciation for the limitations and potential pitfalls of high-throughput analysis of protein-peptide interactomes. This review will examine some of the common approaches utilized for large-scale studies of protein interaction domains and suggest a set of standards for the analysis and validation of datasets from large-scale studies of peptide-binding modules. We will also highlight how data from large-scale studies of modular interaction domain families can provide insight into systems level properties such as the linguistics of selective interactions. PMID:22610655
Identification of the miRNA targetome in hippocampal neurons using RIP-seq.
Malmevik, Josephine; Petri, Rebecca; Klussendorf, Thies; Knauff, Pina; Åkerblom, Malin; Johansson, Jenny; Soneji, Shamit; Jakobsson, Johan
2015-07-28
MicroRNAs (miRNAs) are key players in the regulation of neuronal processes by targeting a large network of target messenger RNAs (mRNAs). However, the identity and function of mRNAs targeted by miRNAs in specific cells of the brain are largely unknown. Here, we established an adeno-associated viral vector (AAV)-based neuron-specific Argonaute2:GFP-RNA immunoprecipitation followed by high-throughput sequencing to analyse the regulatory role of miRNAs in mouse hippocampal neurons. Using this approach, we identified more than two thousand miRNA targets in hippocampal neurons, regulating essential neuronal features such as cell signalling, transcription and axon guidance. Furthermore, we found that stable inhibition of the highly expressed miR-124 and miR-125 in hippocampal neurons led to significant but distinct changes in the AGO2 binding of target mRNAs, resulting in subsequent upregulation of numerous miRNA target genes. These findings greatly enhance our understanding of the miRNA targetome in hippocampal neurons.
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
2012-01-01
Background: ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power since only sequence reads mapped to heterozygous SNPs are informative for discriminating two alleles. Results: We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that the allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. Conclusions: iASeq illustrates the value of integrating multiple datasets in allele-specificity inference and offers a new tool to better analyze ASB. PMID:23194258
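To illustrate why a single dataset has little power at a heterozygous SNP, the sketch below applies a plain exact binomial test to reference and alternate read counts. This is a simple single-site baseline and not iASeq's Bayesian hierarchical mixture model; the function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one SNP.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference-read count under the null of balanced alleles.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# The same 80:20 allelic ratio at two hypothetical coverage levels.
p_low_coverage = binomial_two_sided_p(8, 2)
p_high_coverage = binomial_two_sided_p(80, 20)
```

With only ten informative reads the 80:20 ratio is not significant at the 0.05 level, whereas at one hundred reads it is overwhelmingly so. This is the power limitation that motivates borrowing information across datasets.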
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-01-01
RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-07-05
RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.
Lu, Ruipeng; Mucaki, Eliseos J; Rogan, Peter K
2017-03-17
Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
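As a rough illustration of the information-theoretic view of a binding motif, the sketch below computes the per-position information content (in bits) of a set of aligned sites as log2(4) minus the Shannon entropy of each column. It is not the authors' entropy-minimization pipeline; the aligned sites are hypothetical and no small-sample correction is applied.

```python
from math import log2


def position_information(sites, alphabet="ACGT"):
    """Information content in bits for each column of equal-length aligned sites."""
    length = len(sites[0])
    info = []
    for i in range(length):
        column = [site[i] for site in sites]
        entropy = 0.0
        for base in alphabet:
            freq = column.count(base) / len(column)
            if freq > 0:  # treat 0 * log2(0) as 0
                entropy -= freq * log2(freq)
        # Maximum possible entropy (2 bits for DNA) minus observed entropy.
        info.append(log2(len(alphabet)) - entropy)
    return info


# Hypothetical aligned binding sites; only the fourth position varies.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
info = position_information(sites)
total_bits = sum(info)
```

Fully conserved columns carry 2 bits, while the column split evenly between C and G carries 1 bit. The total across positions is one common summary of how specific a motif is.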
Efficient identification of tubby-binding proteins by an improved system of T7 phage display.
Caberoy, Nora B; Zhou, Yixiong; Jiang, Xiaoyu; Alvarado, Gabriela; Li, Wei
2010-01-01
Mutation in the tubby gene causes adult-onset obesity and progressive retinal and cochlear degeneration through an unknown mechanism. In contrast, mutations in tubby-like protein 1 (Tulp1), whose C-terminus is highly homologous to tubby, only lead to retinal degeneration. We speculate that their differing N-termini may define their distinct disease profiles. To elucidate the binding partners of tubby, we used tubby N-terminus (tubby-N) as bait to identify unknown binding proteins with open-reading-frame (ORF) phage display. T7 phage display was engineered with three improvements: a high-quality ORF phage display cDNA library, specific phage elution by protease cleavage, and dual phage display for sensitive high-throughput screening. The new system is capable of identifying unknown bait-binding proteins in as little as approximately 4-7 days. While phage display with conventional cDNA libraries identifies a high percentage of out-of-frame unnatural short peptides, all 28 tubby-N-binding clones identified by ORF phage display were ORFs. They encode 16 proteins, including 8 nuclear proteins. Fourteen proteins were analyzed by yeast two-hybrid assay and protein pull-down assay, with ten of them independently verified. Comparative binding analyses revealed several proteins binding to both tubby and Tulp1 as well as one tubby-specific binding protein. These data suggest that tubby-N is capable of interacting with multiple nuclear and cytoplasmic protein binding partners. These results demonstrate that the newly engineered ORF phage display is a powerful technology to identify unknown protein-protein interactions. (c) 2009 John Wiley & Sons, Ltd.
Zhuang, Ze-Gang; Zhang, Jun-Ai; Luo, Hou-Long; Liu, Gan-Bin; Lu, Yuan-Bin; Ge, Nan-Hai; Zheng, Bi-Ying; Li, Rui Xi; Chen, Chen; Wang, Xin; Liu, Yu-Qing; Liu, Feng-Hui; Zhou, Yong; Cai, Xiao-Zhen; Chen, Zheng W; Xu, Jun-Fa
2017-10-01
It has been reported that circular RNA (circRNA) is associated with human cancer. However, few studies have examined circRNA in active pulmonary tuberculosis (APTB). Global circRNA expression was profiled in the peripheral blood mononuclear cells (PBMCs) of APTB patients (n=5) and healthy controls (HC) (n=5) by high-throughput sequencing. Through systematic bioinformatics analysis, the basic content of circRNAs and their fold changes in the two groups were calculated. We selected 6 significantly differentially expressed circRNAs, hsa_circ_0005836, hsa_circ_0009128, hsa_circ_0003519, hsa_circ_0023956, hsa_circ_0078768, and hsa_circ_0088452, and validated their expression in PBMCs from APTB (n=10) and HC (n=10) by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Further, two of these circRNAs (hsa_circ_0005836 and hsa_circ_0009128) were verified in PBMCs from APTB (n=34) and HC (n=30), also by qRT-PCR. The RNA-seq data showed significant differential expression of 523 circRNAs between the APTB and HC groups (199 circRNAs were significantly up-regulated and 324 circRNAs were down-regulated). Hsa_circ_0005836 and hsa_circ_0009128 expression was significantly down-regulated in the PBMCs of APTB patients compared to HC (P<0.05). The gene ontology based enrichment analysis of the circRNA-miRNA-mRNAs network showed that cellular catabolic process (P=7.10E-08), regulation of metabolic process (P=2.10E-06), catalytic activity (P=3.67E-08), protein binding (P=1.71E-07), cell part (P=3.46E-06), intracellular part (P=1.71E-07), and intracellular (P=3.67E-08) were recognized in the comparisons between APTB and HC. Based on KEGG analysis, HTLV-I infection, regulation of actin cytoskeleton, neurotrophin signaling pathway and mTOR signaling pathway were relevant during tuberculosis bacillus infection.
We found for the first time that hsa_circ_0005836 and hsa_circ_0009128 were significantly down-regulated in the PBMCs of APTB compared with HC. Our findings indicate hsa_circ_0005836 might serve as a novel potential biomarker for TB infection. Copyright © 2017. Published by Elsevier Ltd.
Cortijo, Sandra; Charoensawan, Varodom; Roudier, François; Wigge, Philip A
2018-01-01
Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-seq) is a powerful technique to investigate in vivo transcription factor (TF) binding to DNA, as well as chromatin marks. Here we provide a detailed protocol for all the key steps to perform ChIP-seq in Arabidopsis thaliana roots; the protocol also works on other A. thaliana tissues and in most non-ligneous plants. We detail all steps: material collection, fixation, chromatin preparation, immunoprecipitation, library preparation, and finally computational analysis based on a combination of publicly available tools.
Polstein, Lauren R.; Perez-Pinera, Pablo; Kocak, D. Dewran; Vockley, Christopher M.; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E.; Reddy, Timothy E.; Gersbach, Charles A.
2015-01-01
Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. PMID:26025803
Sequence Alignment to Predict Across Species Susceptibility ...
Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
Extrapolating toxicity data across species using U.S. EPA SeqAPASS tool
In vitro high-throughput screening (HTS) and in silico technologies have emerged as 21st century tools for chemical hazard identification. In 2007 the U.S. Environmental Protection Agency (EPA) launched the ToxCast Program, which has screened thousands of chemicals in hundreds of...
Kulakovskiy, Ivan V; Vorontsov, Ilya E; Yevshin, Ivan S; Sharipov, Ruslan N; Fedorova, Alla D; Rumynskiy, Eugene I; Medvedeva, Yulia A; Magana-Mora, Arturo; Bajic, Vladimir B; Papatsenko, Dmitry A; Kolpakov, Fedor A; Makeev, Vsevolod J
2018-01-04
We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
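To make the idea of a mononucleotide position weight matrix concrete, the sketch below converts a small count matrix into log-odds scores and slides it along a short DNA sequence. The count matrix, the pseudocount, the uniform background and all function names are illustrative assumptions; this is not HOCOMOCO data, nor the scoring scheme used by MoLoTool.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The values are invented for illustration; the consensus is TGAC.
COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 0, "C": 1, "G": 9, "T": 0},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[offset][seq[start + offset]] for offset in range(width)))
        for start in range(len(seq) - width + 1)
    ]


def best_hit(seq, pwm):
    """Return the (start, score) pair with the highest score."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])


if __name__ == "__main__":
    pwm = to_log_odds(COUNTS)
    print(best_hit("AATGACGG", pwm))
```

The sketch scans one strand only and assumes the sequence contains just A, C, G and T; a practical scan would also score the reverse complement and apply a motif-specific score threshold.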
Wu, Jian; Dai, Wei; Wu, Lin; Wang, Jinke
2018-02-13
Next-generation sequencing (NGS) is fundamental to current biological and biomedical research. Construction of the sequencing library is a key step of NGS, and various library construction methods have therefore been explored. However, current methods still have shortcomings. This study developed a new NGS library construction method, Single strand Adaptor Library Preparation (SALP), using a novel single strand adaptor (SSA). SSA is a double-stranded oligonucleotide with a 3' overhang of 3 random nucleotides, which can be efficiently ligated to the 3' end of single strand DNA by T4 DNA ligase. SALP can be started with any denatured DNA fragments, such as those produced by Tn5 tagmentation, enzyme digestion or sonication. When started with Tn5-tagmented chromatin, SALP can overcome a key limitation of ATAC-seq and become a high-throughput NGS library construction method, SALP-seq, which can be used to comparatively characterize the chromatin openness state of multiple cells in an unbiased manner. In this way, this study successfully characterized the comparative chromatin openness states of four different cell lines, GM12878, HepG2, HeLa and 293T, with SALP-seq. Similarly, this study also successfully characterized the chromatin openness states of HepG2 cells with SALP-seq using 10^5 to 500 cells. This study developed a new NGS library construction method, SALP, using a novel kind of single strand adaptor (SSA), which should have wide applications in the future owing to its unique performance.
Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger
2017-03-31
The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for genome-wide miRNA target identification have been developed in recent years; however, they have several limitations, including dependence on low-confidence prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms that take into account the experimental design and the inherent noise of such experiments are largely lacking. Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. Statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them by a real data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline.
Qin, Qian; Mei, Shenglin; Wu, Qiu; Sun, Hanfei; Li, Lewyn; Taing, Len; Chen, Sujun; Li, Fugen; Liu, Tao; Zang, Chongzhi; Xu, Han; Chen, Yiwen; Meyer, Clifford A; Zhang, Yong; Brown, Myles; Long, Henry W; Liu, X Shirley
2016-10-03
Transcription factor binding, histone modification, and chromatin accessibility studies are important approaches to understanding the biology of gene regulation. ChIP-seq and DNase-seq have become the standard techniques for studying protein-DNA interactions and chromatin accessibility respectively, and comprehensive quality control (QC) and analysis tools are critical to extracting the most value from these assay types. Although many analysis and QC tools have been reported, few combine ChIP-seq and DNase-seq data analysis and quality control in a unified framework with a comprehensive and unbiased reference of data quality metrics. ChiLin is a computational pipeline that automates the quality control and data analyses of ChIP-seq and DNase-seq data. It is developed using a flexible and modular software framework that can be easily extended and modified. ChiLin is ideal for batch processing of many datasets and is well suited for large collaborative projects involving ChIP-seq and DNase-seq from different designs. ChiLin generates comprehensive quality control reports that include comparisons with historical data derived from over 23,677 public ChIP-seq and DNase-seq samples (11,265 datasets) from eight literature-based classified categories. To the best of our knowledge, this atlas represents the most comprehensive ChIP-seq and DNase-seq related quality metric resource currently available. These historical metrics provide useful heuristic quality references for experiments across all commonly used assay types. Using representative datasets, we demonstrate the versatility of the pipeline by applying it to different assay types of ChIP-seq data. The pipeline software is available open source at https://github.com/cfce/chilin . ChiLin is a scalable and powerful tool to process large batches of ChIP-seq and DNase-seq datasets. The analysis output and quality metrics have been structured into user-friendly directories and reports.
We have successfully compiled 23,677 profiles into a comprehensive quality atlas with fine classification for users.
Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data.
Teng, Mingxiang; Irizarry, Rafael A
2017-11-01
The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories. © 2017 Teng and Irizarry; Published by Cold Spring Harbor Laboratory Press.
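Any GC-aware analysis starts from the GC fraction of each genomic window, which is the covariate that coverage is compared against. The sketch below computes only that covariate for fixed-width, non-overlapping windows; the function names and the window scheme are assumptions made for illustration, and the statistical model of Teng and Irizarry is not reproduced here.

```python
def gc_fraction(seq):
    """Fraction of G/C among called bases (A, C, G, T); ambiguous bases such as N are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")  # window with no called bases
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, width):
    """GC fraction of consecutive non-overlapping windows; a trailing partial window is dropped."""
    return [
        gc_fraction(seq[start:start + width])
        for start in range(0, len(seq) - width + 1, width)
    ]


if __name__ == "__main__":
    print(windowed_gc("GGCCAATT", 4))  # [1.0, 0.0]
```

In practice these per-window values would be paired with per-window read counts before any bias model is fitted.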
High-throughput detection of RNA processing in bacteria.
Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L
2018-03-27
Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions.
This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .
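The TAT(A/T)ATA pattern mentioned in this record is closed under reverse complementation: TATAATA on one strand reads TATTATA on the other, and both fit the pattern. A site found on the forward strand therefore also appears over the same bases on the reverse strand, which is what allows a single element to face both directions. The sketch below checks this with a regular expression; the example sequence and function names are hypothetical and are not taken from the P. aeruginosa data.

```python
import re

# Lookahead form so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=TAT[AT]ATA)")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string made of A, C, G and T."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_starts(seq):
    """0-based start positions of TAT(A/T)ATA on the given strand."""
    return [match.start() for match in MOTIF.finditer(seq)]


if __name__ == "__main__":
    forward = "GGTATAATACC"  # hypothetical sequence with one site
    print(motif_starts(forward))
    print(motif_starts(reverse_complement(forward)))
```

Because the hypothetical sequence places the 7-bp site symmetrically (two flanking bases on each side), both calls report the same start position.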
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H
2017-01-09
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Single-cell regulome data analysis by SCRAT.
Ji, Zhicheng; Zhou, Weiqiang; Ji, Hongkai
2017-09-15
Emerging single-cell technologies (e.g. single-cell ATAC-seq, DNase-seq or ChIP-seq) have made it possible to assay regulome of individual cells. Single-cell regulome data are highly sparse and discrete. Analyzing such data is challenging. User-friendly software tools are still lacking. We present SCRAT, a Single-Cell Regulome Analysis Toolbox with a graphical user interface, for studying cell heterogeneity using single-cell regulome data. SCRAT can be used to conveniently summarize regulatory activities according to different features (e.g. gene sets, transcription factor binding motif sites, etc.). Using these features, users can identify cell subpopulations in a heterogeneous biological sample, infer cell identities of each subpopulation, and discover distinguishing features such as gene sets and transcription factors that show different activities among subpopulations. SCRAT is freely available at https://zhiji.shinyapps.io/scrat as an online web service and at https://github.com/zji90/SCRAT as an R package. hji@jhu.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Transcriptomic Analysis of Paulownia Infected by Paulownia Witches'-Broom Phytoplasma
Zhu, Shui-Fang; Lin, Cai-Li; Tian, Guo-Zhong; Xu, Xia; Zhao, Wen-Jun
2013-01-01
Phytoplasmas are plant pathogenic bacteria that have no cell wall and are responsible for major crop losses throughout the world. Phytoplasma-infected plants show a variety of symptoms and the mechanisms they use to physiologically alter the host plants are of considerable interest, but poorly understood. In this study we undertook a detailed analysis of Paulownia infected by Paulownia witches’-broom (PaWB) Phytoplasma using high-throughput mRNA sequencing (RNA-Seq) and digital gene expression (DGE). RNA-Seq analysis identified 74,831 unigenes, which were subsequently used as reference sequences for DGE analysis of diseased and healthy Paulownia in field grown and tissue cultured plants. Our study revealed that dramatic changes occurred in the gene expression profile of Paulownia after PaWB Phytoplasma infection. Genes encoding key enzymes in cytokinin biosynthesis, such as isopentenyl diphosphate isomerase and isopentenyltransferase, were significantly induced in the infected Paulownia. Genes involved in cell wall biosynthesis and degradation were largely up-regulated and genes related to photosynthesis were down-regulated after PaWB Phytoplasma infection. Our systematic analysis provides comprehensive transcriptomic data about plants infected by Phytoplasma. This information will help further our understanding of the detailed interaction mechanisms between plants and Phytoplasma. PMID:24130859
Patel, Rajesh; Tsan, Alison; Sumiyoshi, Teiko; Fu, Ling; Desai, Rupal; Schoenbrunner, Nancy; Myers, Thomas W.; Bauer, Keith; Smith, Edward; Raja, Rajiv
2014-01-01
Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and Taqman technology. This mutation panel requires as little as 2 ng of high quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. Mutation calls, including an automated data analysis process, have been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays, and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database, allowing for direct comparison of our platform to Sanger sequencing. High correlation with NGS when compared to the SuraSeq500 panel run on the Ion Torrent platform in an FFPE dilution experiment showed assay sensitivity down to 0.45%.
This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in personalized medicine and cancer drug development. PMID:24658394
Liu, Liqin; Luo, Qiaoling; Teng, Wan; Li, Bin; Li, Hongwei; Li, Yiwen; Li, Zhensheng; Zheng, Qi
2018-05-01
Based on SLAF-seq, 67 Thinopyrum ponticum-specific markers and eight Th. ponticum-specific FISH probes were developed, and these markers and probes could be used for detection of alien chromatin in a wheat background. Decaploid Thinopyrum ponticum (2n = 10x = 70) is a valuable gene reservoir for wheat improvement. Identification of Th. ponticum introgression would facilitate its transfer into diverse wheat genetic backgrounds and its practical utilization in wheat improvement. Based on specific-locus-amplified fragment sequencing (SLAF-seq) technology, 67 new Th. ponticum-specific molecular markers and eight Th. ponticum-specific fluorescence in situ hybridization (FISH) probes have been developed from a tiny wheat-Th. ponticum translocation line. These newly developed molecular markers allowed the detection of Th. ponticum DNA in a variety of materials specifically and steadily at high throughput. According to the hybridization signal pattern, the eight Th. ponticum-specific probes could be divided into two groups. The first group, comprising five dispersed repetitive sequence probes, could identify Th. ponticum chromatin more sensitively and accurately than genomic in situ hybridization (GISH), whereas the second group, comprising three tandem repetitive sequence probes, enabled the discrimination of Th. ponticum chromosomes together with another clone, pAs1, in the wheat-Th. ponticum partial amphiploid Xiaoyan 68.
Dummitt, Benjamin; Chang, Yie-Hwa
2006-06-01
Quantitation of the level or activity of specific proteins is one of the most commonly performed experiments in biomedical research. Protein detection has historically been difficult to adapt to high throughput platforms because of heavy reliance upon antibodies for protein detection. Molecular beacons for DNA binding proteins is a recently developed technology that attempts to overcome such limitations. Protein detection is accomplished using inexpensive, easy-to-synthesize oligonucleotides, accompanied by a fluorescence readout. Importantly, detection of the protein and reporting of the signal occur simultaneously, allowing for one-step protocols and increased potential for use in high throughput analysis. While the initial iteration of the technology allowed only for the detection of sequence-specific DNA binding proteins, more recent adaptations allow for the possibility of development of beacons for any protein, independent of native DNA binding activity. Here, we discuss the development of the technology, the mechanism of the reaction, and recent improvements and modifications made to improve the assay in terms of sensitivity, potential for multiplexing, and broad applicability.
Han, Changho; Chatterjee, Arindam; Noetzel, Meredith J; Panarese, Joseph D; Smith, Emery; Chase, Peter; Hodder, Peter; Niswender, Colleen; Conn, P Jeffrey; Lindsley, Craig W; Stauffer, Shaun R
2015-01-15
Results from a 2012 high-throughput screen of the NIH Molecular Libraries Small Molecule Repository (MLSMR) against the human muscarinic receptor subtype 1 (M1) for positive allosteric modulators are reported. A content-rich screen utilizing an intracellular calcium mobilization triple-addition protocol allowed for assessment of all three modes of pharmacology at M1, including agonist, positive allosteric modulator, and antagonist activities in a single screening platform. We disclose a dibenzyl-2H-pyrazolo[4,3-c]quinolin-3(5H)-one hit (DBPQ, CID 915409) and examine N-benzyl pharmacophore/SAR relationships versus previously reported quinolin-3(5H)-ones and isatins, including ML137. SAR and consideration of recently reported crystal structures, homology modeling, and structure-function relationships using point mutations suggest a shared binding mode orientation at the putative common allosteric binding site directed by the pendant N-benzyl substructure. Copyright © 2014 Elsevier Ltd. All rights reserved.
Guerette, Paul A; Hoon, Shawn; Seow, Yiqi; Raida, Manfred; Masic, Admir; Wong, Fong T; Ho, Vincent H B; Kong, Kiat Whye; Demirel, Melik C; Pena-Francesch, Abdon; Amini, Shahrouz; Tay, Gavin Z; Ding, Dawei; Miserez, Ali
2013-10-01
Efforts to engineer new materials inspired by biological structures are hampered by the lack of genomic data from many model organisms studied in biomimetic research. Here we show that biomimetic engineering can be accelerated by integrating high-throughput RNA-seq with proteomics and advanced materials characterization. This approach can be applied to a broad range of systems, as we illustrate by investigating diverse high-performance biological materials involved in embryo protection, adhesion and predation. In one example, we rapidly engineer recombinant squid sucker ring teeth proteins into a range of structural and functional materials, including nanopatterned surfaces and photo-cross-linked films that exceed the mechanical properties of most natural and synthetic polymers. Integrating RNA-seq with proteomics and materials science facilitates the molecular characterization of natural materials and the effective translation of their molecular designs into a wide range of bio-inspired materials.
zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs.
Parekh, Swati; Ziegenhain, Christoph; Vieth, Beate; Enard, Wolfgang; Hellmann, Ines
2018-06-01
Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific bar codes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon mapping reads or for both exon and intron mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function that facilitates dealing with hugely varying library sizes but also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. Also, we show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. The flexibility of zUMIs makes it possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs, and zUMIs is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
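Tabulating reads by both BC and UMI can be pictured as counting distinct UMIs per (barcode, gene) pair, so that PCR duplicates of one molecule are counted once. The sketch below does this with exact-match collapsing only; the input tuple layout and the function name are assumptions for illustration and do not describe how zUMIs is implemented internally.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (barcode, gene).

    `reads` is an iterable of (barcode, umi, gene) tuples, a hypothetical
    layout chosen for this sketch. Reads sharing barcode, gene and UMI are
    treated as amplification duplicates of a single molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


if __name__ == "__main__":
    reads = [
        ("AAAC", "GGTT", "geneA"),
        ("AAAC", "GGTT", "geneA"),  # duplicate of the read above
        ("AAAC", "CCTA", "geneA"),
        ("TTTG", "GGTT", "geneA"),
        ("AAAC", "GGTT", "geneB"),
    ]
    for key, n_molecules in sorted(count_umis(reads).items()):
        print(key, n_molecules)
```

Exact matching is the simplest choice; tolerating sequencing errors in the UMI (for example by merging UMIs within a small edit distance) is a common refinement that this sketch leaves out.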
Malhotra, Deepti; Portales-Casamar, Elodie; Singh, Anju; Srivastava, Siddhartha; Arenillas, David; Happel, Christine; Shyr, Casper; Wakabayashi, Nobunao; Kensler, Thomas W.; Wasserman, Wyeth W.; Biswal, Shyam
2010-01-01
The Nrf2 (nuclear factor E2 p45-related factor 2) transcription factor responds to diverse oxidative and electrophilic environmental stresses by circumventing repression by Keap1, translocating to the nucleus, and activating cytoprotective genes. Nrf2 responses provide protection against chemical carcinogenesis, chronic inflammation, neurodegeneration, emphysema, asthma and sepsis in murine models. Nrf2 regulates the expression of a plethora of genes that detoxify oxidants and electrophiles and repair or remove damaged macromolecules, such as through proteasomal processing. However, many direct targets of Nrf2 remain undefined. Here, mouse embryonic fibroblasts (MEF) with either constitutive nuclear accumulation (Keap1−/−) or depletion (Nrf2−/−) of Nrf2 were utilized to perform chromatin-immunoprecipitation with parallel sequencing (ChIP-Seq) and global transcription profiling. This unique Nrf2 ChIP-Seq dataset is highly enriched for Nrf2-binding motifs. Integrating ChIP-Seq and microarray analyses, we identified 645 basal and 654 inducible direct targets of Nrf2, with 244 genes at the intersection. Modulated pathways in stress response and cell proliferation distinguish the inducible and basal programs. Results were confirmed in an in vivo stress model of cigarette smoke-exposed mice. This study reveals global circuitry of the Nrf2 stress response emphasizing Nrf2 as a central node in cell survival response. PMID:20460467
Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas
2018-01-01
ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270
Wang, Jun; Wang, Zhilan; Du, Xiaofen; Yang, Huiqing; Han, Fang; Han, Yuanhuai; Yuan, Feng; Zhang, Linyi; Peng, Shuzhong; Guo, Erhu
2017-01-01
Foxtail millet (Setaria italica), a very important grain crop in China, has become a new model plant for cereal crops and biofuel grasses. Although its reference genome sequence was released recently, knowledge of the quantitative trait loci (QTLs) controlling complex agronomic traits remains limited. The development of massively parallel genotyping methods and next-generation sequencing technologies provides an excellent opportunity for developing single-nucleotide polymorphisms (SNPs) for linkage map construction and QTL analysis of complex quantitative traits. In this study, a high-throughput and cost-effective RAD-seq approach was employed to generate a high-density genetic map for foxtail millet. A total of 2,668,587 SNP loci were detected according to the reference genome sequence; meanwhile, 9,968 SNP markers were used to genotype 124 F2 progenies derived from the cross between Hongmiaozhangu and Changnong35; a high-density genetic map spanning 1648.8 cM, with an average distance of 0.17 cM between adjacent markers was constructed; 11 major QTLs for eight agronomic traits were identified; five co-dominant DNA markers were developed. These findings will be of value for the identification of candidate genes and marker-assisted selection in foxtail millet. PMID:28644843
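The reported average marker spacing can be checked with one division, under the assumption (not stated explicitly in the abstract) that all 9,968 genotyped SNP markers are placed on the 1648.8 cM map. Variable names below are illustrative.

```python
# Figures quoted in the abstract.
map_length_cm = 1648.8
n_markers = 9968  # assumption: every genotyped marker is on the map

# Approximate spacing, ignoring that each linkage group has one fewer
# interval than it has markers.
average_spacing_cm = map_length_cm / n_markers

print(round(average_spacing_cm, 2))  # 0.17
```

The result, about 0.165 cM, rounds to the 0.17 cM given in the abstract.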
Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.
2016-01-01
Because RNA-protein interactions play a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time), including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs in a single experiment. PMID:26182240
Liu, Lijun; Ramsay, Trevor; Zinkgraf, Matthew; Sundell, David; Street, Nathaniel Robert; Filkov, Vladimir; Groover, Andrew
2015-06-01
Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors expressed during secondary growth and wood formation. Software code (programs and scripts) for processing the Populus ChIP-seq data is provided within a publicly available iPlant image, including tools for ChIP-seq data quality control and evaluation adapted from the human Encyclopedia of DNA Elements (ENCODE) project. Basic information on the binding of each transcription factor (including members of the Class I KNOX, Class III HD ZIP, and BEL1-like families) is summarized, including the number and location of binding regions, distribution of binding regions relative to gene features, associated putative target genes, and enriched functional categories of putative target genes. These ChIP-seq data have been integrated within the Populus Genome Integrative Explorer (PopGenIE), where they can be analyzed using a variety of web-based tools. We present an example analysis that shows preferential binding of transcription factor ARBORKNOX1 to the nearest neighbor genes in a pre-calculated co-expression network module, and enrichment for meristem-related genes within this module, including multiple orthologs of Arabidopsis KNOTTED-like Arabidopsis 2/6. © 2015 Society for Experimental Biology and John Wiley & Sons Ltd This article has been contributed to by US Government employees and their work is in the public domain in the USA.
2012-01-01
Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the number of samples or replicates profiled at the cost of decreased sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M
2012-09-17
RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression, with both high-throughput and high-resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the number of samples or replicates profiled at the cost of decreased sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced to as low as 15% without substantial impacts on false positive or true positive rates.
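To make the simulation idea in the abstract above concrete, here is a minimal sketch of drawing negative binomial read counts for one gene under two conditions, with a knob for reduced sequencing depth. It is not the authors' simulation code: the gamma-Poisson construction, the function names, and all parameter values (mean 100, dispersion 0.1, two-fold change, depth fraction 0.15) are assumptions chosen for illustration.

```python
import random
import statistics


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson count by counting unit-rate exponential arrivals in [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def neg_binomial(mean: float, dispersion: float, rng: random.Random) -> int:
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(lam, rng)


def simulate_gene(mean, dispersion, fold_change, n_reps, depth_fraction, rng):
    """Counts for two conditions; depth_fraction scales the expected counts
    to mimic a library sequenced at a fraction of the full depth."""
    base = mean * depth_fraction
    cond_a = [neg_binomial(base, dispersion, rng) for _ in range(n_reps)]
    cond_b = [neg_binomial(base * fold_change, dispersion, rng) for _ in range(n_reps)]
    return cond_a, cond_b


rng = random.Random(0)
full_a, full_b = simulate_gene(100, 0.1, 2.0, 3, 1.0, rng)
low_a, low_b = simulate_gene(100, 0.1, 2.0, 3, 0.15, rng)
print("full depth:", statistics.mean(full_a), statistics.mean(full_b))
print("15% depth: ", statistics.mean(low_a), statistics.mean(low_b))
```

Repeating such draws over many genes and replicate numbers, then running a differential expression test on each simulated dataset, is the general shape of a power analysis like the one described; the choice of test is left out here.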
Some pharmaceuticals and environmental chemicals bind the thyroid peroxidase (TPO) enzyme and disrupt thyroid hormone production. The potential for TPO inhibition is a function of both the binding affinity and concentration of the chemical within the thyroid gland. The former can...
In vitro high-throughput screening (HTS) and in silico technologies have emerged as 21st century tools for chemical hazard identification. In 2007 the U.S. Environmental Protection Agency (EPA) launched the ToxCast Program, which has screened thousands of chemicals in hundreds of...
Park, Su-Jin; Kumar, Mukesh; Kwon, Hyeok-il; Seong, Rak-Kyun; Han, Kyudong; Song, Jae-min; Kim, Chul-Joong; Choi, Young-Ki; Shin, Ok Sarah
2015-11-18
Emerging outbreaks of newly found, highly pathogenic avian influenza (HPAI) A(H5N8) viruses have been reported globally. Previous studies have indicated that H5N8 pathogenicity in mice is relatively moderate compared with H5N1 pathogenicity. However, detailed mechanisms underlying avian influenza pathogenicity are still undetermined. We used a high-throughput RNA-seq method to analyse host and pathogen transcriptomes in the lungs of mice infected with A/MD/Korea/W452/2014 (H5N8) and A/EM/Korea/W149/2006 (H5N1) viruses. Sequenced numbers of viral transcripts and expression levels of host immune-related genes at 1 day post infection (dpi) were higher in H5N8-infected than in H5N1-infected mice. Dual sequencing of viral transcripts revealed that, in contrast to the observations at 1 dpi, a higher number of H5N1 genes than H5N8 genes was sequenced at 3 and 7 dpi, which is consistent with the higher viral titres and virulence observed in infected lungs in vivo. Ingenuity pathway analysis revealed a more significant upregulation of death receptor signalling with H5N1 than with H5N8 infection at 3 and 7 dpi. Early induction of immune response-related genes may elicit protection in H5N8-infected mice, which correlates with moderate pathogenicity in vivo. Collectively, our data provide new insight into the underlying mechanisms of the differential pathogenicity of avian influenza viruses.
Dynamic changes in host gene expression associated with H5N8 avian influenza virus infection in mice
Park, Su-Jin; Kumar, Mukesh; Kwon, Hyeok-il; Seong, Rak-Kyun; Han, Kyudong; Song, Jae-min; Kim, Chul-Joong; Choi, Young-Ki; Shin, Ok Sarah
2015-01-01
Emerging outbreaks of newly found, highly pathogenic avian influenza (HPAI) A(H5N8) viruses have been reported globally. Previous studies have indicated that H5N8 pathogenicity in mice is relatively moderate compared with H5N1 pathogenicity. However, detailed mechanisms underlying avian influenza pathogenicity are still undetermined. We used a high-throughput RNA-seq method to analyse host and pathogen transcriptomes in the lungs of mice infected with A/MD/Korea/W452/2014 (H5N8) and A/EM/Korea/W149/2006 (H5N1) viruses. Sequenced numbers of viral transcripts and expression levels of host immune-related genes at 1 day post infection (dpi) were higher in H5N8-infected than in H5N1-infected mice. Dual sequencing of viral transcripts revealed that, in contrast to the observations at 1 dpi, a higher number of H5N1 genes than H5N8 genes was sequenced at 3 and 7 dpi, which is consistent with the higher viral titres and virulence observed in infected lungs in vivo. Ingenuity pathway analysis revealed a more significant upregulation of death receptor signalling with H5N1 than with H5N8 infection at 3 and 7 dpi. Early induction of immune response-related genes may elicit protection in H5N8-infected mice, which correlates with moderate pathogenicity in vivo. Collectively, our data provide new insight into the underlying mechanisms of the differential pathogenicity of avian influenza viruses. PMID:26576844
Chèneby, Jeanne; Gheorghe, Marius; Artufel, Marie
2018-01-01
With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in-depth view of the complexity of the regulatory landscape in human. PMID:29126285
Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments
Maza, Elie; Frasse, Pierre; Senin, Pavel; Bouzayen, Mondher; Zouine, Mohamed
2013-01-01
In recent years, RNA-Seq technologies have become a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of normalization procedure leads to great variability in the results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one that aims to remove an inherent bias of the studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Analyses of real RNA-Seq data sets, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common to all methods. This result highlights the influence of the normalization step on the differential expression analysis. Analyses of real and simulated data sets give similar results, showing 3 different groups of procedures having the same behavior. The group including the novel method, named “Median Ratio Normalization” (MRN), gives the lowest number of false discoveries. Within this group, the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes, such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with the intrinsic bias resulting from the relative size of the studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods. PMID:26442135
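The bias described above, where a change in relative transcriptome size distorts normalization, can be illustrated with a small example. The sketch below uses a generic median-of-ratios size factor (the idea popularized by DESeq), which is a related technique and not the authors' MRN procedure; the abstract does not give the MRN steps. The toy counts and the function name are assumptions.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps a sample name to a list of gene counts (same gene order in
    every sample). Each sample's factor is the median, over genes, of its
    count divided by the gene's geometric mean across samples. Genes with a
    zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - m
            for g, m in enumerate(log_geo_means)
            if m is not None
        ]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors


# Toy data: four unchanged genes and one gene that is strongly upregulated in B.
toy = {"A": [100, 200, 300, 400, 50], "B": [100, 200, 300, 400, 5000]}
library_sizes = {s: sum(c) for s, c in toy.items()}
print(library_sizes)       # {'A': 1050, 'B': 6000}
print(size_factors(toy))   # both factors are 1.0
```

Scaling by library size would make the four unchanged genes look several-fold lower in sample B, whereas the median-based factor ignores the single dominant gene. That is the kind of relative-size bias the abstract refers to.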
High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.
Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels
2016-01-01
Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.
Evaluation of microRNA alignment techniques
Kaspi, Antony; El-Osta, Assam
2016-01-01
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Genome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genome and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar transcripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellar genomes, which is 1/10th as much genetic diversity as is found in the nucleus.
Potts, Anastasia H; Leng, Yuanyuan; Babitzke, Paul; Romeo, Tony
2018-03-29
The Csr global regulatory system coordinates gene expression in response to metabolic status. This system utilizes the RNA binding protein CsrA to regulate gene expression by binding to transcripts of structural and regulatory genes, thus affecting their structure, stability, translation, and/or transcription elongation. CsrA activity is controlled by sRNAs, CsrB and CsrC, which sequester CsrA away from other transcripts. CsrB/C levels are partly determined by their rates of turnover, which requires CsrD to render them susceptible to RNase E cleavage. Previous epistasis analysis suggested that CsrD affects gene expression through the other Csr components, CsrB/C and CsrA. However, those conclusions were based on a limited analysis of reporters. Here, we reassessed the global behavior of the Csr circuitry using epistasis analysis with RNA seq (Epi-seq). Because CsrD effects on mRNA levels were entirely lost in the csrA mutant and largely eliminated in a csrB/C mutant under our experimental conditions, while the majority of CsrA effects persisted in the absence of csrD, the original model accounts for the global behavior of the Csr system. Our present results also reflect a more nuanced role of CsrA as terminal regulator of the Csr system than has been recognized.
ScaffoldSeq: Software for characterization of directed evolution populations.
Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J
2016-07-01
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
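Two of the ideas in the abstract above, locating a diversified region between conserved framework sequences and dampening dominant clones when tallying site-wise frequencies, can be sketched in a few lines. This is not the ScaffoldSeq implementation. The flank-matching by exact substring search, the power-law weighting (read count raised to a damping exponent), and all names and toy sequences are assumptions for illustration.

```python
from collections import Counter, defaultdict


def extract_region(protein, left_flank, right_flank):
    """Return the segment between two conserved framework motifs, or None."""
    start = protein.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = protein.find(right_flank, start)
    if end == -1:
        return None
    return protein[start:end]


def site_frequencies(regions, damping=0.5):
    """Per-position amino acid frequencies for regions of equal length.

    Each unique sequence is weighted by (read count) ** damping:
    damping=1 counts every read, damping=0 counts every unique clone once.
    """
    clone_counts = Counter(regions)
    totals = defaultdict(Counter)
    for seq, n in clone_counts.items():
        weight = n ** damping
        for pos, aa in enumerate(seq):
            totals[pos][aa] += weight
    return {
        pos: {aa: w / sum(c.values()) for aa, w in c.items()}
        for pos, c in totals.items()
    }


# Toy population: one dominant clone (9 reads) and one rare clone (1 read).
reads = ["AAAGYSCCC"] * 9 + ["AAAGFSCCC"]
regions = [extract_region(r, "AAA", "CCC") for r in reads]
print(site_frequencies(regions, damping=1.0)[1])  # Y 0.9, F 0.1
print(site_frequencies(regions, damping=0.5)[1])  # Y 0.75, F 0.25
```

In practice regions of different lengths would be grouped by length before calling `site_frequencies`, since the tally assumes aligned positions.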
Kleptoplast Regulation by an Antarctic Dinoflagellate
NASA Astrophysics Data System (ADS)
Gast, R. J.; Hehenberger, E.; Keeling, P.
2016-02-01
We are studying the evolutionary history and expression of plastid-targeted genes in an Antarctic dinoflagellate that steals chloroplasts from the haptophyte Phaeocystis. Our project seeks to determine whether the kleptoplastidic dinoflagellate utilizes ancestral plastid proteins to regulate its stolen plastid, and how their transcription is related to environmental factors that are relevant to the Southern Ocean environment (temperature and light). To accomplish our goals, we have utilized high-throughput transcriptome analysis and RNA-Seq experiments of the dinoflagellate and Phaeocystis. Analysis of the dinoflagellate transcriptome has revealed complete mevalonic acid-independent and heme plastid-associated pathways as well as petF and petH transcripts with peridinin-plastid targeting sequences. In contrast, the proteins psaE, petJ, and petC show similarity to non-Phaeocystis haptophyte homologs in their respective trees, and potentially carry haptophyte transit peptides. Analysis of RNA-Seq temperature and light experiments for the dinoflagellate indicates that there are significant differences in gene expression under the different environmental conditions, and we are in the process of identifying the genes associated with these changes. This work will help us to understand the environmental success of this alternative nutritional strategy.
Polstein, Lauren R; Perez-Pinera, Pablo; Kocak, D Dewran; Vockley, Christopher M; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A
2015-08-01
Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. © 2015 Polstein et al.; Published by Cold Spring Harbor Laboratory Press.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulation, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immunoprecipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulation. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Bergander, Tryggve; Nilsson-Välimaa, Kristina; Oberg, Katarina; Lacki, Karol M
2008-01-01
Steadily increasing demand for more efficient and more affordable biomolecule-based therapies puts a significant burden on biopharma companies to reduce the cost of R&D activities associated with introducing a new drug to the market. Reducing the time required to develop a purification process would be one option to address the high cost. The reduction in time can be accomplished if more efficient methods and tools are available for process development work, including high-throughput techniques. This paper addresses the transition from traditional column-based process development to a modern high-throughput approach utilizing microtiter filter plates filled with a well-defined volume of chromatography resin. The approach is based on implementing the well-known batch uptake principle in microtiter plate geometry. Two variants of the proposed approach, allowing for either qualitative or quantitative estimation of dynamic binding capacity as a function of residence time, are described. Examples of quantitative estimation of dynamic binding capacities of human polyclonal IgG on MabSelect SuRe and of qualitative estimation of dynamic binding capacity of amyloglucosidase on a prototype of Capto DEAE weak ion exchanger are given. The proposed high-throughput method for determination of dynamic binding capacity significantly reduces time and sample consumption compared with a traditional method utilizing packed chromatography columns, without sacrificing the accuracy of the data obtained.
Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia
2017-08-31
As a newly emerged research area, RNA epigenetics has drawn increasing attention recently because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small because of its cost; additionally, there is usually a large number of genes whose methylation level cannot be accurately estimated because of their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, which is based on a statistical test covering only the IP samples with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions; in this way, the input control samples are also properly accounted for. In addition, unlike the DRME approach, which relies only on the input control sample for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, PAR-CLIP, RIP-Seq, etc.
Mykles, Donald L; Burnett, Karen G; Durica, David S; Joyce, Blake L; McCarthy, Fiona M; Schmidt, Carl J; Stillman, Jonathon H
2016-12-01
High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the "Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology" symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
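As a rough illustration of what a position weight matrix over a combined sequence and structure alphabet might look like, the sketch below encodes each base together with its pairing state and builds a log-odds matrix from aligned k-mers. It is not SMARTIV's algorithm: the eight-letter encoding (uppercase for unpaired, lowercase for paired), the uniform background, the pseudocount, and the toy k-mers are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical combined alphabet: uppercase = unpaired base, lowercase = paired base.
ALPHABET = "ACGUacgu"


def encode(seq, structure):
    """Merge a sequence with its dot-bracket structure into one string.

    A '.' in the structure marks an unpaired base; any other symbol is
    treated as paired.
    """
    return "".join(
        base.upper() if state == "." else base.lower()
        for base, state in zip(seq, structure)
    )


def pwm(kmers, pseudocount=1.0):
    """Log2-odds position weight matrix against a uniform background."""
    k = len(kmers[0])
    background = 1.0 / len(ALPHABET)
    total = len(kmers) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(k):
        counts = Counter(kmer[pos] for kmer in kmers)
        matrix.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return matrix


print(encode("GUGU", "..(("))  # GUgu
motif = pwm(["UGUg", "UGUg", "UGUa"])
best = "".join(max(col, key=col.get) for col in motif)
print(best)  # UGUg
```

A real pipeline would first rank sequences, select enriched k-mers, and assess significance; only the final matrix-building step is sketched here.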
CELF1 preferentially binds to exon-intron boundary and regulates alternative splicing in HeLa cells.
Xia, Heng; Chen, Dong; Wu, Qijia; Wu, Gang; Zhou, Yanhong; Zhang, Yi; Zhang, Libin
2017-09-01
The current RIP-seq approach was developed to identify genome-wide interactions between an RNA binding protein (RBP) and its bound RNA transcripts, but it is still rarely used to identify the binding sites themselves. In this study, we performed RIP-seq experiments in HeLa cells using a monoclonal antibody against CELF1. Mapping of the RIP-seq reads showed a biased distribution at the 3'UTR and intronic regions. A total of 15,285 sense and 1,384 antisense CELF1-specific peaks were identified using the ABLIRC software tool. Our bioinformatics analyses revealed that 5' and 3' splice site motifs and GU-rich motifs were highly enriched in the CELF1-bound peaks. Furthermore, transcriptome analyses revealed that alternative splicing was globally regulated by CELF1 in HeLa cells. For example, the inclusion of exon 16 of the LMO7 gene, a marker gene of breast cancer, is positively regulated by CELF1. Taken together, we have shown that RIP-seq data can be used to decipher RBP binding sites and reveal an unexpected landscape of the genome-wide CELF1-RNA interactions in HeLa cells. In addition, we found that CELF1 globally regulates alternative splicing by binding the exon-intron boundary in HeLa cells, which will deepen our understanding of the regulatory roles of CELF1 in the pre-mRNA splicing process. Copyright © 2017 Elsevier B.V. All rights reserved.
Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick
2018-01-04
ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
miRge - A Multiplexed Method of Processing Small RNA-Seq Data to Determine MicroRNA Entropy
Myers, Jason R.; Gupta, Simone; Weng, Lien-Chun; Ashton, John M.; Cornish, Toby C.; Pandey, Akhilesh; Halushka, Marc K.
2015-01-01
Small RNA RNA-seq for microRNAs (miRNAs) is a rapidly developing field where opportunities still exist to create better bioinformatics tools to process these large datasets and generate new, useful analyses. We built miRge to be a fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries. miRNAs are summarized at the level of raw reads in addition to reads per million (RPM). Reads for all other RNA species (tRNA, rRNA, snoRNA, mRNA) are provided, which is useful for identifying potential contaminants and optimizing small RNA purification strategies. miRge was designed to optimally identify miRNA isomiRs and employs an entropy based statistical measurement to identify differential production of isomiRs. This allowed us to identify decreasing entropy in isomiRs as stem cells mature into retinal pigment epithelial cells. Conversely, we show that pancreatic tumor miRNAs have similar entropy to matched normal pancreatic tissues. In a head-to-head comparison with other miRNA analysis tools (miRExpress 2.0, sRNAbench, omiRAs, miRDeep2, Chimira, UEA small RNA Workbench), miRge was faster (4 to 32-fold) and was among the top-two methods in maximally aligning miRNAs reads per sample. Moreover, miRge has no inherent limits to its multiplexing. miRge was capable of simultaneously analyzing 100 small RNA-Seq samples in 52 minutes, providing an integrated analysis of miRNA expression across all samples. As miRge was designed for analysis of single as well as multiple samples, miRge is an ideal tool for high and low-throughput users. miRge is freely available at http://atlas.pathology.jhu.edu/baras/miRge.html. PMID:26571139
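The two summaries mentioned above, reads per million (RPM) and an entropy measure over isomiRs, can be sketched as follows. The abstract does not give miRge's exact statistic, so Shannon entropy over the read counts of one miRNA's isomiRs is used here as an assumption; the function names and the example counts are hypothetical.

```python
import math


def reads_per_million(count, total_reads):
    """Scale a raw read count to reads per million mapped reads."""
    return count * 1_000_000 / total_reads


def isomir_entropy(isomir_counts):
    """Shannon entropy (in bits) of the isomiR read distribution.

    isomir_counts maps an isomiR identifier to its read count for a
    single miRNA. Zero means all reads come from one isomiR.
    """
    total = sum(isomir_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in isomir_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: two equally abundant isomiRs versus a single one.
even = isomir_entropy({"isomiR_a": 50, "isomiR_b": 50})
single = isomir_entropy({"isomiR_a": 100})
```

Under this reading, decreasing entropy means that reads concentrate on fewer isomiR forms.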
Paukszto, Łukasz; Jastrzębski, Jan P.; Czerwińska, Joanna; Chojnowska, Katarzyna; Kamińska, Barbara; Kurzyńska, Aleksandra; Smolińska, Nina; Giżejewski, Zygmunt; Kamiński, Tadeusz
2017-01-01
The European beaver (Castor fiber L.) is an important free-living rodent that inhabits Eurasian temperate forests. Beavers are often referred to as ecosystem engineers because they create or change existing habitats, enhance biodiversity and prepare the environment for diverse plant and animal species. Beavers are protected in most European Union countries, but their genomic background remains unknown. In this study, gene expression patterns in beaver testes and the variations in gene expression between breeding and non-breeding seasons were determined by high-throughput transcriptome sequencing. Paired-end sequencing on the Illumina HiSeq 2000 sequencer produced a total of 373.06 million high-quality reads. De novo assembly of contigs yielded 130,741 unigenes with an average length of 1,369.3 nt, an N50 value of 1,734, and an average GC content of 46.51%. A comprehensive analysis of the testicular transcriptome revealed more than 26,000 highly expressed unigenes, which exhibited the highest homology with the Rattus norvegicus and Ictidomys tridecemlineatus genomes. More than 8,000 highly expressed genes were found to be involved in fundamental biological processes, cellular components or molecular pathways. The study also revealed 42 genes whose regulation differed between breeding and non-breeding seasons. During the non-breeding period, the expression of 37 genes was up-regulated, and the expression of 5 genes was down-regulated relative to the breeding season. The identified genes encode molecules which are involved in signal transduction, DNA repair, stress responses, inflammatory processes, metabolism and steroidogenesis. Our results pave the way for further research into season-dependent variations in beaver testes. PMID:28678806
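The N50 value reported for the assembly is a standard length statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. A minimal sketch, with made-up lengths rather than data from the study:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their summed
    length reaches half of the total; the last length added is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative unigene lengths only.
example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

In the example the total length is 54, and the three longest sequences (10, 9, 8) sum to 27, so the N50 is 8.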
Chokeshaiusaha, Kaj; Puthier, Denis; Nguyen, Catherine; Sananmuang, Thanida
2018-06-01
Trimethylation of histone 3 (H3) at the 4th lysine of the N-terminus (H3K4me3) in gene promoter regions is a universal marker of active genes specific to a cell lineage. In contrast, coexistence of trimethylation at the 27th lysine (H3K27me3) at the same loci, the bivalent H3K4me3/H3K27me3 state, is known to suspend gene transcription in germ cells and can also be inherited by the developed stem cell. For galline species, a thorough example of H3K4me3 and H3K27me3 ChIP-seq analysis had not yet been provided. We therefore designed and demonstrated such procedures using ChIP-seq and mRNA-seq data from chicken follicular mesenchymal cells and male germ cells. An analytical workflow was designed and is provided in this study. ChIP-seq and RNA-seq datasets of follicular mesenchymal cells and male germ cells were acquired and preprocessed. Peak calling with Model-based Analysis of ChIP-seq 2 (MACS2) was performed to identify H3K4me3- or H3K27me3-enriched regions (fold-change ≥ 2, FDR ≤ 0.01) in gene promoter regions. The Integrative Genomics Viewer was used to explore the cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes. The results indicated that follicular mesenchymal cells and germ cells shared several unique gene promoter regions enriched with H3K4me3 (5,704 peaks) and also unique regions of bivalent H3K4me3/H3K27me3 shared between all cell types and germ cells (1,909 peaks). Subsequent observation of the follicular mesenchyme-specific genes CRABP1, GDF10, and GREM1 correctly revealed vigorous transcription of these genes in follicular mesenchymal cells. As expected, the bivalent H3K4me3/H3K27me3 pattern was present in the promoter regions of these genes in germ cells, where it suspended their transcription. According to the results, an example of chicken H3K4me3/H3K27me3 ChIP-seq data analysis was successfully demonstrated in this study. The provided methodology should be useful for galline ChIP-seq data analysis in the future.
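The peak thresholds quoted above (fold-change ≥ 2, FDR ≤ 0.01) and the notion of a bivalent region can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' workflow: peaks are represented as plain dictionaries with hypothetical keys (`chrom`, `start`, `end`, `fold`, `fdr`) rather than in any peak caller's real output format, and "bivalent" is taken to mean that an H3K4me3 peak overlaps an H3K27me3 peak.

```python
def filter_peaks(peaks, min_fold=2.0, max_fdr=0.01):
    """Keep peaks that pass the fold-change and FDR thresholds."""
    return [p for p in peaks if p["fold"] >= min_fold and p["fdr"] <= max_fdr]


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return (
        a["chrom"] == b["chrom"]
        and a["start"] < b["end"]
        and b["start"] < a["end"]
    )


def bivalent_regions(k4_peaks, k27_peaks):
    """Return H3K4me3 peaks that overlap at least one H3K27me3 peak."""
    return [k4 for k4 in k4_peaks if any(overlaps(k4, k27) for k27 in k27_peaks)]


# Made-up peaks for illustration only.
k4 = filter_peaks([
    {"chrom": "chr1", "start": 100, "end": 500, "fold": 3.1, "fdr": 0.001},
    {"chrom": "chr1", "start": 900, "end": 1200, "fold": 1.5, "fdr": 0.001},
    {"chrom": "chr2", "start": 50, "end": 300, "fold": 4.0, "fdr": 0.005},
])
k27 = filter_peaks([
    {"chrom": "chr1", "start": 400, "end": 800, "fold": 2.5, "fdr": 0.01},
])
bivalent = bivalent_regions(k4, k27)
```

The quadratic overlap scan is adequate for a sketch; a real analysis would restrict peaks to promoter regions first and use an interval index.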
Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye
2018-01-01
In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high-quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, and Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome in soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms such as Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999
Joshi, Dev Raj; Zhang, Yu; Zhang, Hong; Gao, Yingxin; Yang, Min
2018-01-01
Nitrogenous heterocyclic compounds are key pollutants in coking wastewater; however, the functional potential of microbial communities for biodegradation of such contaminants during biological treatment is still elusive. Herein, a high-throughput functional gene array (GeoChip 5.0) in combination with Illumina HiSeq2500 sequencing was used to compare and characterize the microbial community functional structure in a long-running (500 days) bench-scale bioreactor treating coking wastewater, with a control system treating synthetic wastewater. Despite the inhibitory toxic pollutants, GeoChip 5.0 detected almost all key functional gene categories (61,940 genes on average) in the coking wastewater sludge. With higher abundance, aromatic ring cleavage dioxygenase genes including multi ring1,2diox; one ring2,3diox; catechol represented significant functional potential for degradation of aromatic pollutants, which was further confirmed by the Illumina HiSeq2500 analysis results. Response ratio analysis revealed that three nitrogenous compound degrading genes, nbzA (nitro-aromatics), tdnB (aniline), and scnABC (thiocyanate), were unique to coking wastewater treatment, which might be a strong cause of the increased ammonia level during the aerobic process. Additionally, HiSeq2500 sequencing revealed carbazole and isoquinoline degradation genes in the system. These findings expand our understanding of the functional potential of microbial communities to remove organic nitrogenous pollutants and will therefore be useful in optimization strategies for biological treatment of coking wastewater. Copyright © 2017. Published by Elsevier B.V.
Comparative Transcriptomic Analyses of Vegetable and Grain Pea (Pisum sativum L.) Seed Development
Liu, Na; Zhang, Guwen; Xu, Shengchun; Mao, Weihua; Hu, Qizan; Gong, Yaming
2015-01-01
Understanding the molecular mechanisms regulating pea seed developmental process is extremely important for pea breeding. In this study, we used high-throughput RNA-Seq and bioinformatics analyses to examine the changes in gene expression during seed development in vegetable pea and grain pea, and compare the gene expression profiles of these two pea types. RNA-Seq generated 18.7 G of raw data, which were then de novo assembled into 77,273 unigenes with a mean length of 930 bp. Our results illustrate that transcriptional control during pea seed development is a highly coordinated process. There were 459 and 801 genes differentially expressed at early and late seed maturation stages between vegetable pea and grain pea, respectively. Soluble sugar and starch metabolism related genes were significantly activated during the development of pea seeds coinciding with the onset of accumulation of sugar and starch in the seeds. A comparative analysis of genes involved in sugar and starch biosynthesis in vegetable pea (high seed soluble sugar and low starch) and grain pea (high seed starch and low soluble sugar) revealed that differential expression of related genes at late development stages results in a negative correlation between soluble sugar and starch biosynthetic flux in vegetable and grain pea seeds. RNA-Seq data was validated by using real-time quantitative RT-PCR analysis for 30 randomly selected genes. To our knowledge, this work represents the first report of seed development transcriptomics in pea. The obtained results provide a foundation to support future efforts to unravel the underlying mechanisms that control the developmental biology of pea seeds, and serve as a valuable resource for improving pea breeding. PMID:26635856
Liu, Guangxin; Wang, Pei; Li, Chan; Wang, Jing; Sun, Zhenyu; Zhao, Xinfeng; Zheng, Xiaohui
2017-07-01
Drug-protein interaction analysis is of great importance in designing new leads during drug discovery. We prepared a stationary phase containing immobilized β2-adrenoceptor (β2-AR) by linking the receptor to the surface of macroporous silica gel through the N,N'-carbonyldiimidazole method. The stationary phase was applied to identify the antiasthmatic target of protopine, guided by the prediction of site-directed molecular docking. The binding of protopine to the immobilized β2-AR was then explored by frontal analysis and the injection amount-dependent method. The association constants of protopine to β2-AR obtained by the 2 methods were (1.00 ± 0.06) × 10⁵ M⁻¹ and (1.52 ± 0.14) × 10⁴ M⁻¹. The numbers of binding sites were (1.23 ± 0.07) × 10⁻⁷ M and (9.09 ± 0.06) × 10⁻⁷ M, respectively. These results indicated that β2-AR is the specific target for the therapeutic action of protopine in vivo. The target-drug binding occurred on Ser169 in the crystal structure of the receptor. Compared with frontal analysis, the injection amount-dependent method offers advantages in drug saving, sampling efficiency, and speed. It has great potential in high-throughput drug-receptor interaction analysis. Copyright © 2017 John Wiley & Sons, Ltd.
Dudakovic, Amel; Evans, Jared M.; Li, Ying; Middha, Sumit; McGee-Lawrence, Meghan E.; van Wijnen, Andre J.; Westendorf, Jennifer J.
2013-01-01
Bone has remarkable regenerative capacity, but this ability diminishes during aging. Histone deacetylase inhibitors (HDIs) promote terminal osteoblast differentiation and extracellular matrix production in culture. The epigenetic events altered by HDIs in osteoblasts may hold clues for the development of new anabolic treatments for osteoporosis and other conditions of low bone mass. To assess how HDIs affect the epigenome of committed osteoblasts, MC3T3 cells were treated with suberoylanilide hydroxamic acid (SAHA) and subjected to microarray gene expression profiling and high-throughput ChIP-Seq analysis. As expected, SAHA induced differentiation and matrix calcification of osteoblasts in vitro. ChIP-Seq analysis revealed that SAHA increased histone H4 acetylation genome-wide and in differentially regulated genes, except for the 500 bp upstream of transcriptional start sites. Pathway analysis indicated that SAHA increased the expression of insulin signaling modulators, including Slc9a3r1. SAHA decreased phosphorylation of insulin receptor β, Akt, and the Akt substrate FoxO1, resulting in FoxO1 stabilization. Thus, SAHA induces genome-wide H4 acetylation and modulates the insulin/Akt/FoxO1 signaling axis, whereas it promotes terminal osteoblast differentiation in vitro. PMID:23940046
DBATE: database of alternative transcripts expression.
Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2013-01-01
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
CNV-seq, a new method to detect copy number variation using high-throughput sequencing.
Xie, Chao; Tammi, Martti T
2009-03-06
DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads, is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amounts of short reads. Simulation of various sequencing methods with coverage between 0.1x and 8x shows overall specificity between 91.7% and 99.9%, and sensitivity between 72.2% and 96.5%. We also show the results for assessment of CNV between two individual human genomes.
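Why read number rather than read length drives resolution can be seen from a read-depth comparison between two genomes. The sketch below is an assumption about the general shape of such an analysis, not the statistical model of CNV-seq itself, which the abstract does not spell out: mapped read start positions are counted in fixed windows and the library-size-normalized counts are compared as log2 ratios. The function names and numbers are hypothetical.

```python
import math


def window_counts(positions, window_size, chrom_length):
    """Count reads per fixed-size window from 0-based start positions."""
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window_size] += 1
    return counts


def log2_ratios(test_counts, ref_counts, test_total, ref_total):
    """Per-window log2 ratio of normalized read counts.

    Windows with no reads in either sample yield None, because the
    ratio is undefined there.
    """
    ratios = []
    for test, ref in zip(test_counts, ref_counts):
        if test == 0 or ref == 0:
            ratios.append(None)
        else:
            ratios.append(math.log2((test / test_total) / (ref / ref_total)))
    return ratios


# Made-up data: the first window has twice the normalized depth.
counts = window_counts([0, 5, 10, 15, 25], window_size=10, chrom_length=30)
ratios = log2_ratios([20, 10], [10, 10], test_total=100, ref_total=100)
```

Each window's count is what carries the signal, so more reads allow smaller windows at the same noise level, whatever the read length.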
A comparative study of ChIP-seq sequencing library preparation methods.
Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D
2016-10-21
ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable existing datasets are when they have been prepared with different library preparation reagents.
MotifMark: Finding regulatory motifs in DNA sequences.
Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D
2017-07-01
The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.
Kang, Guangliang; Du, Li; Zhang, Hong
2016-06-22
The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through a first-order decomposition of the interaction, leading to a dramatic power improvement in testing DE genes when the number of conditions is greater than two. In our simulations, multiDE outperformed the benchmark methods (i.e., edgeR and DESeq2) even when the underlying model was severely misspecified, and the power gain increased with the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is publicly available at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.
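One way to write down a model of the kind described is given below. The notation is an assumption made for illustration and is not taken from the multiDE paper: $Y_{gc}$ is the read count for gene $g$ under condition $c$, $\alpha_g$ and $\beta_c$ are the gene and condition main effects, and $\gamma_{gc}$ is the gene-by-condition interaction.

```latex
\begin{align}
  \log \mathbb{E}[Y_{gc}] &= \alpha_g + \beta_c + \gamma_{gc},
    \qquad g = 1,\dots,G,\; c = 1,\dots,C, \\
  \gamma_{gc} &\approx u_g\, v_c
    \qquad \text{(first-order, i.e. rank-one, decomposition)}, \\
  H_0 &: u_g = 0 \quad \text{(gene $g$ is not differentially expressed)}.
\end{align}
```

With an unrestricted interaction, testing one gene involves $C-1$ free parameters. Under the rank-one form, the condition scores $v_c$ are shared by all genes and each gene contributes a single parameter $u_g$, which is one reading of how the degrees of freedom drop to one and why the gain would grow with the number of conditions.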
High-confidence coding and noncoding transcriptome maps
2017-01-01
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519
NASA Astrophysics Data System (ADS)
Pimenov, Nikolay; Kanapatskiy, Timur; Sivkov, Vadim; Toshchakov, Stepan; Korzhenkov, Aleksei; Ulyanova, Marina
2016-04-01
The biogeochemical and microbial features of gas-bearing and background sediments, as well as of near-bottom water, were compared for the Gdansk Deep, Baltic Sea. Data were obtained in October 2015 during the 64th cruise of the R/V Akademik Mstislav Keldysh. Gas-bearing sediments were sampled within a known pockmark (Gas-Point, depth 94 m). The background sediment site (BG-Point, depth 86 m) was located several km from the pockmark area. The sulphate concentration in the pore water of the surface sediment layer (0-5 cm) at Gas-Point was 9.7 mmol/l and decreased sharply with depth (not exceeding 1 mmol/l below 50 cm). Sulphate concentration also decreased with depth at BG-Point, but less markedly. Such a decrease is typical of organic-rich sediments in highly productive areas as well as of methane seep areas. Fast sulphate depletion results from active microbial reduction by consortia of sulphate-reducing bacteria, which may use low-molecular-weight organic compounds or hydrogen formed at different stages of organic matter degradation, and from anaerobic methane oxidation by consortia of methanotrophic archaea and sulphate-reducing bacteria. Along with the decrease in sulphate concentration, methane content increased, as is typical for marine sediments. At Gas-Point the methane concentration rose from within 10 μmol/dm³ in the surface layer to its maximum of 5 mmol/dm³ at the 65 cm sediment horizon, and decreased to 1.5 mmol/dm³ at a depth of 300 cm. At BG-Point the maximum value was found at the 6 cm sediment horizon (2.6 μmol/dm³). The sulphate-methane transition zone in the Gas-Point sediments was at 25-35 cm depth, whereas it was not detected in the BG-Point mud. The high methane concentration in the gas-bearing sediments results in methane seepage from the sediments into the near-bottom water. Accordingly, the Gas-Point near-bottom waters were characterized by high methane concentrations (0.36-0.50 μmol/l), which remained elevated even 2-5 m above the bottom (0.08-0.28 μmol/l), whereas at BG-Point the methane concentration in the near-bottom water was 0.06-0.08 μmol/l. To gain insight into the structure of the microbial community responsible for these redox processes, we performed microbial community profiling using high-throughput 16S amplicon sequencing. DNA was extracted from sediments and the water column in the pockmark and background zones. NGS libraries were prepared with fusion primers for the V4 variable region (Caporaso et al., 2012) and sequenced on the MiSeq system. The results correlated well with new data obtained from the analysis of the intensity of microbial processes. The study was financed by the Russian Scientific Fund (grant 14-37-00047). Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, Owens SM, Betley J, Fraser L, Bauer M, Gormley N, Gilbert JA, Smith G, Knight R. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 2012 Aug;6(8):1621-4
Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun
2016-01-01
Actinobacillus pleuropneumoniae is the causative agent of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available from genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, and can efficiently and accurately identify expressed gene regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single-nucleotide-resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 had never been reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-seq-based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).
Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling
2018-01-15
Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.
Tadra-Sfeir, Michelle Z; Faoro, Helisson; Camilios-Neto, Doumit; Brusamarello-Santos, Liziane; Balsanelli, Eduardo; Weiss, Vinicius; Baura, Valter A; Wassem, Roseli; Cruz, Leonardo M; De Oliveira Pedrosa, Fábio; Souza, Emanuel M; Monteiro, Rose A
2015-01-01
Herbaspirillum seropedicae is a diazotrophic bacterium which associates endophytically with economically important gramineae. Flavonoids such as naringenin have been shown to have an effect on the interaction between H. seropedicae and its host plants. We used a high-throughput sequencing based method (RNA-Seq) to assess the influence of naringenin on the whole transcriptome profile of H. seropedicae. Three hundred and four genes were downregulated and seventy-seven were upregulated by naringenin. Data analysis revealed that genes related to bacterial flagella biosynthesis, chemotaxis and biosynthesis of peptidoglycan were repressed by naringenin. Moreover, genes involved in aromatic metabolism and multidrug transport efflux were activated.
Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela; Trabucchi, Michele
2017-05-19
Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule about the seed matching between miRNA and target mRNAs, but rather a non-canonical miRNA targeting activity outside the seed or with seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated with high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2-dataset in stem cells. We measured the sensitivity, the specificity and the position accuracy for miRNA binding site identification, and the agreement with TargetScan. Secondly, we developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2-dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines to choose a suitable peak detection program and a new method for miRNA-target identification. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
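The sensitivity and specificity used in the benchmarking above can be illustrated with a minimal sketch. The abstract does not say how true and false calls were counted for each peak caller, so the function name and the counts below are hypothetical and only show the standard definitions.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of real binding sites recovered.
    specificity = TN / (TN + FP): fraction of non-sites correctly rejected.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical counts for one peak detection program (not from the study).
sens, spec = sensitivity_specificity(tp=80, fp=30, tn=70, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Comparing these two numbers side by side for each program is one simple way to see the trade-off between recovering more binding sites and calling more spurious peaks.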
Bottini, Silvia; Hamouda-Tekaya, Nedra; Tanasa, Bogdan; Zaragosi, Laure-Emmanuelle; Grandjean, Valerie; Repetto, Emanuela
2017-01-01
Experimental evidence indicates that about 60% of miRNA-binding activity does not follow the canonical rule about the seed matching between miRNA and target mRNAs, but rather a non-canonical miRNA targeting activity outside the seed or with seed-like motifs. Here, we propose a new unbiased method to identify canonical and non-canonical miRNA-binding sites from peaks identified by Ago2 Cross-Linked ImmunoPrecipitation associated with high-throughput sequencing (CLIP-seq). Since the quality of peaks is of pivotal importance for the final output of the proposed method, we provide a comprehensive benchmarking of four peak detection programs, namely CIMS, PIPE-CLIP, Piranha and Pyicoclip, on four publicly available Ago2-HITS-CLIP datasets and one unpublished in-house Ago2-dataset in stem cells. We measured the sensitivity, the specificity and the position accuracy for miRNA binding site identification, and the agreement with TargetScan. Secondly, we developed a new pipeline, called miRBShunter, to identify canonical and non-canonical miRNA-binding sites based on de novo motif identification from Ago2 peaks and prediction of miRNA::RNA heteroduplexes. miRBShunter was tested and experimentally validated on the in-house Ago2-dataset and on an Ago2-PAR-CLIP dataset in human stem cells. Overall, we provide guidelines to choose a suitable peak detection program and a new method for miRNA-target identification. PMID:28108660
Gray, Lucas T; Yao, Zizhen; Nguyen, Thuc Nghi; Kim, Tae Kyung; Zeng, Hongkui; Tasic, Bosiljka
2017-01-01
Mammalian cortex is a laminar structure, with each layer composed of a characteristic set of cell types with different morphological, electrophysiological, and connectional properties. Here, we define chromatin accessibility landscapes of major, layer-specific excitatory classes of neurons, and compare them to each other and to inhibitory cortical neurons using the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq). We identify a large number of layer-specific accessible sites, and significant association with genes that are expressed in specific cortical layers. Integration of these data with layer-specific transcriptomic profiles and transcription factor binding motifs enabled us to construct a regulatory network revealing potential key layer-specific regulators, including Cux1/2, Foxp2, Nfia, Pou3f2, and Rorb. This dataset is a valuable resource for identifying candidate layer-specific cis-regulatory elements in adult mouse cortex. DOI: http://dx.doi.org/10.7554/eLife.21883.001 PMID:28112643
Stem loop recognition by DDX17 facilitates miRNA processing and antiviral defense
Moy, Ryan H.; Cole, Brian S.; Yasunaga, Ari; Gold, Beth; Shankarling, Ganesh; Varble, Andrew; Molleston, Jerome M.; tenOever, Benjamin R.; Lynch, Kristen W.; Cherry, Sara
2014-01-01
DEAD-box helicases play essential roles in RNA metabolism across species, but emerging data suggest that they have additional functions in immunity. Through RNAi screening we identify an evolutionarily conserved and interferon-independent role for the DEAD-box helicase DDX17 in restricting Rift Valley fever virus (RVFV), a mosquito-transmitted virus in the bunyavirus family that causes severe morbidity and mortality in humans and livestock. Loss of Drosophila DDX17 (Rm62) in cells and flies enhanced RVFV infection. Similarly, depletion of DDX17 but not the related helicase DDX5 increased RVFV replication in human cells. Using cross-linking immunoprecipitation high-throughput sequencing (CLIP-seq), we show that DDX17 binds the stem loops of host pri-miRNA to facilitate their processing, and also an essential stem loop in bunyaviral RNA to restrict infection. Thus, DDX17 has dual roles in the recognition of stem loops: in the nucleus for endogenous miRNA biogenesis and in the cytoplasm for surveillance against structured non-self elements. PMID:25126784
Chèneby, Jeanne; Gheorghe, Marius; Artufel, Marie; Mathelier, Anthony; Ballester, Benoit
2018-01-04
With this latest release of ReMap (http://remap.cisreg.eu), we present a unique collection of regulatory regions in human, as a result of a large-scale integrative analysis of ChIP-seq experiments for hundreds of transcriptional regulators (TRs) such as transcription factors, transcriptional co-activators and chromatin regulators. In 2015, we introduced the ReMap database to capture the genome regulatory space by integrating public ChIP-seq datasets, covering 237 TRs across 13 million (M) peaks. In this release, we have extended this catalog to constitute a unique collection of regulatory regions. Specifically, we have collected, analyzed and retained after quality control a total of 2829 ChIP-seq datasets available from public sources, covering a total of 485 TRs with a catalog of 80M peaks. Additionally, the updated database includes new search features for TR names as well as aliases, including cell line names and the ability to navigate the data directly within genome browsers via public track hubs. Finally, full access to this catalog is available online together with a TR binding enrichment analysis tool. ReMap 2018 provides a significant update of the ReMap database, providing an in depth view of the complexity of the regulatory landscape in human. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Xie, Qi; Niu, Jun; Xu, Xilin; Xu, Lixin; Zhang, Yinbing; Fan, Bo; Liang, Xiaohong; Zhang, Lijuan; Yin, Shuxia; Han, Liebao
2015-01-01
Japanese lawngrass (Zoysia japonica Steud.) is an important warm-season turfgrass that is able to survive in a range of soils, from infertile sands to clays, and to grow well under saline conditions. However, little is known about the molecular mechanisms involved in its resistance to salt stress. Here, we used high-throughput RNA sequencing (RNA-seq) to investigate the changes in gene expression of Zoysia grass at high NaCl concentrations. We first constructed two sequencing libraries, including control and NaCl-treated samples, and sequenced them using the Illumina HiSeq™ 2000 platform. Approximately 157.20 million paired-end reads with a total length of 68.68 Mb were obtained. Subsequently, 32,849 unigenes with an N50 length of 1781 bp were assembled using Trinity. Furthermore, three public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-prot, and Clusters of Orthologous Groups (COGs), were used for gene function analysis and enrichment. The annotated genes included 57 Gene Ontology (GO) terms, 120 KEGG pathways, and 24 COGs. Compared with the control, 1455 genes were significantly different (false discovery rate ≤0.01, |log2Ratio| ≥1) in the NaCl-treated samples. These genes were enriched in 10 KEGG pathways and 73 GO terms, and assigned to 25 COG categories. Using high-throughput next-generation sequencing, we built a database as a global transcript resource for Z. japonica Steud. roots. The results of this study will advance our understanding of the early salt response in Japanese lawngrass roots. PMID:26347751
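The significance criterion quoted above (false discovery rate ≤0.01 and |log2Ratio| ≥1) can be sketched as a simple filter. This is an illustration only: the gene names, expression values and FDR values are invented, and the abstract does not describe how the authors computed the ratio or the FDR.

```python
import math


def is_differentially_expressed(control, treated, fdr,
                                fdr_cutoff=0.01, min_abs_log2=1.0):
    """Apply the two thresholds: FDR <= cutoff and |log2(treated/control)| >= 1."""
    if control <= 0 or treated <= 0:
        return False  # the log ratio is undefined for non-positive expression
    log2_ratio = math.log2(treated / control)
    return fdr <= fdr_cutoff and abs(log2_ratio) >= min_abs_log2


# Hypothetical (control, NaCl-treated, FDR) values, not data from the study.
genes = {
    "geneA": (10.0, 40.0, 0.001),  # log2 ratio = 2, passes both thresholds
    "geneB": (10.0, 15.0, 0.001),  # log2 ratio ~ 0.58, fold change too small
    "geneC": (20.0, 5.0, 0.005),   # log2 ratio = -2, passes (downregulated)
    "geneD": (10.0, 80.0, 0.2),    # large change but FDR above the cutoff
}

selected = [name for name, (c, t, q) in genes.items()
            if is_differentially_expressed(c, t, q)]
print(selected)
```

Taking the absolute value of the log2 ratio treats a twofold increase and a twofold decrease symmetrically, which is why both up- and downregulated genes can pass.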
Dong, Juyao; Salem, Daniel P; Sun, Jessica H; Strano, Michael S
2018-04-24
The high-throughput, label-free detection of biomolecules remains an important challenge in analytical chemistry with the potential of nanosensors to significantly increase the ability to multiplex such assays. In this work, we develop an optical sensor array, printable from a single-walled carbon nanotube/chitosan ink and functionalized to enable a divalent ion-based proximity quenching mechanism for transducing binding between a capture protein or an antibody with the target analyte. Arrays of 5 × 6, 200 μm near-infrared (nIR) spots at a density of ≈300 spots/cm² are conjugated with immunoglobulin-binding proteins (proteins A, G, and L) for the detection of human IgG, mouse IgM, rat IgG2a, and human IgD. Binding kinetics are measured in a parallel, multiplexed fashion from each sensor spot using a custom laser scanning imaging configuration with an nIR photomultiplier tube detector. These arrays are used to examine cross-reactivity, competitive and nonspecific binding of analyte mixtures. We find that protein G and protein L functionalized sensors report selective responses to mouse IgM on the latter, as anticipated. Optically addressable platforms such as the one examined in this work have potential to significantly advance the real-time, multiplexed biomolecular detection of complex mixtures.
Bond, Thomas E H; Sorenson, Alanna E; Schaeffer, Patrick M
2017-12-01
Biotin protein ligase (BirA) has been identified as an emerging drug target in Mycobacterium tuberculosis due to its essential metabolic role. Indeed, it is the only enzyme capable of covalently attaching biotin onto the biotin carboxyl carrier protein subunit of the acetyl-CoA carboxylase. Despite recent interest in this protein, there is still a gap in cost-effective high-throughput screening assays for rapid identification of mycobacterial BirA-targeting inhibitors. We present for the first time the cloning, expression, purification of mycobacterial GFP-tagged BirA and its application for the development of a high-throughput assay building on the principle of differential scanning fluorimetry of GFP-tagged proteins. The data obtained in this study reveal how biotin and ATP significantly increase the thermal stability (ΔTm = +16.5 °C) of M. tuberculosis BirA and lead to formation of a high affinity holoenzyme complex (Kobs = 7.7 nM). The new findings and mycobacterial BirA high-throughput assay presented in this work could provide an efficient platform for future anti-tubercular drug discovery campaigns. Copyright © 2017 Elsevier GmbH. All rights reserved.
Han, Xiaoping; Chen, Haide; Huang, Daosheng; Chen, Huidong; Fei, Lijiang; Cheng, Chen; Huang, He; Yuan, Guo-Cheng; Guo, Guoji
2018-04-05
Human pluripotent stem cells (hPSCs) provide powerful models for studying cellular differentiations and unlimited sources of cells for regenerative medicine. However, a comprehensive single-cell level differentiation roadmap for hPSCs has not been achieved. We use high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system. We present a cellular-state landscape for hPSC early differentiation that covers multiple cellular lineages, including neural, muscle, endothelial, stromal, liver, and epithelial cells. Through pseudotime analysis, we construct the developmental trajectories of these progenitor cells and reveal the gene expression dynamics in the process of cell differentiation. We further reprogram primed H9 cells into naïve-like H9 cells to study the cellular-state transition process. We find that genes related to hemogenic endothelium development are enriched in naïve-like H9. Functionally, naïve-like H9 show higher potency for differentiation into hematopoietic lineages than primed cells. Our single-cell analysis reveals the cellular-state landscape of hPSC early differentiation, offering new insights that can be harnessed for optimization of differentiation protocols.
Kunz, Meik; Dandekar, Thomas; Naseem, Muhammad
2017-01-01
Cytokinins (CKs) play an important role in plant growth and development. Also, several studies highlight the modulatory implications of CKs for plant-pathogen interaction. However, the underlying mechanisms of CK mediating immune networks in plants are still not fully understood. A detailed analysis of high-throughput transcriptome (RNA-Seq and microarrays) datasets under modulated conditions of plant CKs and its mergence with cellular interactome (large-scale protein-protein interaction data) has the potential to unlock the contribution of CKs to plant defense. Here, we specifically describe a detailed systems biology methodology pertinent to the acquisition and analysis of various omics datasets that delineate the role of plant CKs in impacting immune pathways in Arabidopsis.
Global regulation of alternative RNA splicing by the SR-rich protein RBM39.
Mai, Sanyue; Qu, Xiuhua; Li, Ping; Ma, Qingjun; Cao, Cheng; Liu, Xuan
2016-08-01
RBM39 is a serine/arginine-rich RNA-binding protein that is highly homologous to the splicing factor U2AF65. However, the role of RBM39 in alternative splicing is poorly understood. In this study, RBM39-mediated global alternative splicing was investigated using RNA-Seq and genome-wide RBM39-RNA interactions were mapped via cross-linking and immunoprecipitation coupled with deep sequencing (CLIP-Seq) in wild-type and RBM39-knockdown MCF-7 cells. RBM39 was involved in the up- or down-regulation of the transcript levels of various genes. Hundreds of alternative splicing events regulated by endogenous RBM39 were identified. The majority of these events were cassette exons. Genes containing RBM39-regulated alternative exons were found to be linked to G2/M transition, cellular response to DNA damage, adherens junctions and endocytosis. CLIP-Seq analysis showed that the binding site of RBM39 was mainly in proximity to 5' and 3' splicing sites. Considerable RBM39 binding to mRNAs encoding proteins involved in translation was observed. Of particular importance, ~20% of the alternative splicing events that were significantly regulated by RBM39 were similarly regulated by U2AF65. RBM39 is extensively involved in alternative splicing of RNA and helps regulate transcript levels. RBM39 may modulate alternative splicing similarly to U2AF65 by either directly binding to RNA or recruiting other splicing factors, such as U2AF65. The current study offers a genome-wide view of RBM39's regulatory function in alternative splicing. RBM39 may play important roles in multiple cellular processes by regulating both alternative splicing of RNA molecules and transcript levels. Copyright © 2016 Elsevier B.V. All rights reserved.
Chiaraviglio, Lucius
2014-01-01
Interpretation of high throughput screening (HTS) data in cell-based assays may be confounded by cytotoxic properties of screening compounds. Therefore, assessing cell toxicity in real time during the HTS process itself would be highly advantageous. Here, we investigate the potential of putatively impermeant, fluorescent, DNA-binding dyes to give cell toxicity readout during HTS. Amongst 19 DNA-binding dyes examined, three classes were identified that were (1) permeant, (2) cytotoxic, or (3) neither permeant nor cytotoxic during 3-day incubation with a macrophage cell line. In the last class, four dyes (SYTOX Green, CellTox Green, GelGreen, and EvaGreen) gave highly robust cytotoxicity data in 384-well screening plates. As proof of principle, successful combination with a luminescence-based assay in HTS format was demonstrated. Here, both intracellular growth of Legionella pneumophila (luminescence) and host cell viability (SYTOX Green exclusion) were assayed in the same screening well. Incorporation of membrane-impermeant, DNA-binding, fluorescent dyes in HTS assays should prove useful by allowing evaluation of cytotoxicity in real time, eliminating reagent addition steps and effort associated with endpoint cell viability analysis, and reducing the need for follow-up cytotoxicity screening. PMID:24831788
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
2017-12-06
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
Sridharan, Vinod; Heimiller, Joseph; Robida, Mark D; Singh, Ravinder
2016-01-01
The Drosophila polypyrimidine tract-binding protein (dmPTB or hephaestus) plays an important role during spermatogenesis. The heph2 mutation in this gene results in a specific defect in spermatogenesis, causing aberrant spermatid individualization and male sterility. However, the array of molecular defects in the mutant remains uncharacterized. Using an unbiased high throughput sequencing approach, we have identified transcripts that are misregulated in this mutant. Aberrant transcripts show altered expression levels, exon skipping, and alternative 5' ends. We independently verified these findings by reverse-transcription and polymerase chain reaction (RT-PCR) analysis. Our analysis shows misregulation of transcripts that have been connected to spermatogenesis, including components of the actomyosin cytoskeletal apparatus. We show, for example, that the Myosin light chain 1 (Mlc1) transcript is aberrantly spliced. Furthermore, bioinformatics analysis reveals that Mlc1 contains a high affinity binding site(s) for dmPTB and that the site is conserved in many Drosophila species. We discuss that Mlc1 and other components of the actomyosin cytoskeletal apparatus offer important molecular links between the loss of dmPTB function and the observed developmental defect in spermatogenesis. This study provides the first comprehensive list of genes misregulated in vivo in the heph2 mutant in Drosophila and offers insight into the role of dmPTB during spermatogenesis.
A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics
House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.
2017-01-01
Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636
MYCN controls an alternative RNA splicing program in high-risk metastatic neuroblastoma
Zhang, Shile; Wei, Jun S.; Li, Samuel Q.; Badgett, Tom C.; Song, Young K.; Agarwal, Saurabh; Coarfa, Cristian; Tolman, Catherine; Hurd, Laura; Liao, Hongling; He, Jianbin; Wen, Xinyu; Liu, Zhihui; Thiele, Carol J.; Westermann, Frank; Asgharzadeh, Shahab; Seeger, Robert C.; Maris, John M.; Auvil, Jamie M Guidry; Smith, Malcolm A; Kolaczyk, Eric D; Shohet, Jason; Khan, Javed
2016-01-01
The molecular mechanisms underlying the aggressive behavior of MYCN driven neuroblastoma (NBL) are under intense investigation; however, little is known about the impact of this family of transcription factors on the splicing program. Here we used high-throughput RNA sequencing to systematically study the expression of RNA isoforms in stage 4 MYCN-amplified NBL, an aggressive subtype of metastatic NBL. We show that MYCN-amplified NBL tumors display a distinct gene splicing pattern affecting multiple cancer hallmark functions. Six splicing factors displayed unique differential expression patterns in MYCN-amplified tumors and cell lines, and the binding motifs for some of these splicing factors are significantly enriched in differentially-spliced genes. Direct binding of MYCN to promoter regions of the splicing factors PTBP1 and HNRNPA1 detected by ChIP-seq demonstrates MYCN controls the splicing pattern by direct regulation of the expression of these key splicing factors. Furthermore, high expression of PTBP1 and HNRNPA1 was significantly associated with poor overall survival of stage 4 NBL patients (p≤0.05). Knocking down PTBP1, HNRNPA1 and their downstream target PKM2, a pro-tumor-growth isoform, results in repressed growth of NBL cells. Therefore, our study reveals a novel role of MYCN in controlling the global splicing program through regulation of splicing factors in addition to its well-known role in the transcription program. These findings suggest a therapeutic potential to target the key splicing factors or gene isoforms in high-risk NBL with MYCN-amplification. PMID:26683771
Deng, Yue; Bao, Feng; Yang, Yang; Ji, Xiangyang; Du, Mulong; Zhang, Zhengdong
2017-01-01
The automated transcript discovery and quantification of high-throughput RNA sequencing (RNA-seq) data are important tasks of next-generation sequencing (NGS) research. However, these tasks are challenging due to the uncertainties that arise in the inference of complete splicing isoform variants from partially observed short reads. Here, we address this problem by explicitly reducing the inherent uncertainties in a biological system caused by missing information. In our approach, the RNA-seq procedure for transforming transcripts into short reads is considered an information transmission process. Consequently, the data uncertainties are substantially reduced by exploiting the information transduction capacity of information theory. The experimental results obtained from the analyses of simulated datasets and RNA-seq datasets from cell lines and tissues demonstrate the advantages of our method over state-of-the-art competitors. Our algorithm is an open-source implementation of MaxInfo. PMID:28911101
Liang, Yuan; Wang, Jing; Fei, Fuhuan; Sun, Huanmei; Liu, Ting; Li, Qian; Zhao, Xinfeng; Zheng, Xiaohui
2018-02-23
Investigations of drug-protein interactions have advanced our knowledge of ways to design more rational drugs. In addition to extensive thermodynamic studies, ongoing works are needed to enhance the exploration of drug-protein binding kinetics. In this work, the beta2-adrenoceptor (β2-AR) was immobilized on N,N'-carbonyldiimidazole activated amino polystyrene microspheres to prepare an affinity column (4.6 mm × 5.0 cm, 8 μm). The β2-AR column was utilized to determine the binding kinetics of five drugs to the receptor. Introducing peak profiling method into this receptor chromatographic analysis, we determined the dissociation rate constants (kd) of salbutamol, terbutaline, methoxyphenamine, isoprenaline hydrochloride and ephedrine hydrochloride to β2-AR to be 15 (±1), 22 (±1), 3.3 (±0.2), 2.3 (±0.2) and 2.1 (±0.1) s⁻¹, respectively. The employment of nonlinear chromatography (NLC) in this case exhibited the same rank order of kd values for the five drugs bound to β2-AR. We confirmed that both the peak profiling method and NLC were capable of routine measurement of receptor-drug binding kinetics. Compared with the peak profiling method, NLC was advantageous in the simultaneous assessment of the kinetic and apparent thermodynamic parameters. It will become a powerful method for high throughput drug-receptor interaction analysis. Copyright © 2018 Elsevier B.V. All rights reserved.
Even-Desrumeaux, Klervi; Baty, Daniel; Chames, Patrick
2010-01-01
Antibody microarrays are among the novel class of rapidly emerging proteomic technologies that will allow us to efficiently perform specific diagnosis and proteome analysis. Recombinant antibody fragments are especially suited for this approach but their stability is often a limiting factor. Camelids produce functional antibodies devoid of light chains (HCAbs) of which the single N-terminal domain is fully capable of antigen binding. When produced as an independent domain, these so-called single domain antibody fragments (sdAbs) have several advantages for biotechnological applications thanks to their unique properties of size (15 kDa), stability, solubility, and expression yield. These features should allow sdAbs to outperform other antibody formats in a number of applications, notably as capture molecules for antibody arrays. In this study, we have produced antibody microarrays using direct and oriented immobilization of sdAbs produced in crude bacterial lysates to generate proof-of-principle of a high-throughput compatible array design. Several sdAb immobilization strategies have been explored. Immobilization of in vivo biotinylated sdAbs by direct spotting of bacterial lysate on streptavidin and sandwich detection was developed to achieve high sensitivity and specificity, whereas immobilization of “multi-tagged” sdAbs via anti-tag antibodies and a direct labeled sample detection strategy was optimized for the design of high-density antibody arrays for high-throughput proteomics and identification of potential biomarkers. PMID:20859568
Byeon, Ji-Yeon; Bailey, Ryan C
2011-09-07
High affinity capture agents recognizing biomolecular targets are essential in the performance of many proteomic detection methods. Herein, we report the application of a label-free silicon photonic biomolecular analysis platform for simultaneously determining kinetic association and dissociation constants for two representative protein capture agents: a thrombin-binding DNA aptamer and an anti-thrombin monoclonal antibody. The scalability and inherent multiplexing capability of the technology make it an attractive platform for simultaneously evaluating the binding characteristics of multiple capture agents recognizing the same target antigen, and thus a tool complementary to emerging high-throughput capture agent generation strategies.
High-throughput screening in two dimensions: binding intensity and off-rate on a peptide microarray.
Greving, Matthew P; Belcher, Paul E; Cox, Conor D; Daniel, Douglas; Diehnelt, Chris W; Woodbury, Neal W
2010-07-01
We report a high-throughput two-dimensional microarray-based screen, incorporating both target binding intensity and off-rate, which can be used to analyze thousands of compounds in a single binding assay. Relative binding intensities and time-resolved dissociation are measured for labeled tumor necrosis factor alpha (TNF-alpha) bound to a peptide microarray. The time-resolved dissociation is fitted to a one-component exponential decay model, from which relative dissociation rates are determined for all peptides with binding intensities above background. We show that most peptides with the slowest off-rates on the microarray also have the slowest off-rates when measured by surface plasmon resonance (SPR). 2010 Elsevier Inc. All rights reserved.
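The one-component exponential decay model mentioned above, I(t) = I0 · exp(−k_off · t), can be illustrated with a minimal sketch. The abstract does not describe the fitting procedure, so this example swaps in a plain least-squares line fit on ln(I), which assumes background-subtracted, strictly positive intensities; the function name and the synthetic data are hypothetical.

```python
import math


def fit_off_rate(times, intensities):
    """Estimate (k_off, I0) for I(t) = I0 * exp(-k_off * t).

    Uses ordinary least squares on ln(I) versus t, so every intensity
    must be positive (i.e. background already subtracted).
    """
    logs = [math.log(i) for i in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)


# Synthetic, noise-free dissociation trace for one array spot (not real data).
times = [0, 60, 120, 180, 240]  # seconds after the start of the wash
true_k = 0.005                  # assumed off-rate in 1/s
signal = [1000.0 * math.exp(-true_k * t) for t in times]

k_off, i0 = fit_off_rate(times, signal)
print(f"k_off={k_off:.4f} 1/s, I0={i0:.1f}")
```

Running the same fit on every spot with signal above background gives a relative off-rate per peptide, which can then be ranked alongside the binding intensity. With noisy data a nonlinear fit on the raw intensities may be preferable, since the log transform overweights low-intensity points.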
Wu, Wei; Lu, Chao-Xia; Wang, Yi-Ning; Liu, Fang; Chen, Wei; Liu, Yong-Tai; Han, Ye-Chen; Cao, Jian; Zhang, Shu-Yang; Zhang, Xue
2015-07-10
MYBPC3 dysfunctions have been proven to induce dilated cardiomyopathy, hypertrophic cardiomyopathy, and/or left ventricular noncompaction; however, the genotype-phenotype correlation between MYBPC3 and restrictive cardiomyopathy (RCM) has not been established. The newly developed next-generation sequencing method is capable of broad genomic DNA sequencing with high throughput and can help explore novel correlations between genetic variants and cardiomyopathies. A proband from a multigenerational family with 3 live patients and 1 unrelated patient with clinical diagnoses of RCM underwent a next-generation sequencing workflow based on a custom AmpliSeq panel, including 64 candidate pathogenic genes for cardiomyopathies, on the Ion Personal Genome Machine high-throughput sequencing benchtop instrument. The selected panel contained a total of 64 genes that were reportedly associated with inherited cardiomyopathies. All patients fulfilled strict criteria for RCM with clinical characteristics, echocardiography, and/or cardiac magnetic resonance findings. The multigenerational family with 3 adult RCM patients carried an identical nonsense MYBPC3 mutation, and the unrelated patient carried a missense mutation in the MYBPC3 gene. All of these results were confirmed by the Sanger sequencing method. This study demonstrated that MYBPC3 gene mutations, revealed by next-generation sequencing, were associated with familial and sporadic RCM patients. It is suggested that the next-generation sequencing platform with a selected panel provides a highly efficient approach for molecular diagnosis of hereditary and idiopathic RCM and helps build new genotype-phenotype correlations. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Chenette, Heather C.S.; Robinson, Julie R.; Hobley, Eboni; Husson, Scott M.
2012-01-01
This paper describes the surface modification of macroporous membranes using ATRP (atom transfer radical polymerization) to create cation-exchange adsorbers with high protein binding capacity at high product throughput. The work is motivated by the need for a more economical and rapid capture step in downstream processing of protein therapeutics. Membranes with three reported nominal pore sizes (0.2, 0.45, 1.0 μm) were modified with poly(3-sulfopropyl methacrylate, potassium salt) tentacles, to create a high density of protein binding sites. A special formulation was used in which the monomer was protected by a crown ether to enable surface-initiated ATRP of this cationic polyelectrolyte. Success with modification was supported by chemical analysis using Fourier-transform infrared spectroscopy and indirectly by measurement of pure water flux as a function of polymerization time. Uniformity of modification within the membranes was visualized with confocal laser scanning microscopy. Static and dynamic binding capacities were measured using lysozyme protein to allow comparisons with reported performance data for commercial cation-exchange materials. Dynamic binding capacities were measured for flow rates ranging from 13 to 109 column volumes (CV)/min. Results show that this unique ATRP formulation can be used to fabricate cation-exchange membrane adsorbers with dynamic binding capacities as high as 70 mg/mL at a throughput of 100 CV/min and unprecedented productivity of 300 mg/mL/min. PMID:23175597
Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.
Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D
2015-07-30
RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.
Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis.
Yamamura, Yoshimi; Kurosaki, Fumiya; Lee, Jung-Bum
2017-03-07
Scoparia dulcis biosynthesizes bioactive diterpenes, such as scopadulcic acid B (SDB), which are known for their unique molecular skeleton. Although the biosynthesis of bioactive diterpenes is catalyzed by a sequence of class II and class I diterpene synthases (diTPSs), the mechanisms underlying this process are yet to be fully identified. To elucidate this biosynthetic machinery, we performed a high-throughput RNA-seq analysis, and de novo assembly of clean reads revealed 46,332 unique transcripts and 40,503 two unigenes. We found diTPS genes including a putative syn-copalyl diphosphate synthase (SdCPS2) and two kaurene synthase-like (SdKSLs) genes. In addition, a total of 79 full-length cytochrome P450 (CYP450) genes were discovered. The expression analyses showed that selected CYP450s were associated with the expression pattern of SdCPS2 and SdKSL1, suggesting that these CYP450 candidates are involved in diterpene modification. SdCPS2 represents the first predicted gene to produce syn-copalyl diphosphate in dicots. In addition, SdKSL1 potentially contributes to the SDB biosynthetic pathway. Therefore, these identified genes associated with diterpene biosynthesis lead to the development of genetic engineering focused on diterpene metabolism in S. dulcis.
Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis
Yamamura, Yoshimi; Kurosaki, Fumiya; Lee, Jung-Bum
2017-01-01
Scoparia dulcis biosynthesizes bioactive diterpenes, such as scopadulcic acid B (SDB), which are known for their unique molecular skeleton. Although the biosynthesis of bioactive diterpenes is catalyzed by a sequence of class II and class I diterpene synthases (diTPSs), the mechanisms underlying this process are yet to be fully identified. To elucidate this biosynthetic machinery, we performed a high-throughput RNA-seq analysis, and de novo assembly of clean reads revealed 46,332 unique transcripts and 40,503 two unigenes. We found diTPS genes including a putative syn-copalyl diphosphate synthase (SdCPS2) and two kaurene synthase-like (SdKSLs) genes. In addition, a total of 79 full-length cytochrome P450 (CYP450) genes were discovered. The expression analyses showed that selected CYP450s were associated with the expression pattern of SdCPS2 and SdKSL1, suggesting that these CYP450 candidates are involved in diterpene modification. SdCPS2 represents the first predicted gene to produce syn-copalyl diphosphate in dicots. In addition, SdKSL1 potentially contributes to the SDB biosynthetic pathway. Therefore, these identified genes associated with diterpene biosynthesis lead to the development of genetic engineering focused on diterpene metabolism in S. dulcis. PMID:28266568
Terakawa, Maki; Muneoka, Satoshi; Nagahira, Kazuhiro; Nagane, Yuriko; Yamate, Jyoji; Motomura, Masakatsu; Utsugisawa, Kimiaki
2017-01-01
The majority of patients with myasthenia gravis (MG), an organ-specific autoimmune disease, harbor autoantibodies that attack the nicotinic acetylcholine receptor (nAChR-Abs) at the neuromuscular junction of skeletal muscles, resulting in muscle weakness. Single cell manipulation technologies coupled with genetic engineering are very powerful tools to examine T cell and B cell repertoires and the dynamics of adaptive immunity. These tools have been utilized to develop mAbs in parallel with hybridomas, phage display technologies and B-cell immortalization. By applying a single cell technology and novel high-throughput cell-based binding assays, we identified peripheral B cells that produce pathogenic nAChR-Abs in patients with MG. Although anti-nAChR antibodies produced by individual peripheral B cells generally exhibited low binding affinity for the α-subunit of the nAChR and great sequence diversity, a small fraction of these antibodies bound with high affinity to native-structured nAChRs on cell surfaces. B12L, one such Ab isolated here, competed with a rat Ab (mAb35) for binding to the human nAChR and was thus considered to recognize the main immunogenic region (MIR). By evaluating the Ab in in vitro cell-based assays and an in vivo rat passive transfer model, B12L was found to act as a pathogenic Ab in rodents and presumably in humans. These findings suggest that B cells in peripheral blood may impact MG pathogenicity. Our methodology can be applied not only to validate pathogenic Abs as molecular targets of MG treatment, but also to discover and analyze Ab production systems in other human diseases. PMID:29040265
Makino, Tomohiro; Nakamura, Ryuichi; Terakawa, Maki; Muneoka, Satoshi; Nagahira, Kazuhiro; Nagane, Yuriko; Yamate, Jyoji; Motomura, Masakatsu; Utsugisawa, Kimiaki
2017-01-01
The majority of patients with myasthenia gravis (MG), an organ-specific autoimmune disease, harbor autoantibodies that attack the nicotinic acetylcholine receptor (nAChR-Abs) at the neuromuscular junction of skeletal muscles, resulting in muscle weakness. Single cell manipulation technologies coupled with genetic engineering are very powerful tools to examine T cell and B cell repertoires and the dynamics of adaptive immunity. These tools have been utilized to develop mAbs in parallel with hybridomas, phage display technologies and B-cell immortalization. By applying a single cell technology and novel high-throughput cell-based binding assays, we identified peripheral B cells that produce pathogenic nAChR-Abs in patients with MG. Although anti-nAChR antibodies produced by individual peripheral B cells generally exhibited low binding affinity for the α-subunit of the nAChR and great sequence diversity, a small fraction of these antibodies bound with high affinity to native-structured nAChRs on cell surfaces. B12L, one such Ab isolated here, competed with a rat Ab (mAb35) for binding to the human nAChR and was thus considered to recognize the main immunogenic region (MIR). By evaluating the Ab in in vitro cell-based assays and an in vivo rat passive transfer model, B12L was found to act as a pathogenic Ab in rodents and presumably in humans. These findings suggest that B cells in peripheral blood may impact MG pathogenicity. Our methodology can be applied not only to validate pathogenic Abs as molecular targets of MG treatment, but also to discover and analyze Ab production systems in other human diseases.
Na, Hong; Laver, John D.; Jeon, Jouhyun; Singh, Fateh; Ancevicius, Kristin; Fan, Yujie; Cao, Wen Xi; Nie, Kun; Yang, Zhenglin; Luo, Hua; Wang, Miranda; Rissland, Olivia; Westwood, J. Timothy; Kim, Philip M.; Smibert, Craig A.; Lipshitz, Howard D.; Sidhu, Sachdev S.
2016-01-01
Post-transcriptional regulation of mRNAs plays an essential role in the control of gene expression. mRNAs are regulated in ribonucleoprotein (RNP) complexes by RNA-binding proteins (RBPs) along with associated protein and noncoding RNA (ncRNA) cofactors. A global understanding of post-transcriptional control in any cell type requires identification of the components of all of its RNP complexes. We have previously shown that these complexes can be purified by immunoprecipitation using anti-RBP synthetic antibodies produced by phage display. To develop the large number of synthetic antibodies required for a global analysis of RNP complex composition, we have established a pipeline that combines (i) a computationally aided strategy for design of antigens located outside of annotated domains, (ii) high-throughput antigen expression and purification in Escherichia coli, and (iii) high-throughput antibody selection and screening. Using this pipeline, we have produced 279 antibodies against 61 different protein components of Drosophila melanogaster RNPs. Together with those produced in our low-throughput efforts, we have a panel of 311 antibodies for 67 RNP complex proteins. Tests of a subset of our antibodies demonstrated that 89% immunoprecipitate their endogenous target from embryo lysate. This panel of antibodies will serve as a resource for global studies of RNP complexes in Drosophila. Furthermore, our high-throughput pipeline permits efficient production of synthetic antibodies against any large set of proteins. PMID:26847261
Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich
2015-12-16
Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
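The concordance statistics named above (Pearson's r on log2 fold changes, RMSD, and the Matthews correlation coefficient) can be computed as in the sketch below. The per-gene values and the |log2FC| >= 1 cutoff used to call a gene differentially expressed are hypothetical choices for illustration; the abstract does not give the study's thresholds or data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def matthews_cc(truth, calls):
    """Matthews correlation coefficient for paired boolean labels."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical per-gene log2 fold changes from a reference and a test platform.
reference_lfc = [2.0, -1.0, 0.5, 0.0, -2.0]
test_lfc = [1.0, -0.5, 0.25, 0.0, -1.0]

# Hypothetical rule: a gene is called differentially expressed if |log2FC| >= 1.
reference_de = [abs(v) >= 1.0 for v in reference_lfc]
test_de = [abs(v) >= 1.0 for v in test_lfc]

r = pearson_r(reference_lfc, test_lfc)
deviation = rmsd(reference_lfc, test_lfc)
mcc = matthews_cc(reference_de, test_de)
print(f"r = {r:.3f}, RMSD = {deviation:.3f}, MCC = {mcc:.3f}")
```

The toy data show why the three measures are reported together: the test platform compresses every fold change by half, so the correlation is perfect while RMSD and MCC both register the disagreement.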
Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq).
Watters, Kyle E; Lucks, Julius B
2016-01-01
Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.
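The relationship between reverse transcription stops and adduct positions described above can be sketched in a few lines. This is a deliberately simplified illustration, not the estimation procedure of the protocol: reactivity is approximated as the background-subtracted fraction of stops per nucleotide, positions are 1-based from the RNA 5' end, and all function names and read data are hypothetical.

```python
from collections import Counter

def modified_positions(stop_positions):
    """Map each RT stop (last nucleotide copied, 1-based) to the adduct site.

    RT reads the RNA 3' to 5' and halts one nucleotide before the adduct,
    so the modified nucleotide lies one position 5' of the stop.
    """
    return [p - 1 for p in stop_positions if p > 1]

def simple_reactivity(plus_stops, minus_stops, rna_length):
    """Per-nucleotide stop fraction in the (+) reagent sample minus the (-) control."""
    plus = Counter(modified_positions(plus_stops))
    minus = Counter(modified_positions(minus_stops))
    n_plus = sum(plus.values()) or 1     # avoid division by zero on empty input
    n_minus = sum(minus.values()) or 1
    return [
        max(0.0, plus[i] / n_plus - minus[i] / n_minus)
        for i in range(1, rna_length + 1)
    ]

# Hypothetical cDNA stop positions for a 10-nucleotide RNA.
plus_stops = [4, 4, 4, 6, 8, 8]     # SHAPE-modified sample
minus_stops = [4, 6, 8, 9]          # unmodified control
reactivity = simple_reactivity(plus_stops, minus_stops, rna_length=10)
for position, value in enumerate(reactivity, start=1):
    print(position, round(value, 3))
```

In this toy input, nucleotide 3 stands out because stops at position 4 are enriched in the modified sample relative to the control, which is the signal that would be read as structural flexibility.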
Integrating RNA sequencing into neuro-oncology practice.
Rogawski, David S; Vitanza, Nicholas A; Gauthier, Angela C; Ramaswamy, Vijay; Koschmann, Carl
2017-11-01
Malignant tumors of the central nervous system (CNS) cause substantial morbidity and mortality, yet efforts to optimize chemo- and radiotherapy have largely failed to improve dismal prognoses. Over the past decade, RNA sequencing (RNA-seq) has emerged as a powerful tool to comprehensively characterize the transcriptome of CNS tumor cells in one high-throughput step, leading to improved understanding of CNS tumor biology and suggesting new routes for targeted therapies. RNA-seq has been instrumental in improving the diagnostic classification of brain tumors, characterizing oncogenic fusion genes, and shedding light on intratumor heterogeneity. Currently, RNA-seq is beginning to be incorporated into regular neuro-oncology practice in the form of precision neuro-oncology programs, which use information from tumor sequencing to guide implementation of personalized targeted therapies. These programs show great promise in improving patient outcomes for tumors where single agent trials have been ineffective. As RNA-seq is a relatively new technique, many further applications yielding new advances in CNS tumor research and management are expected in the coming years. Copyright © 2017 Elsevier Inc. All rights reserved.
Ehmann, David E; Demeritt, Julie E; Hull, Kenneth G; Fisher, Stewart L
2004-05-06
UDP-N-acetylmuramyl-l-alanine ligase (MurC) is an essential bacterial enzyme involved in peptidoglycan biosynthesis and a target for the discovery of novel antibacterial agents. As a result of a high-throughput screen (HTS) against a chemical library for inhibitors of MurC, a series of benzofuran acyl-sulfonamides was identified as potential leads. One of these compounds, Compound A, inhibited Escherichia coli MurC with an IC50 of 2.3 μM. Compound A exhibited time-dependent, partially reversible inhibition of E. coli MurC. Kinetic studies revealed a mode of inhibition consistent with the compound acting competitively with the MurC substrates ATP and UDP-N-acetyl-muramic acid (UNAM) with a Ki of 4.5 μM against ATP and 6.3 μM against UNAM. Fluorescence binding experiments yielded a Kd of 3.1 μM for the compound binding to MurC. Compound A also exhibited high-affinity binding to bovine serum albumin (BSA) as evidenced by a severe reduction in MurC inhibition upon addition of BSA. This finding is consistent with the high lipophilicity of the compound. Advancement of this compound series for further drug development will require reduction of albumin binding.
Tadra-Sfeir, Michelle Z.; Faoro, Helisson; Camilios-Neto, Doumit; Brusamarello-Santos, Liziane; Balsanelli, Eduardo; Weiss, Vinicius; Baura, Valter A.; Wassem, Roseli; Cruz, Leonardo M.; De Oliveira Pedrosa, Fábio; Souza, Emanuel M.; Monteiro, Rose A.
2015-01-01
Herbaspirillum seropedicae is a diazotrophic bacterium which associates endophytically with economically important gramineae. Flavonoids such as naringenin have been shown to have an effect on the interaction between H. seropedicae and its host plants. We used a high-throughput sequencing based method (RNA-Seq) to assess the influence of naringenin on the whole transcriptome profile of H. seropedicae. Three hundred and four genes were downregulated and seventy-seven were upregulated by naringenin. Data analysis revealed that genes related to bacterial flagella biosynthesis, chemotaxis and biosynthesis of peptidoglycan were repressed by naringenin. Moreover, genes involved in aromatic metabolism and multidrug transport efflux were activated. PMID:26052319
Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing
Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li
2010-01-01
Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818
Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.
Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li
2010-08-01
Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.
Habtom, Habteab; Demanèche, Sandrine; Dawson, Lorna; Azulay, Chen; Matan, Ofra; Robe, Patrick; Gafny, Ron; Simonet, Pascal; Jurkevitch, Edouard; Pasternak, Zohar
2017-01-01
The ubiquity and transferability of soil makes it a resource for the forensic investigator, as it can provide a link between agents and scenes. However, the information contained in soils, such as chemical compounds, physical particles or biological entities, is seldom used in forensic investigations; due mainly to the associated costs, lack of available expertise, and the lack of soil databases. The microbial DNA in soil is relatively easy to access and analyse, having thus the potential to provide a powerful means for discriminating soil samples or linking them to a common origin. We compared the effectiveness and reliability of multiple methods and genes for bacterial characterisation in the differentiation of soil samples: ribosomal intergenic spacer analysis (RISA), terminal restriction fragment length polymorphism (TRFLP) of the rpoB gene, and five methods using the 16S rRNA gene: phylogenetic microarrays, TRFLP, and high throughput sequencing with Roche 454, Illumina MiSeq and IonTorrent PGM platforms. All these methods were also compared to long-chain hydrocarbons (n-alkanes) and fatty alcohol profiling of the same soil samples. RISA, 16S TRFLP and MiSeq performed best, reliably and significantly discriminating between adjacent, similar soil types. As TRFLP employs the same capillary electrophoresis equipment and procedures used to analyse human DNA, it is readily available for use in most forensic laboratories. TRFLP was optimized for forensic usage in five parameters: choice of primer pair, fluorescent tagging, concentrating DNA after digestion, number of PCR amplifications per sample and number of capillary electrophoresis runs per PCR amplification. This study shows that molecular microbial ecology methodologies are robust in discriminating between soil samples, illustrating their potential usage as an evaluative forensic tool. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Leach, Richard E.; Jessmon, Philip; Coutifaris, Christos; Kruger, Michael; Myers, Evan R.; Ali-Fehmi, Rouba; Carson, Sandra A.; Legro, Richard S.; Schlaff, William D.; Carr, Bruce R.; Steinkampf, Michael P.; Silva, Susan; Leppert, Phyllis C.; Giudice, Linda; Diamond, Michael P.; Armant, D. Randall
2012-01-01
BACKGROUND Although histological dating of endometrial biopsies provides little help for prediction or diagnosis of infertility, analysis of individual endometrial proteins, proteomic profiling and transcriptome analysis have suggested several biomarkers with altered expression arising from intrinsic abnormalities, inadequate stimulation by or in response to gonadal steroids or altered function due to systemic disorders. The objective of this study was to delineate the developmental dynamics of potentially important proteins in the secretory phase of the menstrual cycle, utilizing a collection of endometrial biopsies from women of fertile (n = 89) and infertile (n = 89) couples. METHODS AND RESULTS Progesterone receptor-B (PGR-B), leukemia inhibitory factor, glycodelin/progestagen-associated endometrial protein (PAEP), homeobox A10, heparin-binding EGF-like growth factor, calcitonin and chemokine ligand 14 (CXCL14) were measured using a high-throughput, quantitative immunohistochemical method. Significant cyclic and tissue-specific regulation was documented for each protein, as well as their dysregulation in women of infertile couples. Infertile patients demonstrated a delay early in the secretory phase in the decline of PGR-B (P < 0.05) and premature mid-secretory increases in PAEP (P < 0.05) and CXCL14 (P < 0.05), suggesting that the implantation interval could be closing early. Correlation analysis identified potential interactions among certain proteins that were disrupted by infertility. CONCLUSIONS This approach overcomes the limitations of a small sample number. Protein expression and localization provided important insights into the potential roles of these proteins in normal and pathological development of the endometrium that is not attainable from transcriptome analysis, establishing a basis for biomarker, diagnostic and targeted drug development for women with infertility. PMID:22215622
A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data
2014-01-01
ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommended that users employ multiple motif finding Web tools that implement different algorithms for obtaining significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provided our suggestions for future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers: This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong). PMID:24555784
Le Saux, Thomas; Hisamoto, Hideaki; Terabe, Shigeru
2006-02-03
Measurement of binding constants by chip electrophoresis is a very promising technique for the high throughput screening of non-covalent interactions. Among the different electrophoretic methods available that yield the binding parameters, continuous frontal analysis is the most appropriate for a transposition from capillary electrophoresis (CE) to microchip electrophoresis. Implementation of this methodology on a microchip was exemplified by the measurement of inclusion constants of 2-naphthalenesulfonate and neutral phenols (phenol, 4-chlorophenol and 4-nitrophenol) into beta-cyclodextrin by competitive assays. The issue of competitor choice is discussed in relation to its appropriateness for proper monitoring of the interaction.
Kuang, Jian-Fei; Chen, Jian-Ye; Liu, Xun-Cheng; Han, Yan-Chao; Xiao, Yun-Yi; Shan, Wei; Tang, Yang; Wu, Ke-Qiang; He, Jun-Xian; Lu, Wang-Jin
2017-04-01
Fruit ripening is a complex, genetically programmed process involving the action of critical transcription factors (TFs). Despite the established significance of dehydration-responsive element binding (DREB) TFs in plant abiotic stress responses, the involvement of DREBs in fruit ripening is yet to be determined. Here, we identified four genes encoding ripening-regulated DREB TFs in banana (Musa acuminata), MaDREB1, MaDREB2, MaDREB3, and MaDREB4, and demonstrated that they play regulatory roles in fruit ripening. We showed that MaDREB1-MaDREB4 are nucleus-localized, induced by ethylene and encompass transcriptional activation activities. We performed a genome-wide chromatin immunoprecipitation and high-throughput sequencing (ChIP-Seq) experiment for MaDREB2 and identified 697 genomic regions as potential targets of MaDREB2. MaDREB2 binds to hundreds of loci with diverse functions and its binding sites are distributed in the promoter regions proximal to the transcriptional start site (TSS). Most of the MaDREB2-binding targets contain the conserved (A/G)CC(G/C)AC motif and MaDREB2 appears to directly regulate the expression of a number of genes involved in fruit ripening. In combination with transcriptome profiling (RNA sequencing) data, our results indicate that MaDREB2 may serve as both transcriptional activator and repressor during banana fruit ripening. In conclusion, our study suggests a hierarchical regulatory model of fruit ripening in banana and that the MaDREB TFs may act as transcriptional regulators in the regulatory network. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
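The conserved (A/G)CC(G/C)AC motif reported above translates directly into a pattern search over both DNA strands. The sketch below is only an illustration of that translation, not the motif discovery method used in the study; the function name `find_motif_sites` and the example sequence are hypothetical.

```python
import re

# (A/G)CC(G/C)AC on the forward strand, and its reverse complement
# GT(G/C)GG(C/T), which marks a site on the opposite strand.
# Lookaheads are used so that overlapping sites are all reported.
MOTIF_FWD = re.compile(r"(?=([AG]CC[GC]AC))")
MOTIF_REV = re.compile(r"(?=(GT[GC]GG[CT]))")

def find_motif_sites(sequence):
    """Return sorted (start, strand, matched_sequence) tuples, 0-based starts."""
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", MOTIF_FWD), ("-", MOTIF_REV)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)

# Hypothetical promoter fragment with one site on each strand.
promoter = "ttgACCGACaaGTCGGTcc"
hits = find_motif_sites(promoter)
for start, strand, site in hits:
    print(start, strand, site)
```

Start coordinates are reported on the forward strand for both orientations, which makes it straightforward to compare hits against peak intervals or distances to a transcriptional start site.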
CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.
Velmeshev, Dmitry; Lally, Patrick; Magistri, Marco; Faghihi, Mohammad Ali
2016-01-13
Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers. To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app . 
We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving but accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Tsuchiya, Megumi; Karim, M Rezaul; Matsumoto, Taro; Ogawa, Hidesato; Taniguchi, Hiroaki
2017-01-24
Transcriptional coregulators are vital to the efficient transcriptional regulation of nuclear chromatin structure. Coregulators play a variety of roles in regulating transcription. These include the direct interaction with transcription factors, the covalent modification of histones and other proteins, and the occasional chromatin conformation alteration. Accordingly, establishing relatively quick methods for identifying proteins that interact within this network is crucial to enhancing our understanding of the underlying regulatory mechanisms. LC-MS/MS-mediated protein binding partner identification is a validated technique used to analyze protein-protein interactions. By immunoprecipitating a previously-identified member of a protein complex with an antibody (occasionally with an antibody for a tagged protein), it is possible to identify its unknown protein interactions via mass spectrometry analysis. Here, we present a method of protein preparation for the LC-MS/MS-mediated high-throughput identification of protein interactions involving nuclear cofactors and their binding partners. This method allows for a better understanding of the transcriptional regulatory mechanisms of the targeted nuclear factors.
A receptor binding assay for paralytic shellfish poisoning toxins: recent advances and applications.
Powell, C L; Doucette, G J
1999-01-01
We recently described a high-throughput receptor binding assay for paralytic shellfish poisoning (PSP) toxins, the use of the assay for detecting toxic activity in shellfish and algal extracts, and the validation of 11-[3H]-tetrodotoxin as an alternative radioligand to the [3H]-saxitoxin conventionally employed in the assay. Here, we report a dramatic increase in assay efficiency through application of microplate scintillation technology, resulting in an assay turnaround time of 4 h. Efforts are now focused on demonstrating the range of applications for which this receptor assay can provide data comparable to the more time-consuming, technically demanding HPLC analysis of PSP toxins, currently the method of choice for researchers. To date, we have compared the results of both methods for a variety of sample types, including different genera of PSP toxin-producing dinoflagellates (e.g. Alexandrium lusitanicum, r² = 0.9834, n = 12), size-fractionated field samples of Alexandrium spp. (20-64 µm; r² = 0.9997, n = 10) as well as its associated zooplankton grazer community (200-500 µm: r² = 0.6169, n = 10; >500 µm: r² = 0.5063, n = 10), and contaminated human fluids (r² = 0.9661, n = 7) from a PSP outbreak. Receptor-based STX equivalent values for all but the zooplankton samples were highly correlated and exhibited close quantitative agreement with those produced by HPLC. While the PSP receptor binding assay does not provide the information on toxin composition obtainable by HPLC, it does represent a robust and reliable means of rapidly assessing PSP-like toxicity in laboratory and field samples. Moreover, this assay should be effective as a screening tool for use by public health officials in responding to suspected cases of PSP intoxication.
PCR cycles above routine numbers do not compromise high-throughput DNA barcoding results.
Vierna, J; Doña, J; Vizcaíno, A; Serrano, D; Jovani, R
2017-10-01
High-throughput DNA barcoding has become essential in ecology and evolution, but some technical questions still remain. Increasing the number of PCR cycles above the routine 20-30 cycles is a common practice when working with old-type specimens, which provide small amounts of DNA, or when facing annealing issues with the primers. However, increasing the number of cycles can raise the number of artificial mutations due to polymerase errors. In this work, we sequenced 20 COI libraries on the Illumina MiSeq platform. Libraries were prepared with 40, 45, 50, 55, and 60 PCR cycles from four individuals belonging to four species of four genera of cephalopods. We found no relationship between the number of PCR cycles and the number of mutations despite using a nonproofreading polymerase. Moreover, even when using a high number of PCR cycles, the resulting number of mutations was low enough not to be an issue in the context of high-throughput DNA barcoding (but may still remain an issue in DNA metabarcoding due to chimera formation). We conclude that the common practice of increasing the number of PCR cycles should not negatively impact the outcome of a high-throughput DNA barcoding study in terms of the occurrence of point mutations.
Sinclair, Thomas R; Manandhar, Anju; Shekoofa, Avat; Rosas-Anderson, Pablo; Bagherzadi, Laleh; Schoppach, Remy; Sadok, Walid; Rufty, Thomas W
2017-04-01
Theoretical derivation predicted growth retardation due to pot water limitations, i.e., pot binding. Experimental observations were consistent with these limitations. Combined, these results indicate a need for caution in high-throughput screening and phenotyping. Pot experiments are a mainstay in many plant studies, including the current emphasis on developing high-throughput, phenotyping systems. Pot studies can be vulnerable to decreased physiological activity of the plants particularly when pot volume is small, i.e., "pot binding". It is necessary to understand the conditions under which pot binding may exist to avoid the confounding influence of pot binding in interpreting experimental results. In this paper, a derivation is offered that gives well-defined conditions for the occurrence of pot binding based on restricted water availability. These results showed that not only are pot volume and plant size important variables, but the potting media is critical. Artificial potting mixtures used in many studies, including many high-throughput phenotyping systems, are particularly susceptible to the confounding influences of pot binding. Experimental studies for several crop species are presented that clearly show the existence of thresholds of plant leaf area at which various pot sizes and potting media result in the induction of pot binding even though there may be no immediate, visual plant symptoms. The derivation and experimental results showed that pot binding can readily occur in plant experiments if care is not given to have sufficiently large pots, suitable potting media, and maintenance of pot water status. Clear guidelines are provided for avoiding the confounding effects of water-limited pot binding in studying plant phenotype.
Zong, Shan; Deng, Shuyun; Chen, Kenian; Wu, Jia Qian
2014-11-11
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
Chen, Kenian; Wu, Jia Qian
2014-01-01
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo. PMID:25407807
Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu
2016-01-01
Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines of extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095
Sanchez-Luque, Francisco J; Richardson, Sandra R; Faulkner, Geoffrey J
2016-01-01
Mobile genetic elements (MGEs) are of critical importance in genomics and developmental biology. Polymorphic and somatic MGE insertions have the potential to impact the phenotype of an individual, depending on their genomic locations and functional consequences. However, the identification of polymorphic and somatic insertions among the plethora of copies residing in the genome presents a formidable technical challenge. Whole genome sequencing has the potential to address this problem; however, its efficacy depends on the abundance of cells carrying the new insertion. Robust detection of somatic insertions present in only a subset of cells within a given sample can also be prohibitively expensive due to a requirement for high sequencing depth. Here, we describe retrotransposon capture sequencing (RC-seq), a sequence capture approach in which Illumina libraries are enriched for fragments containing the 5' and 3' termini of specific MGEs. RC-seq allows the detection of known polymorphic insertions present in an individual, as well as the identification of rare or private germline insertions not previously described. Furthermore, RC-seq can be used to detect and characterize somatic insertions, providing a valuable tool to elucidate the extent and characteristics of MGE activity in healthy tissues and in various disease states.
Insights into the increasing virulence of the swine-origin pandemic H1N1/2009 influenza virus
Zou, Wei; Chen, Dijun; Xiong, Min; Zhu, Jiping; Lin, Xian; Wang, Lun; Zhang, Jun; Chen, Lingling; Zhang, Hongyu; Chen, Huanchun; Chen, Ming; Jin, Meilin
2013-01-01
Pandemic H1N1/2009 viruses have been stabilized in swine herds, and some strains display higher pathogenicity than the human-origin isolates. In this study, high-throughput RNA sequencing (RNA-seq) is applied to explore the systemic transcriptome responses of mouse lungs infected by swine (Jia6/10) and human (LN/09) H1N1/2009 viruses. The transcriptome data show that Jia6/10 activates stronger virus-sensing signals, such as the toll-like receptor, RIG-I-like receptor and NOD-like receptor signaling pathways, as well as stronger NF-κB and JAK-STAT signals, which play significant roles in inducing innate immunity. Most cytokines and interferon-stimulated genes show higher expression levels in Jia6/10-infected groups. Meanwhile, virus Jia6/10 activates stronger production of reactive oxygen species, which might further promote a higher mutation rate of the virus genome. Collectively, our data reveal that the swine-origin pandemic H1N1/2009 virus elicits a stronger innate immune reaction and pro-oxidation stimulation, which might relate closely to the increasing pathogenicity. PMID:23549303
A Mixture Modeling Framework for Differential Analysis of High-Throughput Data
Taslim, Cenny; Lin, Shili
2014-01-01
The inventions of microarray and next generation sequencing technologies have revolutionized research in genomics; platforms have led to massive amount of data in gene expression, methylation, and protein-DNA interactions. A common theme among a number of biological problems using high-throughput technologies is differential analysis. Despite the common theme, different data types have their own unique features, creating a “moving target” scenario. As such, methods specifically designed for one data type may not lead to satisfactory results when applied to another data type. To meet this challenge so that not only currently existing data types but also data from future problems, platforms, or experiments can be analyzed, we propose a mixture modeling framework that is flexible enough to automatically adapt to any moving target. More specifically, the approach considers several classes of mixture models and essentially provides a model-based procedure whose model is adaptive to the particular data being analyzed. We demonstrate the utility of the methodology by applying it to three types of real data: gene expression, methylation, and ChIP-seq. We also carried out simulations to gauge the performance and showed that the approach can be more efficient than any individual model without inflating type I error. PMID:25057284
Fougeroux, André; Petit, Fabien; Anselmo, Anna; Gorni, Chiara; Cucurachi, Marco; Cersini, Antonella; Granato, Anna; Cardeti, Giusy; Formato, Giovanni; Mutinelli, Franco; Giuffra, Elisabetta; Williams, John L.; Botti, Sara
2017-01-01
Honeybees (Apis mellifera) are constantly subjected to many biotic stressors including parasites. This study examined honeybees infected with Nosema ceranae (N. ceranae). N. ceranae infection increases the bees' energy requirements and may contribute to their decreased survival. RNA-seq was used to investigate gene expression at days 5, 10 and 15 post infection (P.I.) with N. ceranae. The expression levels of genes, isoforms, alternative transcription start sites (TSS) and differential promoter usage revealed a complex pattern of transcriptional and post-transcriptional gene regulation, suggesting that bees use a range of tactics to cope with the stress of N. ceranae infection. N. ceranae infection may cause reduced immune function in the bees by: (i) disturbing host amino acid metabolism, (ii) down-regulating expression of antimicrobial peptides, (iii) down-regulating cuticle coatings and (iv) down-regulating odorant binding proteins. PMID:28350872
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mochalkin, Igor; Lightle, Sandra; Narasimhan, Lakshmi
2008-04-02
N-Acetylglucosamine-1-phosphate uridyltransferase (GlmU) is an essential enzyme in amino sugar metabolism and an attractive target for antibiotic drug discovery. GlmU catalyzes the formation of uridine-diphospho-N-acetylglucosamine (UDP-GlcNAc), an important precursor in peptidoglycan and lipopolysaccharide biosynthesis in both Gram-negative and Gram-positive bacteria. Here we disclose a 1.9 Å resolution crystal structure of a synthetic small-molecule inhibitor of GlmU from Haemophilus influenzae (hiGlmU). The compound was identified through a high-throughput screen (HTS) configured to detect inhibitors that target the uridyltransferase active site of hiGlmU. The original HTS hit exhibited a modest micromolar potency (IC50 ~ 18 µM in a racemic mixture) against hiGlmU and no activity against Staphylococcus aureus GlmU (saGlmU). The determined crystal structure indicated that the inhibitor occupies an allosteric site adjacent to the GlcNAc-1-P substrate-binding region. Analysis of the mechanistic model of the uridyltransferase reaction suggests that the binding of this allosteric inhibitor prevents structural rearrangements that are required for the enzymatic reaction, thus providing a basis for structure-guided design of a new class of mechanism-based inhibitors of GlmU.
BlackOPs: increasing confidence in variant detection through mappability filtering.
Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil
2013-10-01
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
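The filtering step described above, removing called variants whose position and allele appear on a mismapping blacklist, can be illustrated with a few lines of code. This is a hedged sketch only: it does not reproduce the BlackOPs file formats or command-line interface, and the dictionary keys (`chrom`, `pos`, `alt`) and the function name are assumptions made for the example.

```python
def filter_variants(variants, blacklist):
    """Split variant calls into (kept, removed) using a blacklist.

    variants  : iterable of dicts with hypothetical keys 'chrom', 'pos', 'alt'
    blacklist : set of (chrom, pos, alt) tuples attributed to mismapping
    """
    kept, removed = [], []
    for variant in variants:
        key = (variant["chrom"], variant["pos"], variant["alt"])
        # A call is dropped only if both its position and its allele match.
        (removed if key in blacklist else kept).append(variant)
    return kept, removed
```

Matching on the allele as well as the position means a different alternate allele at a blacklisted site is still retained. Because the abstract notes that blacklists depend on the aligner and read length, the same blacklist should not be assumed valid across experimental setups.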
Farkas, Kata; Harrison, Christian; Jones, David L.; McCarthy, Alan J.
2018-01-01
ABSTRACT Detection of viruses in the environment is heavily dependent on PCR-based approaches that require reference sequences for primer design. While this strategy can accurately detect known viruses, it will not find novel genotypes or emerging and invasive viral species. In this study, we investigated the use of viromics, i.e., high-throughput sequencing of the biosphere’s viral fraction, to detect human-/animal-pathogenic RNA viruses in the Conwy river catchment area in Wales, United Kingdom. Using a combination of filtering and nuclease treatment, we extracted the viral fraction from wastewater and estuarine river water and sediment, followed by high-throughput RNA sequencing (RNA-Seq) analysis on the Illumina HiSeq platform, for the discovery of RNA virus genomes. We found a higher richness of RNA viruses in wastewater samples than in river water and sediment, and we assembled a complete norovirus genotype GI.2 genome from wastewater effluent, which was not contemporaneously detected by conventional reverse transcription-quantitative PCR (qRT-PCR). The simultaneous presence of diverse rotavirus signatures in wastewater indicated the potential for zoonotic infections in the area and suggested runoff from pig farms as a possible origin of these viruses. Our results show that viromics can be an important tool in the discovery of pathogenic viruses in the environment and can be used to inform and optimize reference-based detection methods provided appropriate and rigorous controls are included. IMPORTANCE Enteric viruses cause gastrointestinal illness and are commonly transmitted through the fecal-oral route. When wastewater is released into river systems, these viruses can contaminate the environment. Our results show that we can use viromics to find the range of potentially pathogenic viruses that are present in the environment and identify prevalent genotypes. 
The ultimate goal is to trace the fate of these pathogenic viruses from origin to the point where they are a threat to human health, informing reference-based detection methods and water quality management. PMID:29795788
Adriaenssens, Evelien M; Farkas, Kata; Harrison, Christian; Jones, David L; Allison, Heather E; McCarthy, Alan J
2018-01-01
Detection of viruses in the environment is heavily dependent on PCR-based approaches that require reference sequences for primer design. While this strategy can accurately detect known viruses, it will not find novel genotypes or emerging and invasive viral species. In this study, we investigated the use of viromics, i.e., high-throughput sequencing of the biosphere's viral fraction, to detect human-/animal-pathogenic RNA viruses in the Conwy river catchment area in Wales, United Kingdom. Using a combination of filtering and nuclease treatment, we extracted the viral fraction from wastewater and estuarine river water and sediment, followed by high-throughput RNA sequencing (RNA-Seq) analysis on the Illumina HiSeq platform, for the discovery of RNA virus genomes. We found a higher richness of RNA viruses in wastewater samples than in river water and sediment, and we assembled a complete norovirus genotype GI.2 genome from wastewater effluent, which was not contemporaneously detected by conventional reverse transcription-quantitative PCR (qRT-PCR). The simultaneous presence of diverse rotavirus signatures in wastewater indicated the potential for zoonotic infections in the area and suggested runoff from pig farms as a possible origin of these viruses. Our results show that viromics can be an important tool in the discovery of pathogenic viruses in the environment and can be used to inform and optimize reference-based detection methods provided appropriate and rigorous controls are included. IMPORTANCE Enteric viruses cause gastrointestinal illness and are commonly transmitted through the fecal-oral route. When wastewater is released into river systems, these viruses can contaminate the environment. Our results show that we can use viromics to find the range of potentially pathogenic viruses that are present in the environment and identify prevalent genotypes. 
The ultimate goal is to trace the fate of these pathogenic viruses from origin to the point where they are a threat to human health, informing reference-based detection methods and water quality management.
Exome Pool-Seq in neurodevelopmental disorders.
Popp, Bernt; Ekici, Arif B; Thiel, Christian T; Hoyer, Juliane; Wiesener, Antje; Kraus, Cornelia; Reis, André; Zweier, Christiane
2017-12-01
High throughput sequencing has greatly advanced disease gene identification, especially in heterogeneous entities. Despite falling costs this is still an expensive and laborious technique, particularly when studying large cohorts. To address this problem we applied Exome Pool-Seq as an economic and fast screening technology in neurodevelopmental disorders (NDDs). Sequencing of 96 individuals can be performed in eight pools of 12 samples on less than one Illumina sequencer lane. In a pilot study with 96 cases we identified 27 variants, likely or possibly affecting function. Twenty five of these were identified in 923 established NDD genes (based on SysID database, status November 2016) (ACTB, AHDC1, ANKRD11, ATP6V1B2, ATRX, CASK, CHD8, GNAS, IFIH1, KCNQ2, KMT2A, KRAS, MAOA, MED12, MED13L, RIT1, SETD5, SIN3A, TCF4, TRAPPC11, TUBA1A, WAC, ZBTB18, ZMYND11), two in 543 (SysID) candidate genes (ZNF292, BPTF), and additionally a de novo loss-of-function variant in LRRC7, not previously implicated in NDDs. Most of them were confirmed to be de novo, but we also identified X-linked or autosomal-dominantly or autosomal-recessively inherited variants. With a detection rate of 28%, Exome Pool-Seq achieves comparable results to individual exome analyses but reduces costs by >85%. Compared with other large scale approaches using Molecular Inversion Probes (MIP) or gene panels, it allows flexible re-analysis of data. Exome Pool-Seq is thus well suited for large-scale, cost-efficient and flexible screening in characterized but heterogeneous entities like NDDs.
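The pooling design above implies a simple dilution calculation: a heterozygous variant carried by one of 12 diploid individuals contributes 1 of 24 alleles in the pool, roughly 4.2% of reads at that site. The sketch below works through that arithmetic and the binomial chance of seeing enough supporting reads. It assumes equimolar pooling and an autosomal site; the read depth and read threshold used in the usage note are purely illustrative and are not taken from the study.

```python
from math import comb


def expected_allele_fraction(n_samples, ploidy=2, carriers=1, copies_per_carrier=1):
    """Expected variant allele fraction in an equimolar pool."""
    return carriers * copies_per_carrier / (n_samples * ploidy)


def prob_at_least(k, depth, fraction):
    """P(X >= k) for X ~ Binomial(depth, fraction)."""
    below = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - below
```

For instance, `expected_allele_fraction(12)` gives 1/24, and `prob_at_least(5, 300, 1 / 24)` estimates how often at least five variant reads would be seen at a hypothetical depth of 300. Sequencing error and uneven pooling are ignored here, so the result is an optimistic bound rather than a detection-rate prediction.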
Hewitt, Stephen N.; Choi, Ryan; Kelley, Angela; Crowther, Gregory J.; Napuli, Alberto J.; Van Voorhis, Wesley C.
2011-01-01
Despite recent advances, the expression of heterologous proteins in Escherichia coli for crystallization remains a nontrivial challenge. The present study investigates the efficacy of maltose-binding protein (MBP) fusion as a general strategy for rescuing the expression of target proteins. From a group of sequence-verified clones with undetectable levels of protein expression in an E. coli T7 expression system, 95 clones representing 16 phylogenetically diverse organisms were selected for recloning into a chimeric expression vector with an N-terminal histidine-tagged MBP. PCR-amplified inserts were annealed into an identical ligation-independent cloning region in an MBP-fusion vector and were analyzed for expression and solubility by high-throughput nickel-affinity binding. This approach yielded detectable expression of 72% of the clones; soluble expression was visible in 62%. However, the solubility of most proteins was marginal to poor upon cleavage of the MBP tag. This study offers large-scale evidence that MBP can improve the soluble expression of previously non-expressing proteins from a variety of eukaryotic and prokaryotic organisms. While the behavior of the cleaved proteins was disappointing, further refinements in MBP tagging may permit the more widespread use of MBP-fusion proteins in crystallographic studies. PMID:21904041
Thomsen, Martin Christen Frølund; Nielsen, Morten
2012-01-01
Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2Logo aims to resolve these issues by allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for a low number of observations, and different logotype representations, each capturing different aspects of amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
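As a rough illustration of the information content that a sequence logo displays, the sketch below computes per-position Shannon information (in bits) for a small set of equal-length peptides, with an optional uniform pseudocount. This is not Seq2Logo's own algorithm: the function name `column_information`, the uniform pseudocount, and the assumption that inputs contain only the 20 standard amino acid letters are illustrative choices, and sequence weighting is left out entirely.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_information(peptides, pseudocount=0.0):
    """Return per-position information content (bits) for equal-length peptides.

    Hypothetical helper: assumes only standard amino acid letters and a
    uniform background; the pseudocount is added equally to every letter.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")

    max_bits = math.log2(len(AMINO_ACIDS))  # maximum information per column
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    info = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        entropy = 0.0
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            if freq > 0:
                entropy -= freq * math.log2(freq)
        # Information is the reduction in uncertainty relative to the maximum.
        info.append(max_bits - entropy)
    return info
```

A fully conserved column reaches the maximum of log2(20) bits, while adding a pseudocount lowers the information of columns supported by few observations.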
High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.
Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory
2017-12-01
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee
2014-01-01
In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287
MYCN controls an alternative RNA splicing program in high-risk metastatic neuroblastoma.
Zhang, Shile; Wei, Jun S; Li, Samuel Q; Badgett, Tom C; Song, Young K; Agarwal, Saurabh; Coarfa, Cristian; Tolman, Catherine; Hurd, Laura; Liao, Hongling; He, Jianbin; Wen, Xinyu; Liu, Zhihui; Thiele, Carol J; Westermann, Frank; Asgharzadeh, Shahab; Seeger, Robert C; Maris, John M; Guidry Auvil, Jamie M; Smith, Malcolm A; Kolaczyk, Eric D; Shohet, Jason; Khan, Javed
2016-02-28
The molecular mechanisms underlying the aggressive behavior of MYCN-driven neuroblastoma (NBL) are under intense investigation; however, little is known about the impact of this family of transcription factors on the splicing program. Here we used high-throughput RNA sequencing to systematically study the expression of RNA isoforms in stage 4 MYCN-amplified NBL, an aggressive subtype of metastatic NBL. We show that MYCN-amplified NBL tumors display a distinct gene splicing pattern affecting multiple cancer hallmark functions. Six splicing factors displayed unique differential expression patterns in MYCN-amplified tumors and cell lines, and the binding motifs for some of these splicing factors are significantly enriched in differentially spliced genes. Direct binding of MYCN to promoter regions of the splicing factors PTBP1 and HNRNPA1 detected by ChIP-seq demonstrates that MYCN controls the splicing pattern by direct regulation of the expression of these key splicing factors. Furthermore, high expression of PTBP1 and HNRNPA1 was significantly associated with poor overall survival of stage 4 NBL patients (p ≤ 0.05). Knocking down PTBP1, HNRNPA1 and their downstream target PKM2, a pro-tumor-growth isoform, resulted in repressed growth of NBL cells. Therefore, our study reveals a novel role of MYCN in controlling the global splicing program through regulation of splicing factors, in addition to its well-known role in the transcription program. These findings suggest therapeutic potential in targeting the key splicing factors or gene isoforms in high-risk NBL with MYCN amplification. Published by Elsevier Ireland Ltd.
Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.
Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T
2016-03-01
High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion, the intraclonal diversity index, which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
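The general idea behind UID-based error correction can be sketched as follows: reads that share a UID are assumed to derive from the same original transcript, so a per-position majority vote across them removes most PCR and sequencing errors. This is only a minimal sketch of that generic consensus step, not the MAF pipeline described above; the function name `uid_consensus`, the `(uid, sequence)` input format, and the rule of discarding reads whose length differs from the first read in a group are assumptions, and the bias-correction step is not modeled.

```python
from collections import Counter, defaultdict


def uid_consensus(reads):
    """Collapse reads into one consensus sequence per UID.

    `reads` is an iterable of (uid, sequence) pairs (a hypothetical input
    format). Reads whose length differs from the first read seen for a UID
    are skipped, which is a simplifying assumption.
    """
    groups = defaultdict(list)
    for uid, seq in reads:
        groups[uid].append(seq)

    consensus = {}
    for uid, seqs in groups.items():
        length = len(seqs[0])
        usable = [s for s in seqs if len(s) == length]
        # Majority vote at each position across reads sharing this UID.
        consensus[uid] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*usable)
        )
    return consensus
```

Counting the keys of the returned dictionary, rather than the distinct raw sequences, gives a diversity estimate that is not inflated by reads carrying isolated errors.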
Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting
Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.
2016-01-01
High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518
Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng; Li, Xue-Bao
2017-01-01
Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseed remains unclear. In this report, RNA sequencing (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA, which contain high and low oil contents in seeds, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of the low oil content line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of the high oil content line at 15 and 25 DAP, respectively. By comparative analysis, the differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolites, photosynthesis, pyruvate metabolism, fatty acid metabolism, glycerophospholipid metabolism, and DNA binding. Also, DEGs were related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stages, resulting in the differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. napus.
Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng
2017-01-01
Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseed remains unclear. In this report, RNA sequencing (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA, which contain high and low oil contents in seeds, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of the low oil content line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of the high oil content line at 15 and 25 DAP, respectively. By comparative analysis, the differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolites, photosynthesis, pyruvate metabolism, fatty acid metabolism, glycerophospholipid metabolism, and DNA binding. Also, DEGs were related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stages, resulting in the differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. napus. PMID:28594951
Pott, Sebastian
2017-01-01
Gaining insights into the regulatory mechanisms that underlie the transcriptional variation observed between individual cells necessitates the development of methods that measure chromatin organization in single cells. Here I adapted Nucleosome Occupancy and Methylome-sequencing (NOMe-seq) to measure chromatin accessibility and endogenous DNA methylation in single cells (scNOMe-seq). scNOMe-seq recovered characteristic accessibility and DNA methylation patterns at DNase hypersensitive sites (DHSs). An advantage of scNOMe-seq is that sequencing reads are sampled independently of the accessibility measurement. scNOMe-seq therefore controlled for fragment loss, which enabled direct estimation of the fraction of accessible DHSs within individual cells. In addition, scNOMe-seq provided high resolution of chromatin accessibility within individual loci which was exploited to detect footprints of CTCF binding events and to estimate the average nucleosome phasing distances in single cells. scNOMe-seq is therefore well-suited to characterize the chromatin organization of single cells in heterogeneous cellular mixtures. DOI: http://dx.doi.org/10.7554/eLife.23203.001 PMID:28653622
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold tremendous potential for gaining new insights into the functioning of living systems. The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals back-end databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background: A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold tremendous potential for gaining new insights into the functioning of living systems. Results: The Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc.), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion: The integrated access to primary genomics data, functional knowledge and analytical tools makes the Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals back-end databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
PRISM offers a comprehensive genomic approach to transcription factor function prediction
Wenger, Aaron M.; Clarke, Shoa L.; Guturu, Harendra; Chen, Jenny; Schaar, Bruce T.; McLean, Cory Y.; Bejerano, Gill
2013-01-01
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells. PMID:23382538
Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F
2017-05-25
Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.
High-throughput Screening Identification of Poliovirus RNA-dependent RNA Polymerase Inhibitors
Campagnola, Grace; Gong, Peng; Peersen, Olve B.
2011-01-01
Viral RNA-dependent RNA polymerase (RdRP) enzymes are essential for the replication of positive-strand RNA viruses and established targets for the development of selective antiviral therapeutics. In this work we have carried out a high-throughput screen of 154,267 compounds to identify poliovirus polymerase inhibitors using a fluorescence based RNA elongation assay. Screening and subsequent validation experiments using kinetic methods and RNA product analysis resulted in the identification of seven inhibitors that affect the RNA binding, initiation, or elongation activity of the polymerase. X-ray crystallography data show clear density for five of the compounds in the active site of the poliovirus polymerase elongation complex. The inhibitors occupy the NTP binding site by stacking on the priming nucleotide and interacting with the templating base, yet competition studies show fairly weak IC50 values in the low μM range. A comparison with nucleotide bound structures suggests that weak binding is likely due to the lack of a triphosphate group on the inhibitors. Consequently, the inhibitors are primarily effective at blocking polymerase initiation and do not effectively compete with NTP binding during processive elongation. These findings are discussed in the context of the polymerase elongation complex structure and allosteric control of the viral RdRP catalytic cycle. PMID:21722674
DNA binding by FOXP3 domain-swapped dimer suggests mechanisms of long-range chromosomal interactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Y.; Chen, C.; Zhang, Z.
2015-01-07
FOXP3 is a lineage-specific transcription factor that is required for regulatory T cell development and function. In this study, we determined the crystal structure of the FOXP3 forkhead domain bound to DNA. The structure reveals that FOXP3 can form a stable domain-swapped dimer to bridge DNA in the absence of cofactors, suggesting that FOXP3 may play a role in long-range gene interactions. To test this hypothesis, we used circular chromosome conformation capture coupled with high throughput sequencing (4C-seq) to analyze FOXP3-dependent genomic contacts around a known FOXP3-bound locus, Ptpn22. Our studies reveal that FOXP3 induces significant changes in the chromatin contacts between the Ptpn22 locus and other Foxp3-regulated genes, reflecting a mechanism by which FOXP3 reorganizes the genome architecture to coordinate the expression of its target genes. Our results suggest that FOXP3 mediates long-range chromatin interactions as part of its mechanisms to regulate specific gene expression in regulatory T cells.
Computational Identification of Diverse Mechanisms Underlying Transcription Factor-DNA Occupancy
Cheng, Qiong; Kazemian, Majid; Pham, Hannah; Blatti, Charles; Celniker, Susan E.; Wolfe, Scot A.; Brodsky, Michael H.; Sinha, Saurabh
2013-01-01
ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-seq expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA. PMID:23935523
A compact imaging spectroscopic system for biomolecular detections on plasmonic chips.
Lo, Shu-Cheng; Lin, En-Hung; Wei, Pei-Kuen; Tsai, Wan-Shao
2016-10-17
In this study, we demonstrate a compact imaging spectroscopic system for high-throughput detection of biomolecular interactions on plasmonic chips, based on a curved grating as the key element for light diffraction and light focusing. Both the curved grating and the plasmonic chips are fabricated on flexible plastic substrates using a gas-assisted thermal-embossing method. A fiber-coupled broadband light source and a camera are included in the system. Spectral resolution within 1 nm is achieved in sensing environmental index solutions and protein binding. The detected sensitivities of the plasmonic chip are comparable with those obtained with a commercial spectrometer. An extra one-dimensional scanning stage enables high-throughput detection of protein binding on a designed plasmonic chip consisting of several nanoslit arrays with different periods. The detected resonance wavelengths match well with the grating equation under an air environment. Wavelength shifts between 1 and 9 nm are detected for antigens of various concentrations binding with antibodies. A simple, mass-producible and cost-effective method has been demonstrated on the imaging spectroscopic system for real-time, label-free, highly sensitive and high-throughput screening of biomolecular interactions.
Sun, Duanchen; Liu, Yinliang; Zhang, Xiang-Sun; Wu, Ling-Yun
2017-09-21
High-throughput experimental techniques have been dramatically improved and widely applied in the past decades. However, biological interpretation of high-throughput experimental results, such as differentially expressed gene sets derived from microarray or RNA-seq experiments, is still a challenging task. Gene Ontology (GO) is commonly used in functional enrichment studies. The GO terms identified via current functional enrichment analysis tools often contain direct parent or descendant terms in the GO hierarchical structure. Highly redundant terms make it difficult for users to analyze the underlying biological processes. In this paper, a novel network-based probabilistic generative model, NetGen, was proposed to perform functional enrichment analysis. An additional protein-protein interaction (PPI) network was explicitly used to assist the identification of significantly enriched GO terms. NetGen outperformed the existing methods in the simulation studies. The effectiveness of NetGen was explored further on four real datasets. Notably, several GO terms that were not directly linked with the active gene list for each disease were identified. These terms were closely related to the corresponding diseases when checked against the curated literature. NetGen has been implemented in the R package CopTea, publicly available at GitHub (http://github.com/wulingyun/CopTea/). Our procedure leads to a more reasonable and interpretable result of the functional enrichment analysis. As a novel term combination-based functional enrichment analysis method, NetGen is complementary to current individual term-based methods, and can help to explore the underlying pathogenesis of complex diseases.
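For context, individual term-based enrichment of the kind NetGen is contrasted with is commonly scored with a one-sided hypergeometric test, evaluated separately for each GO term. The sketch below shows that baseline test only; it is not NetGen or the CopTea package, and the function name and argument names are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated with the GO term
    sample:     size of the active (e.g. differentially expressed) gene list
    overlap:    active genes annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Because each term is tested on its own, a parent term and its descendants can all come out significant for the same genes, which is the redundancy problem the abstract describes.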
PRADA: pipeline for RNA sequencing data analysis.
Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W
2014-08-01
Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/. Contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Hou, Luanfeng; Wu, Qingping; Gu, Qihui; Zhou, Qin; Zhang, Jumei
2018-07-01
Aniline has aroused general concern owing to its strong toxicity and widespread distribution in water and soil. In the present study, the bacterial community composition before and after aniline acclimation was investigated. High-throughput Illumina MiSeq sequencing analysis illustrated a large shift in the structure of the bacterial community during the aniline acclimation period. Bacillus, Lactococcus, and Enterococcus were the dominant bacteria in biologically activated carbon before acclimation. However, the proportions of Pseudomonas, Thermomonas, and Acinetobacter increased significantly and several new bacterial taxa appeared after aniline acclimation, indicating that aniline acclimation had a strong impact on the bacterial community structure of biological activated carbon samples. Strain AN-1 accounted for the highest number of colonies on incubation plates and was identified as Acinetobacter sp. according to phylogenetic analysis of the 16S ribosomal ribonucleic acid gene sequence. Strain AN-1 was able to grow on aniline at pH value 4.0-10.0 and showed high aniline-degrading ability at neutral pH.
Song, J; Doucette, C; Hanniford, D; Hunady, K; Wang, N; Sherf, B; Harrington, J J; Brunden, K R; Stricker-Krongrad, A
2005-06-01
Target-based high-throughput screening (HTS) plays an integral role in drug discovery. The implementation of HTS assays generally requires high expression levels of the target protein, and this is typically accomplished using recombinant cDNA methodologies. However, the isolated gene sequences of many drug targets have intellectual property claims that restrict the ability to implement drug discovery programs. The present study describes the pharmacological characterization of the human histamine H3 receptor that was expressed using random activation of gene expression (RAGE), a technology that over-expresses proteins by up-regulating endogenous genes rather than introducing cDNA expression vectors into the cell. Saturation binding analysis using [125I]iodoproxyfan and RAGE-H3 membranes revealed a single class of binding sites with a K(D) value of 0.77 nM and a B(max) equal to 756 fmol/mg of protein. Competition binding studies showed that the rank order of potency for H3 agonists was N(alpha)-methylhistamine approximately (R)-alpha-methylhistamine > histamine and that the rank order of potency for H3 antagonists was clobenpropit > iodophenpropit > thioperamide. The same rank order of potency for H3 agonists and antagonists was observed in the functional assays as in the binding assays. The Fluorometric Imaging Plate Reader assays in RAGE-H3 cells gave high Z' values for agonist and antagonist screening, respectively. These results reveal that the human H3 receptor expressed with the RAGE technology is pharmacologically comparable to that expressed through recombinant methods. Moreover, the level of expression of the H3 receptor in the RAGE-H3 cells is suitable for HTS and secondary assays.
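The single class of binding sites reported above can be illustrated with the standard one-site saturation model, bound = B(max) x [L] / (K(D) + [L]). The following is a minimal sketch, not the authors' analysis: the function name and the ligand concentrations are illustrative assumptions, and only the K(D) and B(max) values come from the abstract.

```python
def specific_binding(ligand_nm, kd_nm=0.77, bmax=756.0):
    """One-site saturation binding: bound = Bmax * [L] / (KD + [L]).

    ligand_nm : free radioligand concentration in nM (illustrative input)
    kd_nm     : dissociation constant in nM (0.77 nM in the abstract)
    bmax      : receptor density in fmol/mg protein (756 in the abstract)
    """
    if ligand_nm < 0:
        raise ValueError("ligand concentration must be non-negative")
    return bmax * ligand_nm / (kd_nm + ligand_nm)


if __name__ == "__main__":
    # Hypothetical concentrations chosen only to show the curve shape.
    for conc in (0.1, 0.77, 5.0, 50.0):
        print(f"{conc:6.2f} nM -> {specific_binding(conc):7.1f} fmol/mg")
```

By construction, half of the sites are occupied when the ligand concentration equals K(D), which is why the K(D) value is read off as the half-saturation point of such a curve.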
Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang
2015-01-01
Objective: Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonic hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; to this end, use of an advanced array-based technique was proposed. Method: A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine feasibility, the outcome of the new resequencing approach was validated by the conventional capillary sequencing method. Result: The overall call rate using the resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. Pathogenic mutations in 11 of the 13 OI patients were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion: A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658
Fully Automated Sample Preparation for Ultrafast N-Glycosylation Analysis of Antibody Therapeutics.
Szigeti, Marton; Lew, Clarence; Roby, Keith; Guttman, Andras
2016-04-01
There is a growing demand in the biopharmaceutical industry for high-throughput, large-scale N-glycosylation profiling of therapeutic antibodies in all phases of product development, but especially during clone selection when hundreds of samples should be analyzed in a short period of time to assure their glycosylation-based biological activity. Our group has recently developed a magnetic bead-based protocol for N-glycosylation analysis of glycoproteins to alleviate the hard-to-automate centrifugation and vacuum-centrifugation steps of the currently used protocols. Glycan release, fluorophore labeling, and cleanup were all optimized, resulting in a <4 h magnetic bead-based process with excellent yield and good repeatability. This article demonstrates the next level of this work by automating all steps of the optimized magnetic bead-based protocol from endoglycosidase digestion, through fluorophore labeling and cleanup with high-throughput sample processing in 96-well plate format, using an automated laboratory workstation. Capillary electrophoresis analysis of the fluorophore-labeled glycans was also optimized for rapid (<3 min) separation to accommodate the high-throughput processing of the automated sample preparation workflow. Ultrafast N-glycosylation analyses of several commercially relevant antibody therapeutics are also shown and compared to their biosimilar counterparts, addressing the biological significance of the differences. © 2015 Society for Laboratory Automation and Screening.
MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.
Lee, Sangseon; Park, Youngjune; Kim, Sun
2017-07-15
Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigate biological mechanisms. Since a pathway consists of multiple functions, the recent approach is to determine condition-specific sub-pathways or subpaths. However, there are several challenges. First, few existing methods utilize explicit gene expression information from RNA-seq. More importantly, subpath activity is usually an average of statistical scores, e.g., correlations, of edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS fully utilizes gene expression quantity information and network centrality information to determine condition-specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles with five molecular subtypes. 36 differentially activated subpaths were determined. The utility of our method, MIDAS, was demonstrated in four ways. All 36 subpaths are well supported by the literature. Subsequently, we showed that these subpaths had good discriminant power for classification of the five cancer subtypes and also had prognostic power in terms of survival analysis. Finally, in a performance comparison of MIDAS to a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Peng, Chen; Frommlet, Alexandra; Perez, Manuel; Cobas, Carlos; Blechschmidt, Anke; Dominguez, Santiago; Lingel, Andreas
2016-04-14
NMR binding assays are routinely applied in hit finding and validation during early stages of drug discovery, particularly for fragment-based lead generation. To this end, compound libraries are screened by ligand-observed NMR experiments such as STD, T1ρ, and CPMG to identify molecules interacting with a target. The analysis of a high number of complex spectra is performed largely manually and therefore represents a limiting step in hit generation campaigns. Here we report a novel integrated computational procedure that processes and analyzes ligand-observed proton and fluorine NMR binding data in a fully automated fashion. A performance evaluation comparing automated and manual analysis results on (19)F- and (1)H-detected data sets shows that the program delivers robust, high-confidence hit lists in a fraction of the time needed for manual analysis and greatly facilitates visual inspection of the associated NMR spectra. These features enable considerably higher throughput, the assessment of larger libraries, and shorter turn-around times.
GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data
Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien
2013-01-01
We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070
Automated glycopeptide analysis—review of current state and future directions
Dallas, David C.; Martin, William F.; Hua, Serenus
2013-01-01
Glycosylation of proteins is involved in immune defense, cell–cell adhesion, cellular recognition and pathogen binding and is one of the most common and complex post-translational modifications. Science is still struggling to assign detailed mechanisms and functions to this form of conjugation. Even the structural analysis of glycoproteins—glycoproteomics—remains in its infancy due to the scarcity of high-throughput analytical platforms capable of determining glycopeptide composition and structure, especially platforms for complex biological mixtures. Glycopeptide composition and structure can be determined with high mass-accuracy mass spectrometry, particularly when combined with chromatographic separation, but the sheer volume of generated data necessitates computational software for interpretation. This review discusses the current state of glycopeptide assignment software—advances made to date and issues that remain to be addressed. The various software and algorithms developed so far provide important insights into glycoproteomics. However, there is currently no freely available software that can analyze spectral data in batch and unambiguously determine glycopeptide compositions for N- and O-linked glycopeptides from relevant biological sources such as human milk and serum. Few programs are capable of aiding in structural determination of the glycan component. To significantly advance the field of glycoproteomics, analytical software and algorithms are required that: (i) solve for both N- and O-linked glycopeptide compositions, structures and glycosites in biological mixtures; (ii) are high-throughput and process data in batches; (iii) can interpret mass spectral data from a variety of sources and (iv) are open source and freely available. PMID:22843980
Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.
Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong
2018-05-04
Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.
Microarray Detection of Duplex and Triplex DNA Binders with DNA-Modified Gold Nanoparticles
Lytton-Jean, Abigail K. R.; Han, Min Su; Mirkin, Chad A.
2008-01-01
We have designed a chip-based assay, using microarray technology, for determining the relative binding affinities of duplex and triplex DNA binders. This assay combines the high discrimination capabilities afforded by DNA-modified Au nanoparticles with the high-throughput capabilities of DNA microarrays. The detection and screening of duplex DNA binders are important because these molecules, in many cases, are potential anticancer agents as well as toxins. Triplex DNA binders are also promising drug candidates. These molecules, in conjunction with triplex forming oligonucleotides, could potentially be used to achieve control of gene expression by interfering with transcription factors that bind to DNA. Therefore, the ability to screen for these molecules in a high-throughput fashion could dramatically improve the drug screening process. The assay reported here provides excellent discrimination between strong, intermediate, and weak duplex and triplex DNA binders in a high-throughput fashion. PMID:17614366
Meng Zhang; Peh, Jessie; Hergenrother, Paul J; Cunningham, Brian T
2014-01-01
High throughput screening of protein-small molecule binding interactions using label-free optical biosensors is challenging, as the detected signals are often similar in magnitude to experimental noise. Here, we describe a novel self-referencing external cavity laser (ECL) biosensor approach that achieves high resolution and high sensitivity, while eliminating thermal noise with sub-picometer wavelength accuracy. Using the self-referencing ECL biosensor, we demonstrate detection of binding between small molecules and a variety of immobilized protein targets with binding affinities or inhibition constants in the sub-nanomolar to low micromolar range. The demonstrated ability to perform detection in the presence of several interfering compounds opens the potential for increasing the throughput of the approach. As an example application, we performed a "needle-in-the-haystack" screen for inhibitors against carbonic anhydrase isozyme II (CA II), in which known inhibitors are clearly differentiated from inactive molecules within a compound library.
Guarnieri, Michael T.; Blagg, Brian S. J.
2011-01-01
Bacterial histidine kinases (HK) are members of the GHKL superfamily, which share a unique adenosine triphosphate (ATP)-binding Bergerat fold. Our previous studies have shown that Gyrase, Hsp90, MutL (GHL) inhibitors bind to the ATP-binding pocket of HK and may provide lead compounds for the design of novel antibiotics targeting these kinases. In this article, we developed a competition assay using the fluorescent ATP analog, 2′,3′-O-(2,4,6-trinitrophenyl) adenosine 5′-triphosphate. The method can be used for high-throughput screening of compound libraries targeting HKs or other ATP-binding proteins. We utilized the assay to screen a library of GHL inhibitors targeting the bacterial HK PhoQ, and discuss the applications of the 2′,3′-O-(2,4,6-trinitrophenyl) adenosine 5′-triphosphate competition assay beyond GHKL inhibitor screening. PMID:21050069
Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants
Llauro, Christel; Jobet, Edouard; Robakowska-Hyzorek, Dagmara; Lasserre, Eric; Ghesquière, Alain; Panaud, Olivier
2017-01-01
Retrotransposons are mobile genetic elements abundant in plant and animal genomes. While efficiently silenced by the epigenetic machinery, they can be reactivated upon stress or during development. Their level of transcription not reflecting their transposition ability, it is thus difficult to evaluate their contribution to the active mobilome. Here we applied a simple methodology based on the high throughput sequencing of extrachromosomal circular DNA (eccDNA) forms of active retrotransposons to characterize the repertoire of mobile retrotransposons in plants. This method successfully identified known active retrotransposons in both Arabidopsis and rice material where the epigenome is destabilized. When applying mobilome-seq to developmental stages in wild type rice, we identified PopRice as a highly active retrotransposon producing eccDNA forms in the wild type endosperm. The mobilome-seq strategy opens new routes for the characterization of a yet unexplored fraction of plant genomes. PMID:28212378
Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants.
Lanciano, Sophie; Carpentier, Marie-Christine; Llauro, Christel; Jobet, Edouard; Robakowska-Hyzorek, Dagmara; Lasserre, Eric; Ghesquière, Alain; Panaud, Olivier; Mirouze, Marie
2017-02-01
Retrotransposons are mobile genetic elements abundant in plant and animal genomes. While efficiently silenced by the epigenetic machinery, they can be reactivated upon stress or during development. Their level of transcription not reflecting their transposition ability, it is thus difficult to evaluate their contribution to the active mobilome. Here we applied a simple methodology based on the high throughput sequencing of extrachromosomal circular DNA (eccDNA) forms of active retrotransposons to characterize the repertoire of mobile retrotransposons in plants. This method successfully identified known active retrotransposons in both Arabidopsis and rice material where the epigenome is destabilized. When applying mobilome-seq to developmental stages in wild type rice, we identified PopRice as a highly active retrotransposon producing eccDNA forms in the wild type endosperm. The mobilome-seq strategy opens new routes for the characterization of a yet unexplored fraction of plant genomes.
Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.
Yip, Shun H; Sham, Pak Chung; Wang, Junwen
2018-02-21
Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
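As a rough illustration of what "highly variable" means in this setting, genes can be ranked by their squared coefficient of variation (variance divided by squared mean) across cells. This is a deliberately simple sketch and is not one of the seven methods compared in the study; real tools additionally model the mean-variance trend and technical noise. The function name and the toy expression values are assumptions made for illustration.

```python
from statistics import mean, variance


def rank_by_cv2(expression):
    """Rank genes by squared coefficient of variation across cells.

    expression : dict mapping gene name -> list of per-cell expression values
                 (at least two cells per gene).
    Returns gene names sorted from most to least variable.
    Genes with a non-positive mean are skipped because CV^2 is undefined.
    """
    scores = {}
    for gene, values in expression.items():
        mu = mean(values)
        if mu <= 0:
            continue
        scores[gene] = variance(values) / mu ** 2  # sample variance / mean^2
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: three genes with the same mean but different spread.
    toy = {
        "geneA": [10, 10, 10, 10],
        "geneB": [0, 20, 0, 20],
        "geneC": [8, 12, 9, 11],
    }
    print(rank_by_cv2(toy))
```

All three toy genes share the same mean, so a mean-based differential expression test would not separate them, whereas the variability ranking does.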
Shirotani, Keiro; Futakawa, Satoshi; Nara, Kiyomitsu; Hoshi, Kyoka; Saito, Toshie; Tohyama, Yuriko; Kitazume, Shinobu; Yuasa, Tatsuhiko; Miyajima, Masakazu; Arai, Hajime; Kuno, Atsushi; Narimatsu, Hisashi; Hashimoto, Yasuhiro
2011-01-01
We have established high-throughput lectin-antibody ELISAs to measure different glycans on transferrin (Tf) in cerebrospinal fluid (CSF) using lectins and an anti-transferrin antibody (TfAb). Lectin blot and precipitation analysis of CSF revealed that PVL (Psathyrella velutina lectin) bound unique N-acetylglucosamine-terminated N-glycans on “CSF-type” Tf whereas SSA (Sambucus sieboldiana agglutinin) bound α2,6-N-acetylneuraminic acid-terminated N-glycans on “serum-type” Tf. PVL-TfAb ELISA of 0.5 μL CSF samples detected “CSF-type” Tf but not “serum-type” Tf whereas SSA-TfAb ELISA detected “serum-type” Tf but not “CSF-type” Tf, demonstrating the specificity of the lectin-TfAb ELISAs. In idiopathic normal pressure hydrocephalus (iNPH), a senile dementia associated with ventriculomegaly, amounts of the SSA-reactive Tf were significantly higher than in non-iNPH patients, indicating that Tf glycan analysis by the high-throughput lectin-TfAb ELISAs could become a practical diagnostic tool for iNPH. The lectin-antibody ELISAs of CSF proteins might be useful for diagnosis of other neurological diseases. PMID:21876827
Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi
2016-03-01
Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.
High Throughput Determination of Critical Human Dosing ...
High throughput toxicokinetics (HTTK) is a rapid approach that uses in vitro data to estimate TK for hundreds of environmental chemicals. Reverse dosimetry (i.e., reverse toxicokinetics or RTK) based on HTTK data converts high throughput in vitro toxicity screening (HTS) data into predicted human equivalent doses that can be linked with biologically relevant exposure scenarios. Thus, HTTK provides essential data for risk prioritization for thousands of chemicals that lack TK data. One critical HTTK parameter that can be measured in vitro is the unbound fraction of a chemical in plasma (Fub). However, for chemicals that bind strongly to plasma, Fub is below the limits of detection (LOD) for high throughput analytical chemistry, and therefore cannot be quantified. A novel method for quantifying Fub was implemented for 85 strategically selected chemicals: measurement of Fub was attempted at 10%, 30%, and 100% of physiological plasma concentrations using rapid equilibrium dialysis assays. Varying plasma concentrations instead of chemical concentrations makes high throughput analytical methodology more likely to be successful. Assays at 100% plasma concentration were unsuccessful for 34 chemicals. For 12 of these 34 chemicals, Fub could be quantified at 10% and/or 30% plasma concentrations; these results imply that the assay failure at 100% plasma concentration was caused by plasma protein binding for these chemicals. Assay failure for the remaining 22 chemicals may
Cyber-T web server: differential analysis of high-throughput data.
Kayala, Matthew A; Baldi, Pierre
2012-07-01
The Bayesian regularization method for high-throughput differential analysis, described in Baldi and Long (A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001: 17: 509-519) and implemented in the Cyber-T web server, is one of the most widely validated. Cyber-T implements a t-test using a Bayesian framework to compute a regularized variance of the measurements associated with each probe under each condition. This regularized estimate is derived by flexibly combining the empirical measurements with a prior, or background, derived from pooling measurements associated with probes in the same neighborhood. This approach flexibly addresses problems associated with low replication levels and technology biases, not only for DNA microarrays, but also for other technologies, such as protein arrays, quantitative mass spectrometry and next-generation sequencing (RNA-seq). Here we present an update to the Cyber-T web server, incorporating several useful new additions and improvements. Several preprocessing data normalization options including logarithmic and Variance Stabilizing Normalization (VSN) transforms are included. To augment two-sample t-tests, a one-way analysis of variance is implemented. Several methods for multiple tests correction, including standard frequentist methods and a probabilistic mixture model treatment, are available. Diagnostic plots allow visual assessment of the results. The web server provides comprehensive documentation and example data sets. The Cyber-T web server, with R source code and data sets, is publicly available at http://cybert.ics.uci.edu/.
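The idea of combining an empirical variance with a background variance can be sketched as a weighted average in which a pseudo-count controls how strongly the prior is trusted. The code below is an illustrative sketch, not the Cyber-T source: the weighting follows the general form of a posterior-mean variance under a conjugate prior and should be checked against Baldi and Long (2001) before use, and the function names, the `nu0` default and the way the background variance is supplied are all assumptions.

```python
import math
from statistics import mean, variance


def regularized_variance(values, background_var, nu0=10):
    """Shrink the sample variance of one probe toward a background variance.

    values         : replicate measurements for one probe in one condition
    background_var : variance pooled from neighboring probes (assumed given)
    nu0            : pseudo-count weight of the background (hypothetical default)
    """
    n = len(values)
    s2 = variance(values)  # unbiased sample variance, needs n >= 2
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 2)


def regularized_t(group_a, group_b, bg_var_a, bg_var_b, nu0=10):
    """Two-sample t statistic computed with regularized variances."""
    va = regularized_variance(group_a, bg_var_a, nu0)
    vb = regularized_variance(group_b, bg_var_b, nu0)
    se = math.sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


if __name__ == "__main__":
    # Hypothetical triplicate measurements under two conditions.
    print(regularized_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 2.0, 2.0))
```

With few replicates the background term dominates, which stabilizes the statistic for probes whose sample variance happens to be very small by chance; as the number of replicates grows, the empirical variance takes over.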
The focus of this meeting is the SAP's review and comment on the Agency's proposed high-throughput computational model of androgen receptor pathway activity as an alternative to the current Tier 1 androgen receptor assay (OCSPP 890.1150: Androgen Receptor Binding Rat Prostate Cyt...
Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.
2013-01-01
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967
A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.
Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao
2015-06-15
ChIP-seq is a powerful technology for measuring protein binding or histone modification strength on a whole-genome scale. Although a number of methods are available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that account for data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through a hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results than existing ones. An R software package, ChIPComp, is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
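The first step described in the abstract above, pooling the peaks from all datasets into one set of candidate regions, can be sketched as an interval union. This is a minimal illustration and not ChIPComp's implementation: the function name `union_peaks`, the `(chrom, start, end)` tuple layout with half-open coordinates, and the toy peaks are all assumptions made for the example.

```python
from collections import defaultdict


def union_peaks(datasets):
    """Merge peaks from several datasets into non-overlapping candidate regions.

    datasets: iterable of peak lists; each peak is (chrom, start, end), half-open.
    Returns a list of merged (chrom, start, end) tuples sorted by chrom, start.
    """
    by_chrom = defaultdict(list)
    for peaks in datasets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Hypothetical peak calls from two experiments.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250), ("chr2", 10, 50)]
candidates = union_peaks([peaks_a, peaks_b])
print(candidates)
```

Read counts would then be tabulated per candidate region for every experiment before any Poisson modeling; that modeling step is not shown here.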
Jaeger, Sébastien; Thieffry, Denis
2017-01-01
Abstract Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines. PMID:28591841
Shibata, Mami; Mekuchi, Miyuki; Mori, Kazuki; Muta, Shigeru; Chowdhury, Vishwajit Sur; Nakamura, Yoji; Ojima, Nobuhiko; Saitoh, Kenji; Kobayashi, Takanori; Wada, Tokio; Inouye, Kiyoshi; Kuhara, Satoru; Tashiro, Kosuke
2016-06-01
Bluefin tuna are high-performance swimmers and top predators in the open ocean. Their swimming is grounded by unique features including an exceptional glycolytic potential in white muscle, which is supported by high enzymatic activities. Here we performed high-throughput RNA sequencing (RNA-Seq) in muscles of the Pacific bluefin tuna (Thunnus orientalis) and Pacific cod (Gadus macrocephalus) and conducted a comparative transcriptomic analysis of genes related to energy production. We found that the total expression of glycolytic genes was much higher in the white muscle of tuna than in the other muscles, and that the expression of only six genes for glycolytic enzymes accounted for 83.4% of the total. These expression patterns were in good agreement with the patterns of enzyme activity previously reported. The findings suggest that the mRNA expression of glycolytic genes may contribute directly to the enzymatic activities in the muscles of tuna.
Feld, Christine; Sahu, Peeyush; Frech, Miriam; Finkernagel, Florian; Nist, Andrea; Stiewe, Thorsten; Bauer, Uta-Maria; Neubauer, Andreas
2018-01-01
Abstract SKI is a transcriptional co-regulator and is overexpressed in various human tumors, for example in acute myeloid leukemia (AML). SKI contributes to the origin and maintenance of the leukemic phenotype. Here, we use ChIP-seq and RNA-seq analysis to identify the epigenetic alterations induced by SKI overexpression in AML cells. We show that approximately two thirds of differentially expressed genes are up-regulated upon SKI deletion, of which >40% harbor SKI binding sites in their proximity, primarily in enhancer regions. Gene ontology analysis reveals that many of the differentially expressed genes are annotated to hematopoietic cell differentiation and inflammatory response, corroborating our finding that SKI contributes to a myeloid differentiation block in HL60 cells. We find that SKI peaks are enriched for RUNX1 consensus motifs, particularly in SKI targets that are up-regulated upon SKI deletion. RUNX1 ChIP-seq shows that nearly 70% of RUNX1 binding sites overlap with SKI peaks, mainly at enhancer regions. SKI and RUNX1 occupy the same genomic sites and cooperate in gene silencing. Our work demonstrates for the first time the predominant co-repressive function of SKI in AML cells on a genome-wide scale and uncovers the transcription factor RUNX1 as an important mediator of SKI-dependent transcriptional repression. PMID:29471413
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
Chen, Rong; Zhou, Jingjing; Qin, Lingyun; Chen, Yao; Huang, Yongqi; Liu, Huili; Su, Zhengding
2017-06-27
In nearly half of cancers, the anticancer activity of p53 protein is impaired by the overexpressed oncoprotein Mdm2 and its homologue, MdmX, demanding efficient therapeutics to disrupt the aberrant p53-MdmX/Mdm2 interactions to restore the p53 activity. While many potent Mdm2-specific inhibitors have already undergone clinical investigations, searching for MdmX-specific inhibitors has become very attractive, requiring a more efficient screening strategy for evaluating potential scaffolds or leads. In this work, considering that the intrinsically fluorescent residue Trp23 in the p53 transactivation domain (p53p) plays an important role in determining the p53-MdmX/Mdm2 interactions, we constructed a fusion protein to utilize this intrinsic fluorescence signal to monitor high-throughput screening of a compound library. The fusion protein was composed of the p53p followed by the N-terminal domain of MdmX (N-MdmX) through a flexible amino acid linker, while the whole fusion protein contained a sole intrinsic fluorescence probe. The fusion protein was then evaluated using fluorescence spectroscopy against model compounds. Our results revealed that the variation of the fluorescence signal was highly correlated with the concentration of the ligand within 65 μM. The fusion protein was further evaluated with respect to its feasibility for use in high-throughput screening using a model compound library, including controls. We found that the imidazo-indole scaffold was a bona fide scaffold for template-based design of MdmX inhibitors. Thus, the p53p-N-MdmX fusion protein we designed provides a convenient and efficient tool for high-throughput screening of new MdmX inhibitors. The strategy described in this work should be applicable for other protein targets to accelerate drug discovery.
Cornman, Robert S.
2017-01-01
Deformed wing virus (DWV) is a major pathogen of concern to apiculture, and recent reports have indicated the local predominance and potential virulence of recombinants between DWV and a related virus, Varroa destructor virus 1 (VDV). However, little is known about the frequency and titer of VDV and recombinants relative to DWV generally. In this study, I assessed the relative occurrence and titer of DWV and VDV in public RNA-seq accessions of honey bee using a rapid, kmer-based approach. Three recombinant types were detectable graphically and corroborated by de novo assembly. Recombination breakpoints did not disrupt the capsid-encoding region, consistent with previous reports, and both VDV- and DWV-derived capsids were observed in recombinant backgrounds. High abundance of VDV kmers was largely restricted to recombinant forms. Non-metric multidimensional scaling identified genotypic clusters among DWV isolates, which was corroborated by read mapping and consensus generation. The recently described DWV-C lineage was not detected in the searched accessions. The data further highlight the utility of high-throughput sequencing to monitor viral polymorphisms and statistically test biological predictors of titer, and point to the need for consistent methodologies and sampling schemes.
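The k-mer-based screening idea in the abstract above can be illustrated with a small sketch: collect k-mers found in only one of two reference genomes, then count how often reads hit each diagnostic set. This is not the author's pipeline. The function names, the toy sequences standing in for two related viral references, and the choice of k = 4 are assumptions, and reverse-complement matches are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(ref_a, ref_b, k):
    """Return k-mers unique to ref_a and k-mers unique to ref_b."""
    a, b = kmers(ref_a, k), kmers(ref_b, k)
    return a - b, b - a


def count_hits(reads, kmer_set, k):
    """Count k-mer occurrences across reads that belong to kmer_set."""
    hits = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            if read[i:i + k] in kmer_set:
                hits += 1
    return hits


# Toy stand-ins for two related references (hypothetical sequences).
ref_a = "ACGTACGTTT"
ref_b = "ACGTACGAAA"
k = 4
a_only, b_only = diagnostic_kmers(ref_a, ref_b, k)

reads = ["TACGTTT", "ACGAAA", "GTACG"]
a_hits = count_hits(reads, a_only, k)
b_hits = count_hits(reads, b_only, k)
print(a_hits, b_hits)
```

Comparing the two hit counts per accession gives a rough relative abundance; a real analysis would use much longer k-mers and handle both strands.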
Kayal, Ehsan; Bentlage, Bastian; Cartwright, Paulyn; Yanagihara, Angel A; Lindsay, Dhugal J; Hopcroft, Russell R; Collins, Allen G
2015-01-01
Hydrozoans display the most morphological diversity within the phylum Cnidaria. While recent molecular studies have provided some insights into their evolutionary history, sister group relationships remain mostly unresolved, particularly at mid-taxonomic levels. Specifically, within Hydroidolina, the most speciose hydrozoan subclass, the relationships and sometimes integrity of orders are highly unsettled. Here we obtained the near complete mitochondrial sequence of twenty-six hydroidolinan hydrozoan species from a range of sources (DNA and RNA-seq data, long-range PCR). Our analyses confirm previous inference of the evolution of mtDNA in Hydrozoa while introducing a novel genome organization. Using RNA-seq data, we propose a mechanism for the expression of mitochondrial mRNA in Hydroidolina that can be extrapolated to the other medusozoan taxa. Phylogenetic analyses using the full set of mitochondrial gene sequences provide some insights into the order-level relationships within Hydroidolina, including siphonophores as the first diverging clade, a well-supported clade comprised of Leptothecata-Filifera III–IV, and a second clade comprised of Aplanulata-Capitata s.s.-Filifera I–II. Finally, we describe our relatively inexpensive and accessible multiplexing strategy to sequence long-range PCR amplicons that can be adapted to most high-throughput sequencing platforms. PMID:26618080
Lun, Aaron T.L.; Smyth, Gordon K.
2016-01-01
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding. This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo differential binding (DB) analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project. PMID:26578583
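The window-based strategy described above (count reads in sliding windows, select windows, cluster them into regions) can be sketched in a few lines. This is an illustration only and does not use or reproduce the csaw API: the function names, the toy read positions, and the fixed count threshold that stands in for the per-window statistical test are all assumptions.

```python
def count_windows(read_starts, genome_len, width, spacing):
    """Count reads whose start falls in each sliding window [s, s + width)."""
    counts = []
    for s in range(0, genome_len - width + 1, spacing):
        n = sum(1 for p in read_starts if s <= p < s + width)
        counts.append((s, s + width, n))
    return counts


def cluster_windows(windows, max_gap):
    """Merge (start, end) windows separated by at most max_gap into regions."""
    regions = []
    for start, end in sorted(windows):
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


# Toy read start positions on a 100 bp "genome" (hypothetical data).
reads = [5, 12, 14, 18, 52, 55, 90]
counts = count_windows(reads, genome_len=100, width=20, spacing=10)

# Placeholder selection rule: a real analysis would test each window
# statistically across replicates instead of applying a raw threshold.
selected = [(s, e) for s, e, n in counts if n >= 2]
regions = cluster_windows(selected, max_gap=0)
print(regions)
```

Clustering after selection is what lets overlapping windows be reported as one region, which is also the level at which the abstract says the false discovery rate is controlled.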
Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A
2017-11-01
Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.
BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.
Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun
2015-01-15
It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called BioVLAB-MMIA-NGS, deployed both on the Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationships between miRNAs and target genes with high accuracy. BioVLAB-MMIA-NGS is available at http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M
2012-04-05
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates.
Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with a high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is applied to two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored using either proteomics data alone or in combination with transcriptomic data.
Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257
Hurley, Jennifer M.; Dasgupta, Arko; Emerson, Jillian M.; Zhou, Xiaoying; Ringelberg, Carol S.; Knabe, Nicole; Lipzen, Anna M.; Lindquist, Erika A.; Daum, Christopher G.; Barry, Kerrie W.; Grigoriev, Igor V.; Smith, Kristina M.; Galagan, James E.; Bell-Pedersen, Deborah; Freitag, Michael; Cheng, Chao; Loros, Jennifer J.; Dunlap, Jay C.
2014-01-01
Neurospora crassa has for decades been a principal model for filamentous fungal genetics and physiology as well as for understanding the mechanism of circadian clocks. Eukaryotic fungal and animal clocks comprise transcription-translation–based feedback loops that control rhythmic transcription of a substantial fraction of these transcriptomes, yielding the changes in protein abundance that mediate circadian regulation of physiology and metabolism. Understanding circadian control of gene expression is therefore key to understanding eukaryotic, including fungal, physiology. Indeed, the isolation of clock-controlled genes (ccgs) was pioneered in Neurospora, where circadian output begins with binding of the core circadian transcription factor WCC to a subset of ccg promoters, including those of many transcription factors. High temporal resolution (2-h) sampling over 48 h using RNA sequencing (RNA-Seq) identified circadianly expressed genes in Neurospora, revealing that from ∼10% to as much as 40% of the transcriptome can be expressed under circadian control. Functional classifications of these genes revealed strong enrichment in pathways involving metabolism, protein synthesis, and stress responses; in broad terms, daytime metabolic potential favors catabolism, energy production, and precursor assembly, whereas night activities favor biosynthesis of cellular components and growth. Discriminative regular expression motif elicitation (DREME) identified key promoter motifs highly correlated with the temporal regulation of ccgs. Correlations between ccg abundance from RNA-Seq, the degree of ccg-promoter activation as reported by ccg-promoter–luciferase fusions, and binding of WCC as measured by ChIP-Seq, are not strong. Therefore, although circadian activation is critical to ccg rhythmicity, posttranscriptional regulation plays a major role in determining rhythmicity at the mRNA level. PMID:25362047
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
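The distinction the abstract draws between codon-sized indels and frameshifts comes down to whether a gap length is a multiple of three. The sketch below shows only that check on a gapped pairwise alignment; it is not RAMICS, which the abstract says relies on profile hidden Markov models. The function name, the alignment strings, and the simplification of ignoring whether the gap is in phase with codon boundaries are assumptions.

```python
import re


def classify_indels(ref_aln, read_aln):
    """List gap runs in a pairwise alignment and label them by length.

    A gap run in the reference row is an insertion in the read; a gap run in
    the read row is a deletion. Runs whose length is a multiple of 3 are
    labeled "codon-sized", all others "frameshift".
    Returns (kind, alignment_column, length, label) tuples sorted by column.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    for kind, row in (("insertion", ref_aln), ("deletion", read_aln)):
        for m in re.finditer(r"-+", row):
            length = len(m.group())
            label = "codon-sized" if length % 3 == 0 else "frameshift"
            calls.append((kind, m.start(), length, label))
    return sorted(calls, key=lambda c: c[1])


# Hypothetical alignment: a 3-base insertion and a 1-base deletion in the read.
ref_aln = "ATGGCC---AAATTTGGG"
read_aln = "ATGGCCGGGAAAT-TGGG"
calls = classify_indels(ref_aln, read_aln)
for call in calls:
    print(call)
```

A single-base gap in a homopolymer run, as in the deletion here, is the kind of event the abstract attributes to sequencing error bias rather than genuine variation.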
Johnson, Matthew E; Deliard, Sandra; Zhu, Fengchang; Xia, Qianghua; Wells, Andrew D; Hankenson, Kurt D; Grant, Struan F A
2014-04-01
Genome-wide association studies (GWAS) have demonstrated that genetic variation at the MADS box transcription enhancer factor 2, polypeptide C (MEF2C) locus is robustly associated with bone mineral density, primarily at the femoral neck. MEF2C is a transcription factor known to operate via the Wnt signaling pathway. Our hypothesis was that MEF2C regulates the expression of a set of molecular pathways critical to skeletal function. Drawing on our laboratory and bioinformatic experience with ChIP-seq, we analyzed ChIP-seq data for MEF2C available via the ENCODE project to gain insight into its global genomic binding pattern. We aligned the ChIP-seq data generated for GM12878 (an established lymphoblastoid cell line) and, using the analysis package HOMER, observed a total of 17,611 binding sites corresponding to 8,118 known genes. We then performed a pathway analysis of the gene list using Ingenuity. At 5 kb, the gene list yielded 'EIF2 Signaling' as the most significant annotation, with a P value of 5.01 × 10⁻²⁶. Moving further out, this category remained the top pathway at 50 and 100 kb, then dropped to second place at 500 kb and beyond, behind 'Molecular Mechanisms of Cancer'. In addition, at 50 kb and beyond 'RANK Signaling in Osteoclasts' was a consistent feature and resonates with the main general finding from GWAS of bone density. We also observed that MEF2C binding sites were significantly enriched primarily near inflammation-associated genes identified from GWAS; indeed, a similar enrichment for inflammation genes has been reported previously using a similar approach for the vitamin D receptor, an established key regulator of bone turnover. Our analyses point to known connective tissue and skeletal processes but also provide novel insights into networks involved in skeletal regulation. The fact that a specific GWAS category is enriched points to a possible role of inflammation through which it impacts bone mineral density.
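The distance cutoffs used above (5, 50, 100 and 500 kb) imply building one gene list per window around the binding sites before pathway analysis. A minimal sketch of that step follows. It is not the HOMER or Ingenuity workflow: the function name, the gene and site coordinates, and the choice to measure distance from a binding site to a single gene start position are assumptions for illustration.

```python
def genes_near_sites(sites, genes, max_dist):
    """Return sorted names of genes whose start lies within max_dist bp of a site.

    sites: list of (chrom, position) binding sites.
    genes: list of (name, chrom, start_position).
    """
    hits = set()
    for chrom, pos in sites:
        for name, g_chrom, g_start in genes:
            if g_chrom == chrom and abs(g_start - pos) <= max_dist:
                hits.add(name)
    return sorted(hits)


# Hypothetical binding sites and gene start coordinates.
sites = [("chr1", 10_000), ("chr2", 500_000)]
genes = [
    ("GENE_A", "chr1", 12_000),
    ("GENE_B", "chr1", 45_000),
    ("GENE_C", "chr2", 900_000),
    ("GENE_D", "chr3", 10_000),
]

gene_lists = {
    kb: genes_near_sites(sites, genes, kb * 1000) for kb in (5, 50, 100, 500)
}
for kb, names in gene_lists.items():
    print(f"{kb} kb: {names}")
```

Because each wider window contains the narrower ones, the gene lists are nested, which is one reason the top-ranked pathway can shift as the cutoff grows.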
Population Studies of Intact Vitamin D Binding Protein by Affinity Capture ESI-TOF-MS
Borges, Chad R.; Jarvis, Jason W.; Oran, Paul E.; Rogers, Stephen P.; Nelson, Randall W.
2008-01-01
Blood plasma proteins with molecular weights greater than approximately 30 kDa are refractory to comprehensive, high-throughput qualitative characterization of microheterogeneity across human populations. Analytical techniques for obtaining high mass resolution for targeted, intact protein characterization and, separately, high sample throughput exist, but efficient means of coupling these assay characteristics remain rather limited. This article discusses the impetus for analyzing intact proteins in a targeted manner across populations and describes the methodology required to couple mass spectrometric immunoassay with electrospray ionization mass spectrometry for the purpose of qualitatively characterizing a prototypical large plasma protein, vitamin D binding protein, across populations. PMID:19137103
MetaUniDec: High-Throughput Deconvolution of Native Mass Spectra
NASA Astrophysics Data System (ADS)
Reid, Deseree J.; Diesing, Jessica M.; Miller, Matthew A.; Perry, Scott M.; Wales, Jessica A.; Montfort, William R.; Marty, Michael T.
2018-04-01
The expansion of native mass spectrometry (MS) methods for both academic and industrial applications has created a substantial need for analysis of large native MS datasets. Existing software tools are poorly suited for high-throughput deconvolution of native electrospray mass spectra from intact proteins and protein complexes. The UniDec Bayesian deconvolution algorithm is uniquely well suited for high-throughput analysis due to its speed and robustness but was previously tailored towards individual spectra. Here, we optimized UniDec for deconvolution, analysis, and visualization of large data sets. This new module, MetaUniDec, centers around a Hierarchical Data Format 5 (HDF5) file structure for storing datasets that significantly improves speed, portability, and file size. It also includes code optimizations to improve speed and a new graphical user interface for visualization, interaction, and analysis of data. To demonstrate the utility of MetaUniDec, we applied the software to analyze automated collision voltage ramps with a small bacterial heme protein and large lipoprotein nanodiscs. Upon increasing collisional activation, bacterial heme-nitric oxide/oxygen binding (H-NOX) protein shows a discrete loss of bound heme, and nanodiscs show a continuous loss of lipids and charge. By using MetaUniDec to track changes in peak area or mass as a function of collision voltage, we explore the energetic profile of collisional activation in an ultra-high mass range Orbitrap mass spectrometer.
Tan, Kun; Zhang, Zhenni; Miao, Kai; Yu, Yong; Sui, Linlin; Tian, Jianhui; An, Lei
2016-07-01
How does in vitro fertilization (IVF) alter promoter DNA methylation patterns, and what are the subsequent effects on gene expression profiles during placentation in mice? IVF-induced alterations in promoter DNA methylation might have functional consequences in a number of biological processes and functions during IVF placentation, including actin cytoskeleton organization, hematopoiesis, vasculogenesis, energy metabolism and nutrient transport. During post-implantation embryonic development, both embryonic and extraembryonic tissues undergo de novo DNA methylation, thereby establishing a global DNA methylation pattern and influencing gene expression profiles. Embryonic and placental tissues of IVF conceptuses can have aberrant morphology and functions, resulting in adverse pregnancy outcomes such as pregnancy loss, low birthweight, and long-term health effects. To date, IVF-induced global DNA methylation alterations and their functional consequences for gene expression profiles in IVF placentas have not been systematically studied. Institute for Cancer Research mice (6 week-old females and 8-9 week-old males) were used to generate in vivo fertilization (IVO) and IVF blastocysts. After either IVO and development (IVO group as control) or in vitro fertilization and culture (IVF group), blastocysts were collected and transferred to pseudo-pregnant recipient mice. Extraembryonic (ectoplacental cone and extraembryonic ectoderm) and placental tissues from both groups were sampled at embryonic day (E) 7.5 (IVO, n = 822; IVF, n = 795) and E10.5 (IVO, n = 324; IVF, n = 278), respectively. The collected extraembryonic (E7.5) and placental tissues (E10.5) were then used for high-throughput RNA sequencing (RNA-seq) and methylated DNA immunoprecipitation sequencing (MeDIP-seq). The main dysfunctions indicated by bioinformatic analyses were further validated using molecular detection, and morphometric and phenotypic analyses.
Dynamic functional profiling of high-throughput data, together with molecular detection, and morphometric and phenotypic analyses, showed that differentially expressed genes dysregulated by DNA methylation were functionally involved in: (i) actin cytoskeleton disorganization in IVF extraembryonic tissues, which may impair allantois or chorion formation, and chorioallantoic fusion; (ii) disturbed hematopoiesis and vasculogenesis, which may lead to abnormal placenta labyrinth formation and thereby impair nutrient transport in IVF placentas; (iii) dysregulated energy and amino acid metabolism, which may cause placental dysfunctions, leading to delayed embryonic development or even lethality; (iv) disrupted genetic information processing, which can further influence gene transcriptional and translational processes. Findings in mouse placental tissues may not be fully representative of human placentas. Further studies are necessary to confirm these findings and determine their clinical significance. Our study is the first to provide a genome-wide analysis of gene expression dysregulation caused by DNA methylation during IVF placentation. Systematic understanding of the molecular mechanisms implicated in IVF placentation can be useful for the improvement of existing assisted conception systems to prevent these IVF-associated safety concerns. This work was supported by grants from the National Natural Science Foundation of China (No. 31472092), and the National High-Tech R&D Program (Nos. 2011AA100303, 2013AA102506). There was no conflict of interest. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology.
Identification of innate lymphoid cells in single-cell RNA-Seq data.
Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David
2017-07-01
Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help explore their conservation in distant vertebrate species.
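The combination of clustering and a reference-trained classifier described above can be illustrated with a minimal sketch. This is not the authors' classifier: it assumes a simple nearest-centroid rule with Pearson correlation, takes the cluster mean profiles as already computed by some upstream clustering step, and uses invented toy expression values over three marker genes (Gata3, Rorc, Tbx21) chosen only for illustration. All function names are hypothetical.

```python
import math

def centroid(profiles):
    """Mean expression per gene across a list of equal-length profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def train(reference):
    """Build one centroid per labelled cell type from reference profiles."""
    return {label: centroid(profiles) for label, profiles in reference.items()}

def classify(cluster_profile, centroids):
    """Assign the label whose centroid correlates best with the cluster mean."""
    return max(centroids, key=lambda label: pearson(cluster_profile, centroids[label]))

# Toy reference data (invented values); gene order: Gata3, Rorc, Tbx21.
reference = {
    "ILC1": [[1, 0, 9], [2, 1, 8]],
    "ILC2": [[9, 1, 0], [8, 0, 1]],
    "ILC3": [[0, 9, 1], [1, 8, 2]],
}
centroids = train(reference)

# Mean profile of one hypothetical cluster from the query data set.
query_cluster = [7.5, 0.5, 1.0]
label = classify(query_cluster, centroids)
```

Restricting the gene list to orthologs shared between the reference and query species would be the natural place to apply the cross-species constraint mentioned in the abstract; the sketch leaves that step out.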
Zuo, Qisheng; Li, Dong; Zhang, Lei; Elsayed, Ahmed Kamel; Lian, Chao; Shi, Qingqing; Zhang, Zhentao; Zhu, Rui; Wang, Yinjie; Jin, Kai; Zhang, Yani; Li, Bichun
2015-01-01
Here, we explore the regulatory mechanism of lipid metabolic signaling pathways and related genes during differentiation of male germ cells in chickens, with the hope that better understanding of these pathways may improve in vitro induction. Fluorescence-activated cell sorting was used to obtain highly purified cultures of embryonic stem cells (ESCs), primitive germ cells (PGCs), and spermatogonial stem cells (SSCs). Total RNA was then extracted from each cell type, and the transcriptome of these cells was sequenced by high-throughput RNA sequencing (RNA-seq). Gene Ontology (GO) analysis and the KEGG database were used to identify lipid metabolism pathways and related genes. Retinoic acid (RA), the end-product of the retinol metabolism pathway, induced in vitro differentiation of ESC into male germ cells. Quantitative real-time PCR (qRT-PCR) was used to detect changes in the expression of the genes involved in the retinol metabolic pathways. From the results of RNA-seq and the database analyses, we concluded that there are 328 genes in 27 lipid metabolic pathways continuously involved in lipid metabolism during the differentiation of ESC into SSC in vivo, including retinol metabolism. Alcohol dehydrogenase 5 (ADH5) and aldehyde dehydrogenase 1 family member A1 (ALDH1A1) are involved in RA synthesis in the cell. ADH5 was specifically expressed in PGC in our experiments, and ALDH1A1 expression persistently increased throughout development. CYP26b1, a member of the cytochrome P450 superfamily, is involved in the degradation of RA. Expression of CYP26b1, in contrast, decreased throughout development. Exogenous RA in the culture medium induced differentiation of ESC to SSC-like cells. The expression patterns of ADH5, ALDH1A1, and CYP26b1 were consistent with RNA-seq results. We conclude that the retinol metabolism pathway plays an important role in the process of chicken male germ cell differentiation.
Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing
2014-01-01
Background Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. 
This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well as the identification of putative genes proximal to the SNPs. Differences in the distribution of recombination events between the sexes are evident, and regions of homeology have been identified which are reflective of the recent salmonid whole genome duplication. PMID:24571138
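The abstract above reports a 97% correspondence between RAD-Seq genotypes and a high-throughput genotyping assay. One plausible way to compute such a figure is sketched below; the source does not say how the correspondence was calculated, so the definition used here (fraction of shared, non-missing calls that agree, ignoring allele order) is an assumption, and the function name, data layout, and genotype values are hypothetical.

```python
def normalize(genotype):
    """Treat genotypes as unordered allele pairs, so 'GA' equals 'AG'."""
    return "".join(sorted(genotype))

def genotype_concordance(calls_a, calls_b):
    """Fraction of comparable genotype calls that agree between two assays.

    Each argument maps (snp_id, sample_id) to a two-letter genotype string,
    or to None when the call is missing. Only keys present and non-missing
    in both assays are compared.
    """
    shared = [
        key for key in calls_a
        if key in calls_b and calls_a[key] is not None and calls_b[key] is not None
    ]
    if not shared:
        raise ValueError("no comparable genotype calls")
    matches = sum(
        1 for key in shared
        if normalize(calls_a[key]) == normalize(calls_b[key])
    )
    return matches / len(shared)

# Invented toy calls for two SNPs in two fish.
rad_calls = {
    ("snp1", "fish1"): "AG",
    ("snp1", "fish2"): "GG",
    ("snp2", "fish1"): "CT",
    ("snp2", "fish2"): None,   # missing RAD-Seq call, excluded from comparison
}
assay_calls = {
    ("snp1", "fish1"): "GA",   # same genotype, alleles listed in reverse order
    ("snp1", "fish2"): "GG",
    ("snp2", "fish1"): "CC",   # discordant call
    ("snp2", "fish2"): "CT",
}
concordance = genotype_concordance(rad_calls, assay_calls)
```

Excluding missing calls from the denominator is a design choice; counting them as mismatches would give a lower, more conservative figure.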