Accurate Prediction of Inducible Transcription Factor Binding Intensities In Vivo
Siepel, Adam; Lis, John T.
2012-01-01
DNA sequence and local chromatin landscape act jointly to determine transcription factor (TF) binding intensity profiles. To disentangle these influences, we developed an experimental approach, called protein/DNA binding followed by high-throughput sequencing (PB–seq), that allows the binding energy landscape to be characterized genome-wide in the absence of chromatin. We applied our methods to the Drosophila Heat Shock Factor (HSF), which inducibly binds a target DNA sequence element (HSE) following heat shock stress. PB–seq involves incubating sheared naked genomic DNA with recombinant HSF, partitioning the HSF–bound and HSF–free DNA, and then detecting HSF–bound DNA by high-throughput sequencing. We compared PB–seq binding profiles with ones observed in vivo by ChIP–seq and developed statistical models to predict the observed departures from idealized binding patterns based on covariates describing the local chromatin environment. We found that DNase I hypersensitivity and tetra-acetylation of H4 were the most influential covariates in predicting changes in HSF binding affinity. We also investigated the extent to which DNA accessibility, as measured by digital DNase I footprinting data, could be predicted from MNase–seq data and the ChIP–chip profiles for many histone modifications and TFs, and found GAGA element associated factor (GAF), tetra-acetylation of H4, and H4K16 acetylation to be the most predictive covariates. Lastly, we generated an unbiased model of HSF binding sequences, which revealed distinct biophysical properties of the HSF/HSE interaction and a previously unrecognized substructure within the HSE. These findings provide new insights into the interplay between the genomic sequence and the chromatin landscape in determining transcription factor binding intensity. PMID:22479205
Yang, Chia-Chun; Andrews, Erik H; Chen, Min-Hsuan; Wang, Wan-Yu; Chen, Jeremy J W; Gerstein, Mark; Liu, Chun-Chi; Cheng, Chao
2016-08-12
Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) or microarray hybridization (ChIP-chip) has been widely used to determine the genomic occupation of transcription factors (TFs). We have previously developed a probabilistic method, called TIP (Target Identification from Profiles), to identify TF target genes using ChIP-seq/ChIP-chip data. To achieve high specificity, TIP applies a conservative method to estimate significance of target genes, with the trade-off being a relatively low sensitivity of target gene identification compared to other methods. Additionally, TIP's output does not render binding-peak locations or intensity, information highly useful for visualization and general experimental biological use, while the variability of ChIP-seq/ChIP-chip file formats has made input into TIP more difficult than desired. To improve upon these facets, here we present are fined TIP with key extensions. First, it implements a Gaussian mixture model for p-value estimation, increasing target gene identification sensitivity and more accurately capturing the shape of TF binding profile distributions. Second, it enables the incorporation of TF binding-peak data by identifying their locations in significant target gene promoter regions and quantifies their strengths. Finally, for full ease of implementation we have incorporated it into a web server ( http://syslab3.nchu.edu.tw/iTAR/ ) that enables flexibility of input file format, can be used across multiple species and genome assembly versions, and is freely available for public use. The web server additionally performs GO enrichment analysis for the identified target genes to reveal the potential function of the corresponding TF. The iTAR web server provides a user-friendly interface and supports target gene identification in seven species, ranging from yeast to human. To facilitate investigating the quality of ChIP-seq/ChIP-chip data, the web server generates the chart of the characteristic binding profiles and the density plot of normalized regulatory scores. The iTAR web server is a useful tool in identifying TF target genes from ChIP-seq/ChIP-chip data and discovering biological insights.
Wei, Yingying; Wu, George; Ji, Hongkai
2013-05-01
Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
Kim, Tae Hoon; Dekker, Job
2018-05-01
Owing to its digital nature, ChIP-seq has become the standard method for genome-wide ChIP analysis. Using next-generation sequencing platforms (notably the Illumina Genome Analyzer), millions of short sequence reads can be obtained. The densities of recovered ChIP sequence reads along the genome are used to determine the binding sites of the protein. Although a relatively small amount of ChIP DNA is required for ChIP-seq, the current sequencing platforms still require amplification of the ChIP DNA by ligation-mediated PCR (LM-PCR). This protocol, which involves linker ligation followed by size selection, is the standard ChIP-seq protocol using an Illumina Genome Analyzer. The size-selected ChIP DNA is amplified by LM-PCR and size-selected for the second time. The purified ChIP DNA is then loaded into the Genome Analyzer. The ChIP DNA can also be processed in parallel for ChIP-chip results. © 2018 Cold Spring Harbor Laboratory Press.
Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.
Dorsett, Dale; Misulovin, Ziva
2017-01-01
This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.
3D gut-liver chip with a PK model for prediction of first-pass metabolism.
Lee, Dong Wook; Ha, Sang Keun; Choi, Inwook; Sung, Jong Hwan
2017-11-07
Accurate prediction of first-pass metabolism is essential for improving the time and cost efficiency of drug development process. Here, we have developed a microfluidic gut-liver co-culture chip that aims to reproduce the first-pass metabolism of oral drugs. This chip consists of two separate layers for gut (Caco-2) and liver (HepG2) cell lines, where cells can be co-cultured in both 2D and 3D forms. Both cell lines were maintained well in the chip, verified by confocal microscopy and measurement of hepatic enzyme activity. We investigated the PK profile of paracetamol in the chip, and corresponding PK model was constructed, which was used to predict PK profiles for different chip design parameters. Simulation results implied that a larger absorption surface area and a higher metabolic capacity are required to reproduce the in vivo PK profile of paracetamol more accurately. Our study suggests the possibility of reproducing the human PK profile on a chip, contributing to accurate prediction of pharmacological effect of drugs.
2014-01-01
Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.
2017-01-01
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623
Ranjbar, Reza; Behzadi, Payam; Najafi, Ali; Roudi, Raheleh
2017-01-01
A rapid, accurate, flexible and reliable diagnostic method may significantly decrease the costs of diagnosis and treatment. Designing an appropriate microarray chip reduces noises and probable biases in the final result. The aim of this study was to design and construct a DNA Microarray Chip for a rapid detection and identification of 10 important bacterial agents. In the present survey, 10 unique genomic regions relating to 10 pathogenic bacterial agents including Escherichia coli (E.coli), Shigella boydii, Sh.dysenteriae, Sh.flexneri, Sh.sonnei, Salmonella typhi, S.typhimurium, Brucella sp., Legionella pneumophila, and Vibrio cholera were selected for designing specific long oligo microarray probes. For this reason, the in-silico operations including utilization of the NCBI RefSeq database, Servers of PanSeq and Gview, AlleleID 7.7 and Oligo Analyzer 3.1 was done. On the other hand, the in-vitro part of the study comprised stages of robotic microarray chip probe spotting, bacterial DNAs extraction and DNA labeling, hybridization and microarray chip scanning. In wet lab section, different tools and apparatus such as Nexterion® Slide E, Qarray mini spotter, NimbleGen kit, TrayMix TM S4, and Innoscan 710 were used. A DNA microarray chip including 10 long oligo microarray probes was designed and constructed for detection and identification of 10 pathogenic bacteria. The DNA microarray chip was capable to identify all 10 bacterial agents tested simultaneously. The presence of a professional bioinformatician as a probe designer is needed to design appropriate multifunctional microarray probes to increase the accuracy of the outcomes.
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H
2017-01-09
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...
2015-03-12
The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets.more » Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available athttps://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.« less
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Landt, Stephen G; Marinov, Georgi K; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E; Bickel, Peter; Brown, James B; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J; Hoffman, Michael M; Iyer, Vishwanath R; Jung, Youngsook L; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V; Li, Qunhua; Liu, Tao; Liu, X Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M; Park, Peter J; Pazin, Michael J; Perry, Marc D; Raha, Debasish; Reddy, Timothy E; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A; Tolstorukov, Michael Y; White, Kevin P; Xi, Simon; Farnham, Peggy J; Lieb, Jason D; Wold, Barbara J; Snyder, Michael
2012-09-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia
Landt, Stephen G.; Marinov, Georgi K.; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E.; Bickel, Peter; Brown, James B.; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I.; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J.; Hoffman, Michael M.; Iyer, Vishwanath R.; Jung, Youngsook L.; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V.; Li, Qunhua; Liu, Tao; Liu, X. Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M.; Park, Peter J.; Pazin, Michael J.; Perry, Marc D.; Raha, Debasish; Reddy, Timothy E.; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A.; Tolstorukov, Michael Y.; White, Kevin P.; Xi, Simon; Farnham, Peggy J.; Lieb, Jason D.; Wold, Barbara J.; Snyder, Michael
2012-01-01
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals. PMID:22955991
Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P
2015-03-11
The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Purification of nanogram-range immunoprecipitated DNA in ChIP-seq application.
Zhong, Jian; Ye, Zhenqing; Lenz, Samuel W; Clark, Chad R; Bharucha, Adil; Farrugia, Gianrico; Robertson, Keith D; Zhang, Zhiguo; Ordog, Tamas; Lee, Jeong-Heon
2017-12-21
Chromatin immunoprecipitation-sequencing (ChIP-seq) is a widely used epigenetic approach for investigating genome-wide protein-DNA interactions in cells and tissues. The approach has been relatively well established but several key steps still require further improvement. As a part of the procedure, immnoprecipitated DNA must undergo purification and library preparation for subsequent high-throughput sequencing. Current ChIP protocols typically yield nanogram quantities of immunoprecipitated DNA mainly depending on the target of interest and starting chromatin input amount. However, little information exists on the performance of reagents used for the purification of such minute amounts of immunoprecipitated DNA in ChIP elution buffer and their effects on ChIP-seq data. Here, we compared DNA recovery, library preparation efficiency, and ChIP-seq results obtained with several commercial DNA purification reagents applied to 1 ng ChIP DNA and also investigated the impact of conditions under which ChIP DNA is stored. We compared DNA recovery of ten commercial DNA purification reagents and phenol/chloroform extraction from 1 to 50 ng of immunopreciptated DNA in ChIP elution buffer. The recovery yield was significantly different with 1 ng of DNA while similar in higher DNA amounts. We also observed that the low nanogram range of purified DNA is prone to loss during storage depending on the type of polypropylene tube used. The immunoprecipitated DNA equivalent to 1 ng of purified DNA was subject to DNA purification and library preparation to evaluate the performance of four better performing purification reagents in ChIP-seq applications. Quantification of library DNAs indicated the selected purification kits have a negligible impact on the efficiency of library preparation. The resulting ChIP-seq data were comparable with the dataset generated by ENCODE consortium and were highly correlated between the data from different purification reagents. This study provides comparative data on commercial DNA purification reagents applied to nanogram-range immunopreciptated ChIP DNA and evidence for the importance of storage conditions of low nanogram-range purified DNA. We verified consistent high performance of a subset of the tested reagents. These results will facilitate the improvement of ChIP-seq methodology for low-input applications.
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq
Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim
2014-01-01
The advent of both microarray and massively parallel sequencing have revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.
Raghav, Sunil Kumar; Deplancke, Bart
2012-01-01
Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data
2010-01-01
Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database. PMID:20459804
Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq
Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael
2016-01-01
Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249
Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.
Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew
2016-10-10
Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
SeqTU: A web server for identification of bacterial transcription units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Xin; Chou, Wen -Chi; Ma, Qin
A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicablemore » to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.« less
SeqTU: A web server for identification of bacterial transcription units
Chen, Xin; Chou, Wen -Chi; Ma, Qin; ...
2017-03-07
A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicablemore » to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.« less
BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.
Hoff, Katharina J; Lange, Simone; Lomsadze, Alexandre; Borodovsky, Mark; Stanke, Mario
2016-03-01
Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. Complementary strengths of GeneMark-ET and AUGUSTUS provided motivation for designing a new combined tool for automatic gene prediction. We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in bam-format with spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses predicted genes for training and then integrates RNA-Seq read information into final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when it is using RNA-Seq as sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step. BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/ katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.
Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh
2018-01-01
Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and easiness of automation. Here we describe the workflow of single-cell mRNA-seq using C1 IFC, which can isolate and process up to 96 cells. Both on-chip procedure (lysis, reverse transcription, and preamplification PCR) and off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable compared to 3' end counting method for many applications.
Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching
2014-01-01
Post-translational modification (PTM) of transcriptional factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factor (TFs). This has greatly improved our understanding of protein-DNA interactions on a genomic-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translational modified TFs based on ChIP-seq analysis; this is largely due to the wide-spread presence of multiple modified TFs. Using SUMO-1 modification as an example; we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotate the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on the promoter proximity, TFBS annotation, a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs are able to be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. TFBS annotation method is able to predict potential SUMOylated TFs. Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in future.
Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew
2015-01-01
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244
Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K; Stephens, Matthew
2015-01-01
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
19 CFR 12.39 - Imported articles involving unfair methods of competition or practices.
Code of Federal Regulations, 2011 CFR
2011-04-01
... the authorization or consent of the Government. (e) Importations of semiconductor chip products. (1) In accordance with the Semiconductor Chip Protection Act of 1984 (17 U.S.C. 901 et seq.), if the... imported semiconductor chip products which infringe his rights in such mask work, the owner must obtain a...
Genome wide approaches to identify protein-DNA interactions.
Ma, Tao; Ye, Zhenqing; Wang, Liguo
2018-05-29
Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. This review aims to provide an overview of these three techniques including their experiment procedures, computational approaches, and popular analytic tools. ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites in genome-wide. The newly developed ChIP-exo further improves the spatial resolution to single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms thus each will inherently include its own set of statistical assumption and bias. So choosing the most appropriate analytic program for a given experiment needs careful considerations. Moreover, most programs only have command line interface so their installation and usage will require basic computation expertise in Unix/Linux. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Therapeutic Mechanism of BET Bromodomain Inhibitor in Breast
2015-12-01
evaluated the performance of this model in predicting gene expression changes associated with 3h,12h and 24h of JQ1 treatment and found that BRD4 ChIP- seq...models of cancer4–6, have not been evaluated in TNBC. These inhibitors displace BET bromodomain proteins such as BRD4 from chromatin by competing...Extended Data Fig. 2c, d and Fig. 1b). Extending the translational significance of these findings, we evalu - ated the ability of JQ1 to inhibit tumour
A fast one-chip event-preprocessor and sequencer for the Simbol-X Low Energy Detector
NASA Astrophysics Data System (ADS)
Schanz, T.; Tenzer, C.; Maier, D.; Kendziorra, E.; Santangelo, A.
2010-12-01
We present an FPGA-based digital camera electronics consisting of an Event-Preprocessor (EPP) for on-board data preprocessing and a related Sequencer (SEQ) to generate the necessary signals to control the readout of the detector. The device has been originally designed for the Simbol-X low energy detector (LED). The EPP operates on 64×64 pixel images and has a real-time processing capability of more than 8000 frames per second. The already working releases of the EPP and the SEQ are now combined into one Digital-Camera-Controller-Chip (D3C).
Chromatin Immunoprecipitation (ChIP) Protocol for Low-abundance Embryonic Samples.
Rehimi, Rizwan; Bartusel, Michaela; Solinas, Francesca; Altmüller, Janine; Rada-Iglesias, Alvaro
2017-08-29
Chromatin immunoprecipitation (ChIP) is a widely-used technique for mapping the localization of post-translationally modified histones, histone variants, transcription factors, or chromatin-modifying enzymes at a given locus or on a genome-wide scale. The combination of ChIP assays with next-generation sequencing (i.e., ChIP-Seq) is a powerful approach to globally uncover gene regulatory networks and to improve the functional annotation of genomes, especially of non-coding regulatory sequences. ChIP protocols normally require large amounts of cellular material, thus precluding the applicability of this method to investigating rare cell types or small tissue biopsies. In order to make the ChIP assay compatible with the amount of biological material that can typically be obtained in vivo during early vertebrate embryogenesis, we describe here a simplified ChIP protocol in which the number of steps required to complete the assay were reduced to minimize sample loss. This ChIP protocol has been successfully used to investigate different histone modifications in various embryonic chicken and adult mouse tissues using low to medium cell numbers (5 x 10 4 - 5 x 10 5 cells). Importantly, this protocol is compatible with ChIP-seq technology using standard library preparation methods, thus providing global epigenomic maps in highly relevant embryonic tissues.
Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning
Goerner-Potvin, Patricia; Morin, Andreanne; Shao, Xiaojian; Pastinen, Tomi
2017-01-01
Motivation: Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. Results: We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. Availability and Implementation: Labeled histone mark data http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/, R package to compute the label error of predicted peaks https://github.com/tdhock/PeakError Contacts: toby.hocking@mail.mcgill.ca or guil.bourque@mcgill.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27797775
Optimizing ChIP-seq peak detectors using visual labels and supervised machine learning.
Hocking, Toby Dylan; Goerner-Potvin, Patricia; Morin, Andreanne; Shao, Xiaojian; Pastinen, Tomi; Bourque, Guillaume
2017-02-15
Many peak detection algorithms have been proposed for ChIP-seq data analysis, but it is not obvious which algorithm and what parameters are optimal for any given dataset. In contrast, regions with and without obvious peaks can be easily labeled by visual inspection of aligned read counts in a genome browser. We propose a supervised machine learning approach for ChIP-seq data analysis, using labels that encode qualitative judgments about which genomic regions contain or do not contain peaks. The main idea is to manually label a small subset of the genome, and then learn a model that makes consistent peak predictions on the rest of the genome. We created 7 new histone mark datasets with 12 826 visually determined labels, and analyzed 3 existing transcription factor datasets. We observed that default peak detection parameters yield high false positive rates, which can be reduced by learning parameters using a relatively small training set of labeled data from the same experiment type. We also observed that labels from different people are highly consistent. Overall, these data indicate that our supervised labeling method is useful for quantitatively training and testing peak detection algorithms. Labeled histone mark data http://cbio.ensmp.fr/~thocking/chip-seq-chunk-db/ , R package to compute the label error of predicted peaks https://github.com/tdhock/PeakError. toby.hocking@mail.mcgill.ca or guil.bourque@mcgill.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Jordán-Pla, Antonio; Visa, Neus
2018-01-01
Arguably one of the most valuable techniques to study chromatin organization, ChIP is the method of choice to map the contacts established between proteins and genomic DNA. Ever since its inception, more than 30 years ago, ChIP has been constantly evolving, improving, and expanding its capabilities and reach. Despite its widespread use by many laboratories across a wide variety of disciplines, ChIP assays can be sometimes challenging to design, and are often sensitive to variations in practical implementation.In this chapter, we provide a general overview of the ChIP method and its most common variations, with a special focus on ChIP-seq. We try to address some of the most important aspects that need to be taken into account in order to design and perform experiments that generate the most reproducible, high-quality data. Some of the main topics covered include the use of properly characterized antibodies, alternatives to chromatin preparation, the need for proper controls, and some recommendations about ChIP-seq data analysis.
Organ-on-a-Chip Technology for Reproducing Multiorgan Physiology.
Lee, Seung Hwan; Sung, Jong Hwan
2018-01-01
In the drug development process, the accurate prediction of drug efficacy and toxicity is important in order to reduce the cost, labor, and effort involved. For this purpose, conventional 2D cell culture models are used in the early phase of drug development. However, the differences between the in vitro and the in vivo systems have caused the failure of drugs in the later phase of the drug-development process. Therefore, there is a need for a novel in vitro model system that can provide accurate information for evaluating the drug efficacy and toxicity through a closer recapitulation of the in vivo system. Recently, the idea of using microtechnology for mimicking the microscale tissue environment has become widespread, leading to the development of "organ-on-a-chip." Furthermore, the system is further developed for realizing a multiorgan model for mimicking interactions between multiple organs. These advancements are still ongoing and are aimed at ultimately developing "body-on-a-chip" or "human-on-a-chip" devices for predicting the response of the whole body. This review summarizes recently developed organ-on-a-chip technologies, and their applications for reproducing multiorgan functions. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Yuan, Chih-Chi; Craske, Madeleine Lisa; Labhart, Paul; Guler, Gulfem D.; Arnott, David; Maile, Tobias M.; Busby, Jennifer; Henry, Chisato; Kelly, Theresa K.; Tindell, Charles A.; Jhunjhunwala, Suchit; Zhao, Feng; Hatton, Charlie; Bryant, Barbara M.
2016-01-01
Chromatin immunoprecipitation and DNA sequencing (ChIP-seq) has been instrumental in inferring the roles of histone post-translational modifications in the regulation of transcription, chromatin compaction and other cellular processes that require modulation of chromatin structure. However, analysis of ChIP-seq data is challenging when the manipulation of a chromatin-modifying enzyme significantly affects global levels of histone post-translational modifications. For example, small molecule inhibition of the methyltransferase EZH2 reduces global levels of histone H3 lysine 27 trimethylation (H3K27me3). However, standard ChIP-seq normalization and analysis methods fail to detect a decrease upon EZH2 inhibitor treatment. We overcome this challenge by employing an alternative normalization approach that is based on the addition of Drosophila melanogaster chromatin and a D. melanogaster-specific antibody into standard ChIP reactions. Specifically, the use of an antibody that exclusively recognizes the D. melanogaster histone variant H2Av enables precipitation of D. melanogaster chromatin as a minor fraction of the total ChIP DNA. The D. melanogaster ChIP-seq tags are used to normalize the human ChIP-seq data from DMSO and EZH2 inhibitor-treated samples. Employing this strategy, a substantial reduction in H3K27me3 signal is now observed in ChIP-seq data from EZH2 inhibitor treated samples. PMID:27875550
Egan, Brian; Yuan, Chih-Chi; Craske, Madeleine Lisa; Labhart, Paul; Guler, Gulfem D; Arnott, David; Maile, Tobias M; Busby, Jennifer; Henry, Chisato; Kelly, Theresa K; Tindell, Charles A; Jhunjhunwala, Suchit; Zhao, Feng; Hatton, Charlie; Bryant, Barbara M; Classon, Marie; Trojer, Patrick
2016-01-01
Chromatin immunoprecipitation and DNA sequencing (ChIP-seq) has been instrumental in inferring the roles of histone post-translational modifications in the regulation of transcription, chromatin compaction and other cellular processes that require modulation of chromatin structure. However, analysis of ChIP-seq data is challenging when the manipulation of a chromatin-modifying enzyme significantly affects global levels of histone post-translational modifications. For example, small molecule inhibition of the methyltransferase EZH2 reduces global levels of histone H3 lysine 27 trimethylation (H3K27me3). However, standard ChIP-seq normalization and analysis methods fail to detect a decrease upon EZH2 inhibitor treatment. We overcome this challenge by employing an alternative normalization approach that is based on the addition of Drosophila melanogaster chromatin and a D. melanogaster-specific antibody into standard ChIP reactions. Specifically, the use of an antibody that exclusively recognizes the D. melanogaster histone variant H2Av enables precipitation of D. melanogaster chromatin as a minor fraction of the total ChIP DNA. The D. melanogaster ChIP-seq tags are used to normalize the human ChIP-seq data from DMSO and EZH2 inhibitor-treated samples. Employing this strategy, a substantial reduction in H3K27me3 signal is now observed in ChIP-seq data from EZH2 inhibitor treated samples.
Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.
Grimes, Tyler; Walker, Alejandro R; Datta, Susmita; Datta, Somnath
2018-05-30
Neuroblastoma is the most common tumor of early childhood and is notorious for its high variability in clinical presentation. Accurate prognosis has remained a challenge for many patients. In this study, expression profiles from RNA-sequencing are used to predict survival times directly. Several models are investigated using various annotation levels of expression profiles (genes, transcripts, and introns), and an ensemble predictor is proposed as a heuristic for combining these different profiles. The use of RNA-seq data is shown to improve accuracy in comparison to using clinical data alone for predicting overall survival times. Furthermore, clinically high-risk patients can be subclassified based on their predicted overall survival times. In this effort, the best performing model was the elastic net using both transcripts and introns together. This model separated patients into two groups with 2-year overall survival rates of 0.40±0.11 (n=22) versus 0.80±0.05 (n=68). The ensemble approach gave similar results, with groups 0.42±0.10 (n=25) versus 0.82±0.05 (n=65). This suggests that the ensemble is able to effectively combine the individual RNA-seq datasets. Using predicted survival times based on RNA-seq data can provide improved prognosis by subclassifying clinically high-risk neuroblastoma patients. This article was reviewed by Subharup Guha and Isabel Nepomuceno.
CisMapper: predicting regulatory interactions from transcription factor ChIP-seq data
O'Connor, Timothy; Bodén, Mikael
2017-01-01
Abstract Identifying the genomic regions and regulatory factors that control the transcription of genes is an important, unsolved problem. The current method of choice predicts transcription factor (TF) binding sites using chromatin immunoprecipitation followed by sequencing (ChIP-seq), and then links the binding sites to putative target genes solely on the basis of the genomic distance between them. Evidence from chromatin conformation capture experiments shows that this approach is inadequate due to long-distance regulation via chromatin looping. We present CisMapper, which predicts the regulatory targets of a TF using the correlation between a histone mark at the TF's bound sites and the expression of each gene across a panel of tissues. Using both chromatin conformation capture and differential expression data, we show that CisMapper is more accurate at predicting the target genes of a TF than the distance-based approaches currently used, and is particularly advantageous for predicting the long-range regulatory interactions typical of tissue-specific gene expression. CisMapper also predicts which TF binding sites regulate a given gene more accurately than using genomic distance. Unlike distance-based methods, CisMapper can predict which transcription start site of a gene is regulated by a particular binding site of the TF. PMID:28204599
Gao, Hui; Zhao, Chunyan
2018-01-01
Chromatin immunoprecipitation (ChIP) has become the most effective and widely used tool to study the interactions between specific proteins or modified forms of proteins and a genomic DNA region. Combined with genome-wide profiling technologies, such as microarray hybridization (ChIP-on-chip) or massively parallel sequencing (ChIP-seq), ChIP could provide a genome-wide mapping of in vivo protein-DNA interactions in various organisms. Here, we describe a protocol of ChIP-on-chip that uses tiling microarray to obtain a genome-wide profiling of ChIPed DNA.
19 CFR 12.39 - Imported articles involving unfair methods of competition or practices.
Code of Federal Regulations, 2010 CFR
2010-04-01
... economically operated United States industry, or to restrain or monopolize trade and commerce in the United... the authorization or consent of the Government. (e) Importations of semiconductor chip products. (1) In accordance with the Semiconductor Chip Protection Act of 1984 (17 U.S.C. 901 et seq.), if the...
ChIP-seq: advantages and challenges of a maturing technology.
Park, Peter J
2009-10-01
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.
Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.
Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T
2016-03-01
High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting
Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.
2016-01-01
High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518
Limitations and possibilities of low cell number ChIP-seq
2012-01-01
Background Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) offers high resolution, genome-wide analysis of DNA-protein interactions. However, current standard methods require abundant starting material in the range of 1–20 million cells per immunoprecipitation, and remain a bottleneck to the acquisition of biologically relevant epigenetic data. Using a ChIP-seq protocol optimised for low cell numbers (down to 100,000 cells / IP), we examined the performance of the ChIP-seq technique on a series of decreasing cell numbers. Results We present an enhanced native ChIP-seq method tailored to low cell numbers that represents a 200-fold reduction in input requirements over existing protocols. The protocol was tested over a range of starting cell numbers covering three orders of magnitude, enabling determination of the lower limit of the technique. At low input cell numbers, increased levels of unmapped and duplicate reads reduce the number of unique reads generated, and can drive up sequencing costs and affect sensitivity if ChIP is attempted from too few cells. Conclusions The optimised method presented here considerably reduces the input requirements for performing native ChIP-seq. It extends the applicability of the technique to isolated primary cells and rare cell populations (e.g. biobank samples, stem cells), and in many cases will alleviate the need for cell culture and any associated alteration of epigenetic marks. However, this study highlights a challenge inherent to ChIP-seq from low cell numbers: as cell input numbers fall, levels of unmapped sequence reads and PCR-generated duplicate reads rise. We discuss a number of solutions to overcome the effects of reducing cell number that may aid further improvements to ChIP performance. PMID:23171294
Proton Upset Monte Carlo Simulation
NASA Technical Reports Server (NTRS)
O'Neill, Patrick M.; Kouba, Coy K.; Foster, Charles C.
2009-01-01
The Proton Upset Monte Carlo Simulation (PROPSET) program calculates the frequency of on-orbit upsets in computer chips (for given orbits such as Low Earth Orbit, Lunar Orbit, and the like) from proton bombardment based on the results of heavy ion testing alone. The software simulates the bombardment of modern microelectronic components (computer chips) with high-energy (.200 MeV) protons. The nuclear interaction of the proton with the silicon of the chip is modeled and nuclear fragments from this interaction are tracked using Monte Carlo techniques to produce statistically accurate predictions.
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
Abstract The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The ‘fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimen. The fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as a standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). PMID:28472320
Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit.
Ramani, Anantharaman; Wong, Yongxun; Tan, Si Zhen; Shue, Bing Hong; Syn, Christopher
2017-11-01
The ability to predict bio-geographic ancestry can be valuable to generate investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit through sequencing on the MiSeq FGx were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 ancestry SNPs and phenotypic SNPs with AIM properties were selected. The bio-geographic ancestry of the 1030 samples, as predicted by Illumina's ForenSeq Universal Analysis Software (UAS), was determined. 712 of the genotyped samples were used as a training sample set for the generation of an ancestry prediction model using STRUCTURE and Snipper. The performance of the prediction model was tested by both methods with the remaining 318 samples. Ancestry prediction in UAS was able to correctly classify the Singapore Chinese as part of the East Asian cluster, while Indians clustered with Ad-mixed Americans and Malays clustered in-between these two reference populations. Principal component analyses showed that the 59 SNPs were only able to account for 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also found to be lower (G ST =0.085) than that reported in ALFRED (F ST =0.357). The Snipper algorithm was able to correctly predict bio-geographic ancestry in 91% of Chinese and Indian, and 88% of Malay individuals, while the success rates for the STRUCTURE algorithm were 94% in Chinese, 80% in Malay, and 91% in Indian individuals. Both these algorithms were able to provide admixture proportions when present. Ancestry prediction accuracy (in terms of likelihood ratio) was generally high in the absence of admixture. Misclassification occurred in admixed individuals, who were likely offspring of inter-ethnic marriages, and hence whose self-reported bio-geographic ancestries were dependent on that of their fathers, and in individuals of minority sub-populations with inter-ethnic beliefs. The ancestry prediction capabilities of the 59 SNPs on the ForenSeq kit were reasonably effective in differentiating the Singapore Chinese, Malay and Indian sub-populations, and will be of use for investigative purposes. However, there is potential for more accurate prediction through the evaluation of other AIM sets. Copyright © 2017 Elsevier B.V. All rights reserved.
Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA.
Skvortsova, Ksenia; Zotenko, Elena; Luu, Phuc-Loi; Gould, Cathryn M; Nair, Shalima S; Clark, Susan J; Stirzaker, Clare
2017-01-01
The discovery that 5-methylcytosine (5mC) can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) proteins has prompted wide interest in the potential role of 5hmC in reshaping the mammalian DNA methylation landscape. The gold-standard bisulphite conversion technologies to study DNA methylation do not distinguish between 5mC and 5hmC. However, new approaches to mapping 5hmC genome-wide have advanced rapidly, although it is unclear how the different methods compare in accurately calling 5hmC. In this study, we provide a comparative analysis on brain DNA using three 5hmC genome-wide approaches, namely whole-genome bisulphite/oxidative bisulphite sequencing (WG Bis/OxBis-seq), Infinium HumanMethylation450 BeadChip arrays coupled with oxidative bisulphite (HM450K Bis/OxBis) and antibody-based immunoprecipitation and sequencing of hydroxymethylated DNA (hMeDIP-seq). We also perform loci-specific TET-assisted bisulphite sequencing (TAB-seq) for validation of candidate regions. We show that whole-genome single-base resolution approaches are advantaged in providing precise 5hmC values but require high sequencing depth to accurately measure 5hmC, as this modification is commonly in low abundance in mammalian cells. HM450K arrays coupled with oxidative bisulphite provide a cost-effective representation of 5hmC distribution, at CpG sites with 5hmC levels >~10%. However, 5hmC analysis is restricted to the genomic location of the probes, which is an important consideration as 5hmC modification is commonly enriched at enhancer elements. Finally, we show that the widely used hMeDIP-seq method provides an efficient genome-wide profile of 5hmC and shows high correlation with WG Bis/OxBis-seq 5hmC distribution in brain DNA. However, in cell line DNA with low levels of 5hmC, hMeDIP-seq-enriched regions are not detected by WG Bis/OxBis or HM450K, either suggesting misinterpretation of 5hmC calls by hMeDIP or lack of sensitivity of the latter methods. We highlight both the advantages and caveats of three commonly used genome-wide 5hmC profiling technologies and show that interpretation of 5hmC data can be significantly influenced by the sensitivity of methods used, especially as the levels of 5hmC are low and vary in different cell types and different genomic locations.
Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data
Carroll, Thomas S.; Liang, Ziwei; Salama, Rafik; Stark, Rory; de Santiago, Ines
2014-01-01
With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency. PMID:24782889
Nagalingam, Kumaran; Lorenc, Michał T; Manoli, Sahana; Cameron, Stephen L; Clarke, Anthony R; Dudley, Kevin J
2018-01-01
Interactions between DNA and proteins located in the cell nucleus play an important role in controlling physiological processes by specifying, augmenting and regulating context-specific transcription events. Chromatin immunoprecipitation (ChIP) is a widely used methodology to study DNA-protein interactions and has been successfully used in various cell types for over three decades. More recently, by combining ChIP with genomic screening technologies and Next Generation Sequencing (e.g. ChIP-seq), it has become possible to profile DNA-protein interactions (including covalent histone modifications) across entire genomes. However, the applicability of ChIP-chip and ChIP-seq has rarely been extended to non-model species because of a number of technical challenges. Here we report a method that can be used to identify genome wide covalent histone modifications in a group of non-model fruit fly species (Diptera: Tephritidae). The method was developed by testing and refining protocols that have been used in model organisms, including Drosophila melanogaster. We demonstrate that this method is suitable for a group of economically important pest fruit fly species, viz., Bactrocera dorsalis, Ceratitis capitata, Zeugodacus cucurbitae and Bactrocera tryoni. We also report an example ChIP-seq dataset for B. tryoni, providing evidence for histone modifications in the genome of a tephritid fruit fly for the first time. Since tephritids are major agricultural pests globally, this methodology will be a valuable resource to study taxa-specific evolutionary questions and to assist with pest management. It also provides a basis for researchers working with other non-model species to undertake genome wide DNA-protein interaction studies.
The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.
Perreault, Andrea A; Venters, Bryan J
2016-12-23
Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platform.
USDA-ARS?s Scientific Manuscript database
The promise of genomic selection is that genetic potential can be accurately predicted from genotypes. Simple deoxyribonucleic acid (DNA) tests might replace low accuracy predictions based on performance and pedigree for expensive or lowly heritable measures of puberty and fertility. The promise i...
USDA-ARS?s Scientific Manuscript database
The promise of genomic selection is accurate prediction of animals' genetic potential from their genotypes. Simple DNA tests might replace low accuracy predictions for expensive or lowly heritable measures of puberty and fertility based on performance and pedigree. Knowing which DNA variants affec...
FUNDAMENTALS OF VITAMIN D HORMONE-REGULATED GENE EXPRESSION
Pike, J. Wesley; Meyer, Mark B.
2014-01-01
Initial research focused upon several known genetic targets provided early insight into the mechanism of action of the vitamin D hormone (1,25-dihydroxyvitamin D3 (1,25(OH)2D3)). Recently, however, a series of technical advances involving the coupling of chromatin immunoprecipitation (ChIP) to unbiased methodologies that initially involved tiled DNA microarrays (ChIP-chip analysis) and now Next Generation DNA Sequencing techniques (ChIP-Seq analysis) has opened new avenues of research into the mechanisms through which 1,25(OH)2D3 regulates gene expression. In this review, we summarize briefly the results of this early work and then focus on more recent studies in which ChIP-chip and ChIP-seq analyses have been used to explore the mechanisms of 1,25(OH)2D3 action on a genome-wide scale providing specific target genes as examples. The results of this work have advanced our understanding of the mechanisms involved at both genetic and epigenetic levels and have revealed a series of new principles through which the vitamin D hormone functions to control the expression of genes. PMID:24239506
Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent
2018-01-01
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time.In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.
Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.
Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N
2018-04-17
A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estrodiol, ER mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenographs. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
A deep learning-based multi-model ensemble method for cancer prediction.
Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong
2018-01-01
Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.
Fu, Yong-Bi; Yang, Mo-Hua; Zeng, Fangqin; Biligetu, Bill
2017-01-01
Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continuous search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding. PMID:28729875
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.
Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias
2015-06-25
Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz
2011-07-01
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
Genome-wide analysis of replication timing by next-generation sequencing with E/L Repli-seq.
Marchal, Claire; Sasaki, Takayo; Vera, Daniel; Wilson, Korey; Sima, Jiao; Rivera-Mulia, Juan Carlos; Trevilla-García, Claudia; Nogues, Coralin; Nafie, Ebtesam; Gilbert, David M
2018-05-01
This protocol is an extension to: Nat. Protoc. 6, 870-895 (2014); doi:10.1038/nprot.2011.328; published online 02 June 2011Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early- and late-replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and subnuclear position. Moreover, RT is regulated during development and is altered in diseases. Here, we describe E/L Repli-seq, an extension of our Repli-chip protocol. E/L Repli-seq is a rapid, robust and relatively inexpensive protocol for analyzing RT by next-generation sequencing (NGS), allowing genome-wide assessment of how cellular processes are linked to RT. Briefly, cells are pulse-labeled with BrdU, and early and late S-phase fractions are sorted by flow cytometry. Labeled nascent DNA is immunoprecipitated from both fractions and sequenced. Data processing leads to a single bedGraph file containing the ratio of nascent DNA from early versus late S-phase fractions. The results are comparable to those of Repli-chip, with the additional benefits of genome-wide sequence information and an increased dynamic range. We also provide computational pipelines for downstream analyses, for parsing phased genomes using single-nucleotide polymorphisms (SNPs) to analyze RT allelic asynchrony, and for direct comparison to Repli-chip data. This protocol can be performed in up to 3 d before sequencing, and requires basic cellular and molecular biology skills, as well as a basic understanding of Unix and R.
Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.
2013-01-01
Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
Lee, Moo-Yeal; Dordick, Jonathan S; Clark, Douglas S
2010-01-01
Due to poor drug candidate safety profiles that are often identified late in the drug development process, the clinical progression of new chemical entities to pharmaceuticals remains hindered, thus resulting in the high cost of drug discovery. To accelerate the identification of safer drug candidates and improve the clinical progression of drug candidates to pharmaceuticals, it is important to develop high-throughput tools that can provide early-stage predictive toxicology data. In particular, in vitro cell-based systems that can accurately mimic the human in vivo response and predict the impact of drug candidates on human toxicology are needed to accelerate the assessment of drug candidate toxicity and human metabolism earlier in the drug development process. The in vitro techniques that provide a high degree of human toxicity prediction will be perhaps more important in cosmetic and chemical industries in Europe, as animal toxicity testing is being phased out entirely in the immediate future.We have developed a metabolic enzyme microarray (the Metabolizing Enzyme Toxicology Assay Chip, or MetaChip) and a miniaturized three-dimensional (3D) cell-culture array (the Data Analysis Toxicology Assay Chip, or DataChip) for high-throughput toxicity screening of target compounds and their metabolic enzyme-generated products. The human or rat MetaChip contains an array of encapsulated metabolic enzymes that is designed to emulate the metabolic reactions in the human or rat liver. The human or rat DataChip contains an array of 3D human or rat cells encapsulated in alginate gels for cell-based toxicity screening. By combining the DataChip with the complementary MetaChip, in vitro toxicity results are obtained that correlate well with in vivo rat data.
Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H
2015-06-01
The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.
Inferring nucleosome positions with their histone mark annotation from ChIP data
Mammana, Alessandro; Vingron, Martin; Chung, Ho-Ryun
2013-01-01
Motivation: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4 and about 147 bp of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitiation followed by sequencing (ChIP-seq) is a technique that provides genome-wide occupancy data of these modified histone proteins, and it requires appropriate computational methods. Results: We present NucHunter, an algorithm that uses the data from ChIP-seq experiments directed against many histone modifications to infer positioned nucleosomes. NucHunter annotates each of these nucleosomes with the intensities of the histone modifications. We demonstrate that these annotations can be used to infer nucleosomal states with distinct correlations to underlying genomic features and chromatin-related processes, such as transcriptional start sites, enhancers, elongation by RNA polymerase II and chromatin-mediated repression. Thus, NucHunter is a versatile tool that can be used to predict positioned nucleosomes from a panel of histone modification ChIP-seq experiments and infer distinct histone modification patterns associated to different chromatin states. Availability: The software is available at http://epigen.molgen.mpg.de/nuchunter/. Contact: chung@molgen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23981350
Keating, Brendan; Bansal, Aruna T; Walsh, Susan; Millman, Jonathan; Newman, Jonathan; Kidd, Kenneth; Budowle, Bruce; Eisenberg, Arthur; Donfack, Joseph; Gasparini, Paolo; Budimlija, Zoran; Henders, Anjali K; Chandrupatla, Hareesh; Duffy, David L; Gordon, Scott D; Hysi, Pirro; Liu, Fan; Medland, Sarah E; Rubin, Laurence; Martin, Nicholas G; Spector, Timothy D; Kayser, Manfred
2013-05-01
When a forensic DNA sample cannot be associated directly with a previously genotyped reference sample by standard short tandem repeat profiling, the investigation required for identifying perpetrators, victims, or missing persons can be both costly and time consuming. Here, we describe the outcome of a collaborative study using the Identitas Version 1 (v1) Forensic Chip, the first commercially available all-in-one tool dedicated to the concept of developing intelligence leads based on DNA. The chip allows parallel interrogation of 201,173 genome-wide autosomal, X-chromosomal, Y-chromosomal, and mitochondrial single nucleotide polymorphisms for inference of biogeographic ancestry, appearance, relatedness, and sex. The first assessment of the chip's performance was carried out on 3,196 blinded DNA samples of varying quantities and qualities, covering a wide range of biogeographic origin and eye/hair coloration as well as variation in relatedness and sex. Overall, 95 % of the samples (N = 3,034) passed quality checks with an overall genotype call rate >90 % on variable numbers of available recorded trait information. Predictions of sex, direct match, and first to third degree relatedness were highly accurate. Chip-based predictions of biparental continental ancestry were on average ~94 % correct (further support provided by separately inferred patrilineal and matrilineal ancestry). Predictions of eye color were 85 % correct for brown and 70 % correct for blue eyes, and predictions of hair color were 72 % for brown, 63 % for blond, 58 % for black, and 48 % for red hair. From the 5 % of samples (N = 162) with <90 % call rate, 56 % yielded correct continental ancestry predictions while 7 % yielded sufficient genotypes to allow hair and eye color prediction. Our results demonstrate that the Identitas v1 Forensic Chip holds great promise for a wide range of applications including criminal investigations, missing person investigations, and for national security purposes.
Li, Guosheng; Jagadeeswaran, Guru; Mort, Andrew; Sunkar, Ramanjulu
2017-01-01
Histone modifications represent the crux of epigenetic gene regulation essential for most biological processes including abiotic stress responses in plants. Thus, identification of histone modifications at the genome-scale can provide clues for how some genes are 'turned-on' while some others are "turned-off" in response to stress. This chapter details a step-by-step protocol for identifying genome-wide histone modifications associated with stress-responsive gene regulation using chromatin immunoprecipitation (ChIP) followed by sequencing of the DNA (ChIP-seq).
Montagne, Louise; Derhourhi, Mehdi; Piton, Amélie; Toussaint, Bénédicte; Durand, Emmanuelle; Vaillant, Emmanuel; Thuillier, Dorothée; Gaget, Stefan; De Graeve, Franck; Rabearivelo, Iandry; Lansiaux, Amélie; Lenne, Bruno; Sukno, Sylvie; Desailloud, Rachel; Cnop, Miriam; Nicolescu, Ramona; Cohen, Lior; Zagury, Jean-François; Amouyal, Mélanie; Weill, Jacques; Muller, Jean; Sand, Olivier; Delobel, Bruno; Froguel, Philippe; Bonnefond, Amélie
2018-05-16
The molecular diagnosis of extreme forms of obesity, in which accurate detection of both copy number variations (CNVs) and point mutations, is crucial for an optimal care of the patients and genetic counseling for their families. Whole-exome sequencing (WES) has benefited considerably this molecular diagnosis, but its poor ability to detect CNVs remains a major limitation. We aimed to develop a method (CoDE-seq) enabling the accurate detection of both CNVs and point mutations in one step. CoDE-seq is based on an augmented WES method, using probes distributed uniformly throughout the genome. CoDE-seq was validated in 40 patients for whom chromosomal DNA microarray was available. CNVs and mutations were assessed in 82 children/young adults with suspected Mendelian obesity and/or intellectual disability and in their parents when available (n total = 145). CoDE-seq not only detected all of the 97 CNVs identified by chromosomal DNA microarrays but also found 84 additional CNVs, due to a better resolution. When compared to CoDE-seq and chromosomal DNA microarrays, WES failed to detect 37% and 14% of CNVs, respectively. In the 82 patients, a likely molecular diagnosis was achieved in >30% of the patients. Half of the genetic diagnoses were explained by CNVs while the other half by mutations. CoDE-seq has proven cost-efficient and highly effective as it avoids the sequential genetic screening approaches currently used in clinical practice for the accurate detection of CNVs and point mutations. Copyright © 2018 The Authors. Published by Elsevier GmbH.. All rights reserved.
Zhu, Lin; Guo, Wei-Li; Lu, Canyi; Huang, De-Shuang
2016-12-01
Although the newly available ChIP-seq data provides immense opportunities for comparative study of regulatory activities across different biological conditions, due to cost, time or sample material availability, it is not always possible for researchers to obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, by leveraging related information from measured data, Ernst et al. proposed ChromImpute for predicting additional ChIP-seq and other types of datasets, it is demonstrated that the imputed signal tracks accurately approximate the experimentally measured signals, and thereby could potentially enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper, we reexamine its learning process, and show that its performance may degrade substantially and sometimes may even fail to output a prediction when the available data is scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium are fused together by mapping proteins, samples, and genomic positions simultaneously into the Euclidean space, thereby making their underling associations directly evaluable using simple calculations. In contrast to ChromImpute which mainly makes use of the local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. Meanwhile, a novel form of local sensitive low rank regularization is also proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model. The code of LSUE is available at https://github.com/ekffar/LSUE.
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
PASTA: splice junction identification from RNA-Sequencing data
2013-01-01
Background Next generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but requires ad-hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data, relying on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions with high accuracy. Results Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it is able to handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions. Conclusions PASTA is a highly efficient and sensitive tool to identify splicing junctions from RNA-Seq data. Compared to similar programs, it has the ability to identify a higher number of real splicing junctions, and provides highly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitates the reconstruction of the splicing isoforms and the analysis of their expression levels, which will be performed by the remaining modules of the PASTA pipeline, still under development. Use of PASTA can therefore enable the large-scale investigation of transcription and alternative splicing. PMID:23557086
A comparative study of ChIP-seq sequencing library preparation methods.
Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D
2016-10-21
ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable are existing datasets that have been prepared with different sample library preparation reagents.
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai
2012-02-15
RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.
A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis
Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan
2009-01-01
DNA methylation is an indispensible epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301
Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 User Guide
User Guide to describe the complete functionality of the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 online tool. The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility tool (SeqAPASS; https://seqa...
High-confidence coding and noncoding transcriptome maps
2017-01-01
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519
Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G
2017-01-01
Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose 2 , which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli , and show how moose 2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose 2 , is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients.
Isik, Zerrin; Ercan, Muserref Ece
2017-10-01
Integration of several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can successfully predict the potential survival time of a cancer patient by integrating the transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method -RPBioNet- employs a random walk-based algorithm that works on a PPI network to identify a limited number of protein biomarkers. Later, the method uses gene expression measurements of the selected biomarkers to train a classifier for the survival time prediction of patients. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients based on their survival time classes (long- or short-term). The RPBioNet method correctly identified the survival time classes of patients with between 66% and 78% average accuracy for three data sets. RPBioNet operates with only 20 to 50 biomarkers and can achieve on average 6% higher accuracy compared to the closest alternative method, which uses only RNA-Seq data in the biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common for both cancer types, as they may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that can be used for clinical purposes in the future. Copyright © 2017 Elsevier Ltd. All rights reserved.
Is this the right normalization? A diagnostic tool for ChIP-seq normalization.
Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel
2015-05-09
Chip-seq experiments are becoming a standard approach for genome-wide profiling protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such as Input DNA, normalization procedures have to be applied in order to remove experimental source of biases. Despite the substantial impact that the choice of the normalization method can have on the results of a ChIP-seq data analysis, their assessment is not fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, along with the estimated normalization constant, after logarithmic transformation, the researcher is able to assess the appropriateness of the estimated normalization constant. We use the diagnostic plot to evaluate the appropriateness of the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard tools for peak calling such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure makes use of the estimated normalization constant in order to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated in many peak caller algorithms to improve the accuracy of the peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate ChIP/Input normalization constants are, and thus it allows the user to choose the most adequate estimate for the analysis.
Jingzhi Zhang; Feng Gu; J.Y. Zhu; Ronald S. Zalesny Jr.
2015-01-01
Sulfite pretreatment to overcome the recalcitrance of lignocelluloses (SPORL) was applied to poplar NE222 chips in a range of chemical loadings, temperatures, and times. The combined hydrolysis factor (CHF) as a pretreatment severity accurately predicted xylan dissolution by SPORL. Good correlations between CHF and pretreated...
Joshi, Dev Raj; Zhang, Yu; Zhang, Hong; Gao, Yingxin; Yang, Min
2018-01-01
Nitrogenous heterocyclic compounds are key pollutants in coking wastewater; however, the functional potential of microbial communities for biodegradation of such contaminants during biological treatment is still elusive. Herein, a high throughput functional gene array (GeoChip 5.0) in combination with Illumina HiSeq2500 sequencing was used to compare and characterize the microbial community functional structure in a long run (500days) bench scale bioreactor treating coking wastewater, with a control system treating synthetic wastewater. Despite the inhibitory toxic pollutants, GeoChip 5.0 detected almost all key functional gene (average 61,940 genes) categories in the coking wastewater sludge. With higher abundance, aromatic ring cleavage dioxygenase genes including multi ring1,2diox; one ring2,3diox; catechol represented significant functional potential for degradation of aromatic pollutants which was further confirmed by Illumina HiSeq2500 analysis results. Response ratio analysis revealed that three nitrogenous compound degrading genes- nbzA (nitro-aromatics), tdnB (aniline), and scnABC (thiocyanate) were unique for coking wastewater treatment, which might be strong cause to increase ammonia level during the aerobic process. Additionally, HiSeq2500 elucidated carbozole and isoquinoline degradation genes in the system. These findings expanded our understanding on functional potential of microbial communities to remove organic nitrogenous pollutants; hence it will be useful in optimization strategies for biological treatment of coking wastewater. Copyright © 2017. Published by Elsevier B.V.
The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...
Organ/body-on-a-chip based on microfluidic technology for drug discovery.
Kimura, Hiroshi; Sakai, Yasuyuki; Fujii, Teruo
2018-02-01
Although animal experiments are indispensable for preclinical screening in the drug discovery process, various issues such as ethical considerations and species differences remain. To solve these issues, cell-based assays using human-derived cells have been actively pursued. However, it remains difficult to accurately predict drug efficacy, toxicity, and organs interactions, because cultivated cells often do not retain their original organ functions and morphologies in conventional in vitro cell culture systems. In the μTAS research field, which is a part of biochemical engineering, the technologies of organ-on-a-chip, based on microfluidic devices built using microfabrication, have been widely studied recently as a novel in vitro organ model. Since it is possible to physically and chemically mimic the in vitro environment by using microfluidic device technology, maintenance of cellular function and morphology, and replication of organ interactions can be realized using organ-on-a-chip devices. So far, functions of various organs and tissues, such as the lung, liver, kidney, and gut have been reproduced as in vitro models. Furthermore, a body-on-a-chip, integrating multi organ functions on a microfluidic device, has also been proposed for prediction of organ interactions. We herein provide a background of microfluidic systems, organ-on-a-chip, Body-on-a-chip technologies, and their challenges in the future. Copyright © 2017 The Japanese Society for the Study of Xenobiotics. Published by Elsevier Ltd. All rights reserved.
Reid-Bayliss, Kate S; Loeb, Lawrence A
2017-08-29
Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda
2017-06-26
The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB ( http://paramecium.i2bc.paris-saclay.fr ). TrUC software is freely distributed under a GNU GPL v3 licence ( https://github.com/oarnaiz/TrUC ).
X-MATE: a flexible system for mapping short read data
Pearson, John V.; Cloonan, Nicole; Grimmond, Sean M.
2011-01-01
Summary: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. Availability: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/. Contact: n.cloonan@uq.edu.au; s.grimmond@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:21216778
Accurate identification of RNA editing sites from primitive sequence with deep neural networks.
Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie
2018-04-16
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N
2012-09-15
SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. mryan@insilico.us.com Supplementary data are available at Bioinformatics online.
Optimal use of tandem biotin and V5 tags in ChIP assays
Kolodziej, Katarzyna E; Pourfarzad, Farzin; de Boer, Ernie; Krpic, Sanja; Grosveld, Frank; Strouboulis, John
2009-01-01
Background Chromatin immunoprecipitation (ChIP) assays coupled to genome arrays (Chip-on-chip) or massive parallel sequencing (ChIP-seq) lead to the genome wide identification of binding sites of chromatin associated proteins. However, the highly variable quality of antibodies and the availability of epitopes in crosslinked chromatin can compromise genomic ChIP outcomes. Epitope tags have often been used as more reliable alternatives. In addition, we have employed protein in vivo biotinylation tagging as a very high affinity alternative to antibodies. In this paper we describe the optimization of biotinylation tagging for ChIP and its coupling to a known epitope tag in providing a reliable and efficient alternative to antibodies. Results Using the biotin tagged erythroid transcription factor GATA-1 as example, we describe several optimization steps for the application of the high affinity biotin streptavidin system in ChIP. We find that the omission of SDS during sonication, the use of fish skin gelatin as blocking agent and choice of streptavidin beads can lead to significantly improved ChIP enrichments and lower background compared to antibodies. We also show that the V5 epitope tag performs equally well under the conditions worked out for streptavidin ChIP and that it may suffer less from the effects of formaldehyde crosslinking. Conclusion The combined use of the very high affinity biotin tag with the less sensitive to crosslinking V5 tag provides for a flexible ChIP platform with potential implications in ChIP sequencing outcomes. PMID:19196479
Analysis of ChIP-seq Data in R/Bioconductor.
de Santiago, Ines; Carroll, Thomas
2018-01-01
The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality-control and appropriate data analysis techniques are of critical importance to extract the most meaningful results from the data. Over the last years, an array of R/Bioconductor tools has been developed allowing researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process were demonstrated on publicly available data sets and will serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.
2014-01-01
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838
Mendoza-Parra, Marco-Antonio; Saravaki, Vincent; Cholley, Pierre-Etienne; Blum, Matthias; Billoré, Benjamin; Gronemeyer, Hinrich
2016-01-01
We have established a certification system for antibodies to be used in chromatin immunoprecipitation assays coupled to massive parallel sequencing (ChIP-seq). This certification comprises a standardized ChIP procedure and the attribution of a numerical quality control indicator (QCi) to biological replicate experiments. The QCi computation is based on a universally applicable quality assessment that quantitates the global deviation of randomly sampled subsets of ChIP-seq dataset with the original genome-aligned sequence reads. Comparison with a QCi database for >28,000 ChIP-seq assays were used to attribute quality grades (ranging from 'AAA' to 'DDD') to a given dataset. In the present report we used the numerical QC system to assess the factors influencing the quality of ChIP-seq assays, including the nature of the target, the sequencing depth and the commercial source of the antibody. We have used this approach specifically to certify mono and polyclonal antibodies obtained from Active Motif directed against the histone modification marks H3K4me3, H3K27ac and H3K9ac for ChIP-seq. The antibodies received the grades AAA to BBC ( www.ngs-qc.org). We propose to attribute such quantitative grading of all antibodies attributed with the label "ChIP-seq grade".
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A. C.; Ning, Zemin; Slagboom, P. Eline; Ye, Kai
2012-01-01
Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ∼ 137 000 and 173 000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22219203
Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang
2015-01-01
Objective Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonal hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses, inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; and to this end, use of an advanced array-based technique was proposed. Method A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine the feasibility, the outcome from new resequencing approach was validated by conventional capillary sequencing method. Result Overall call rates using resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. 11 out of 13 OI patients with pathogenic mutations were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658
RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.
Wang, Yejun; Sun, Ming-An; White, Aaron P
2018-01-01
RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and others, directional RNA-Seq still illustrates its advantages: low cost, quantification and transcript border analysis with a medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets especially with respect to transcript structure analysis, we developed a tool, TrBorderExt, which can parse transcript start sites and termination sites accurately in bacteria. A detailed protocol is described in this chapter for how to use the software package step by step to identify bacterial transcript borders from raw RNA-Seq data. The package was developed with Perl and R programming languages, and is accessible freely through the website: http://www.szu-bioinf.org/TrBorderExt .
USDA-ARS?s Scientific Manuscript database
Members of “Candidatus Liberibacter” are associated with several important plant diseases such as citrus Huanglongbing (HLB) and potato zebra chip (ZC) disease. Inability to culture and low titers in infected hosts have been major obstacles for research on these bacteria. The use of whole genome seq...
Ryan, Michael C.; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N.
2012-01-01
Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. Availability and implementation: SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22820202
Ruttanajit, Tida; Chanchamroen, Sujin; Cram, David S; Sawakwongpra, Kritchakorn; Suksalak, Wanwisa; Leng, Xue; Fan, Junmei; Wang, Li; Yao, Yuanqing; Quangkananurug, Wiwat
2016-02-01
Currently, our understanding of the nature and reproductive potential of blastocysts associated with trophectoderm (TE) lineage chromosomal mosaicism is limited. The objective of this study was to first validate copy number variation sequencing (CNV-Seq) for measuring the level of mosaicism and second, examine the nature and level of mosaicism in TE biopsies of patient's blastocysts. TE biopy samples were analysed by array comparative genomic hybridization (CGH) and CNV-Seq to discriminate between euploid, aneuploid and mosaic blastocysts. Using artificial models of TE mosaicism for five different chromosomes, CNV-Seq accurately and reproducibly quantitated mosaicism at levels of 50% and 20%. In a comparative 24-chromosome study of 49 blastocysts by array CGH and CNV-Seq, 43 blastocysts (87.8%) had a concordant diagnosis and 6 blastocysts (12.2%) were discordant. The discordance was attributed to low to medium levels of chromosomal mosaicism (30-70%) not detected by array CGH. In an expanded study of 399 blastocysts using CNV-Seq as the sole diagnostic method, the proportion of diploid-aneuploid mosaics (34, 8.5%) was significantly higher than aneuploid mosaics (18, 4.5%) (p < 0.02). Mosaicism is a significant chromosomal abnormality associated with the TE lineage of human blastocysts that can be reliably and accurately detected by CNV-Seq. © 2015 John Wiley & Sons, Ltd.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. PMID:27583971
ChIP-seq Identification of Weakly Conserved Heart Enhancers
Blow, Matthew J.; McCulley, David J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Bristow, James; Ren, Bing; Black, Brian L.; Rubin, Edward M.; Visel, Axel; Pennacchio, Len A.
2011-01-01
Accurate control of tissue-specific gene expression plays a pivotal role in heart development, but few cardiac transcriptional enhancers have thus far been identified. Extreme non-coding sequence conservation successfully predicts enhancers active in many tissues, but fails to identify substantial numbers of heart enhancers. Here we used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic day 11.5 heart tissue to identify over three thousand candidate heart enhancers genome-wide. Compared to other tissues studied at this time-point, most candidate heart enhancers are less deeply conserved in vertebrate evolution. Nevertheless, the testing of 130 candidate regions in a transgenic mouse assay revealed that most of them reproducibly function as enhancers active in the heart, irrespective of their degree of evolutionary constraint. These results provide evidence for a large population of poorly conserved heart enhancers and suggest that the evolutionary constraint of embryonic enhancers can vary depending on tissue type. PMID:20729851
Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping
2017-12-12
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida
2016-03-15
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
Sun, Guoyan; Zhao, Lingling; Zhao, Qingliang; Gao, Limin
2018-05-10
There have been few investigations dealing with the force model on grinding brittle materials. However, the dynamic material removal mechanisms have not yet been sufficiently explicated through the grain-workpiece interaction statuses while considering the brittle material characteristics. This paper proposes an improved grinding force model for Zerodur, which contains ductile removal force, brittle removal force, and frictional force, corresponding to the ductile and brittle material removal phases, as well as the friction process, respectively. The critical uncut chip thickness a gc of brittle-ductile transition and the maximum uncut chip thickness a gmax of a single abrasive grain are calculated to identify the specified material removal mode, while the comparative result between a gmax and a gc can be applied to determine the selection of effective grinding force components. Subsequently, indentation fracture tests are carried out to acquire accurate material mechanical properties of Zerodur in establishing the brittle removal force model. Then, the experiments were conducted to derive the coefficients in the grinding force prediction model. Simulated through this model, correlations between the grinding force and grinding parameters can be predicted. Finally, three groups of grinding experiments are carried out to validate the mathematical grinding force model. The experimental results indicate that the improved model is capable of predicting the realistic grinding force accurately with the relative mean errors of 6.04% to the normal grinding force and 7.22% to the tangential grinding force, respectively.
BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.
Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F
2014-01-01
We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
NASA Astrophysics Data System (ADS)
Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen
2016-11-01
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on the real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls were with multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase reliability of candidates, but narrows the list down to very limited range of tumor read depth and variant allele frequency. Calling SNV on UDT-Seq data, which were of much higher read-depth, discovered additional true-positive variations, despite an even more tremendous growth in false positive predictions. Our findings not only provide valuable benchmark for state-of-the-art SNV calling methods, but also shed light on the access to more accurate SNV identification in the future.
Hermansen, Peter; MacKay, Scott; Wishart, David; Jie Chen
2016-08-01
Microfabricated interdigitated electrode chips have been designed for use in a unique gold-nanoparticle based biosensor system. The use of these electrodes will allow for simple, accurate, inexpensive, and portable biosensing, with potential applications in diagnostics, medical research, and environmental testing. To determine the optimal design for these electrodes, finite element analysis simulations were carried out using COMSOL Multiphysics software. The results of these simulations determined some of the optimal design parameters for microfabricating interdigitated electrodes as well as predicting the effects of different electrode materials. Finally, based on the results of these simulations two different kinds of interdigitated electrode chips were made using photolithography.
The bench scientist's guide to RNA-Seq analysis
USDA-ARS?s Scientific Manuscript database
RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatic specialists. Here we outline a methods strategy desi...
Tuerk, Andreas; Wiktorin, Gregor; Güler, Serhat
2017-05-01
Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (rd. "mixquare"), a transcript quantification method which uses a mixture of probability distributions to model and thereby neutralize the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization resulting in simultaneous transcript abundance and bias estimates. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq; state-of-the-art quantification methods implementing some form of bias correction. On four synthetic biases we show that the accuracy of Mix2 overall exceeds the accuracy of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements with a relative increase in R2 between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates with a relative increase in R2 between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression with a relative increase in true positives between 74% and 378% for 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data deviating from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% of transcripts by Cufflinks and RSEM and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix2, 41% for Cufflinks and RSEM and 34% for eXpress. We, further, observe improved repeatability across laboratory sites with a relative increase in R2 between 8% and 44% and reduced standard deviation.
Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.
2017-01-01
Abstract Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C ++ and Perl and is available from geneprediction.org/ACE Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E
2017-05-15
The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. ACE is written in open-source C ++ and Perl and is available from geneprediction.org/ACE. myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
SeqAPASS: Predicting chemical susceptibility to threatened/endangered species
Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; https://seqapass.epa.gov/seqapass/) application was devel...
Accurate lithography simulation model based on convolutional neural networks
NASA Astrophysics Data System (ADS)
Watanabe, Yuki; Kimura, Taiki; Matsunawa, Tetsuaki; Nojima, Shigeki
2017-07-01
Lithography simulation is an essential technique for today's semiconductor manufacturing process. In order to calculate an entire chip in realistic time, compact resist model is commonly used. The model is established for faster calculation. To have accurate compact resist model, it is necessary to fix a complicated non-linear model function. However, it is difficult to decide an appropriate function manually because there are many options. This paper proposes a new compact resist model using CNN (Convolutional Neural Networks) which is one of deep learning techniques. CNN model makes it possible to determine an appropriate model function and achieve accurate simulation. Experimental results show CNN model can reduce CD prediction errors by 70% compared with the conventional model.
visnormsc: A Graphical User Interface to Normalize Single-cell RNA Sequencing Data.
Tang, Lijun; Zhou, Nan
2017-12-26
Single-cell RNA sequencing (RNA-seq) allows the analysis of gene expression with high resolution. The intrinsic defects of this promising technology imports technical noise into the single-cell RNA-seq data, increasing the difficulty of accurate downstream inference. Normalization is a crucial step in single-cell RNA-seq data pre-processing. SCnorm is an accurate and efficient method that can be used for this purpose. An R implementation of this method is currently available. On one hand, the R package possesses many excellent features from R. On the other hand, R programming ability is required, which prevents the biologists who lack the skills from learning to use it quickly. To make this method more user-friendly, we developed a graphical user interface, visnormsc, for normalization of single-cell RNA-seq data. It is implemented in Python and is freely available at https://github.com/solo7773/visnormsc . Although visnormsc is based on the existing method, it contributes to this field by offering a user-friendly alternative. The out-of-the-box and cross-platform features make visnormsc easy to learn and to use. It is expected to serve biologists by simplifying single-cell RNA-seq normalization.
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval
2012-01-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.
Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval
2012-05-01
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.
Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk
2014-03-01
RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN ( O ptimal R esolution of M ultimapping A mbiguity of R N A-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net
Normalization, bias correction, and peak calling for ChIP-seq
Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.
2012-01-01
Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches contain numerous biases which may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706
ChIP-seq Accurately Predicts Tissue-Specific Activity of Enhancers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Visel, Axel; Blow, Matthew J.; Li, Zirong
2009-02-01
A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover since they are scattered amongst the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here, we performed chromatin immunoprecipitation with the enhancer-associated protein p300, followed by massively-parallel sequencing, to map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain, and limb tissue. Wemore » tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases revealed reproducible enhancer activity in those tissues predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities and suggest that such datasets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.« less
voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.
Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet
2017-01-01
RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
Transforming RNA-Seq data to improve the performance of prognostic gene signatures.
Zwiener, Isabella; Frisch, Barbara; Binder, Harald
2014-01-01
Gene expression measurements have successfully been used for building prognostic signatures, i.e for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both, multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance-dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
Zwiener, Isabella; Frisch, Barbara; Binder, Harald
2014-01-01
Gene expression measurements have successfully been used for building prognostic signatures, i.e for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both, multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance-dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques. PMID:24416353
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements to play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging mission for scholars and researchers to accurately predicting distal enhancers. In the past years, with the high-throughout ChiP-seq technologies development, several computational techniques emerge to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell-lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
How and how much does RAD-seq bias genetic diversity estimates?
Cariou, Marie; Duret, Laurent; Charlat, Sylvain
2016-11-08
RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tends to underestimate genetic diversity. Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) confront predictions to real data from in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimation, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. On the contrary, spatial genetic structure tends to reduce the bias. We confront the neutral and panmictic model to "ideal" empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Shizophyllum commune, harbouring respectively moderate and high genetic diversity. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimations, albeit with some imprecisions. The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2 %, the bias is of minor importance in the face of other sources of uncertainty, such as heterogeneous bases composition or technical artefacts. The neutral panmictic model provides a practical mean to correct the bias through ABC, albeit with some imprecisions. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.
Gene expression profiling of human breast tissue samples using SAGE-Seq.
Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley
2010-12-01
We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.
Li, Shan; Dong, Xia; Su, Zhengchang
2013-07-30
Although prokaryotic gene transcription has been studied over decades, many aspects of the process remain poorly understood. Particularly, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E coli K12, thus it is unknown whether or not the bacterium possesses similar complex transcriptomes. Furthermore, although RNA-seq becomes the major method for analyzing the complexity of prokaryotic transcriptome, it is still a challenging task to accurately assemble full length transcripts using short RNA-seq reads. To fill these gaps, we have profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in the bacterial cells, combined with a highly accurate and robust algorithm and tool TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/) for assembling full length transcripts. We found that 46.9 ~ 63.4% of expressed operons were utilized in their putative alternative forms, 72.23 ~ 89.54% genes had putative asRNA transcripts and 51.37 ~ 72.74% intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. As has been demonstrated in many other prokaryotes, E. coli K12 also has a highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.
2013-01-01
Background Although prokaryotic gene transcription has been studied over decades, many aspects of the process remain poorly understood. Particularly, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E coli K12, thus it is unknown whether or not the bacterium possesses similar complex transcriptomes. Furthermore, although RNA-seq becomes the major method for analyzing the complexity of prokaryotic transcriptome, it is still a challenging task to accurately assemble full length transcripts using short RNA-seq reads. Results To fill these gaps, we have profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in the bacterial cells, combined with a highly accurate and robust algorithm and tool TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/) for assembling full length transcripts. We found that 46.9 ~ 63.4% of expressed operons were utilized in their putative alternative forms, 72.23 ~ 89.54% genes had putative asRNA transcripts and 51.37 ~ 72.74% intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. Conclusions As has been demonstrated in many other prokaryotes, E. coli K12 also has a highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads. PMID:23899370
Sun, Hong; Guns, Tias; Fierro, Ana Carolina; Thorrez, Lieven; Nijssen, Siegfried; Marchal, Kathleen
2012-01-01
Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP-information allows not only to reduce the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also allows restricting the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP-information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method ‘CPModule’. By applying it on a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC. PMID:22422841
A normalization strategy for comparing tag count data
2012-01-01
Background High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. Results We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. Conclusion Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data. PMID:22475125
2013-01-01
Background High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user. Results Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or Pólya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data we show that tweeDEseq yields P-values that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNA-seq data set with improved performance and reproducibility over the previously compared methodologies. Finally, we compared the results with those obtained from microarrays in order to check for reproducibility. Conclusions RNA-seq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.org. PMID:23965047
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and mimimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitat...
Matsunaga, Hiroko; Goto, Mari; Arikawa, Koji; Shirai, Masataka; Tsunoda, Hiroyuki; Huang, Huan; Kambara, Hideki
2015-02-15
Analyses of gene expressions in single cells are important for understanding detailed biological phenomena. Here, a highly sensitive and accurate method by sequencing (called "bead-seq") to obtain a whole gene expression profile for a single cell is proposed. A key feature of the method is to use a complementary DNA (cDNA) library on magnetic beads, which enables adding washing steps to remove residual reagents in a sample preparation process. By adding the washing steps, the next steps can be carried out under the optimal conditions without losing cDNAs. Error sources were carefully evaluated to conclude that the first several steps were the key steps. It is demonstrated that bead-seq is superior to the conventional methods for single-cell gene expression analyses in terms of reproducibility, quantitative accuracy, and biases caused during sample preparation and sequencing processes. Copyright © 2014 Elsevier Inc. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Utilizing next-generation sequencing technology, combined with ChIP (Chromatin Immunoprecipitation) technology, we analyzed histone modification (acetylation) induced by butyrate and the large-scale mapping of the epigenomic landscape of normal histone H3 and acetylated histone H3K9 and H3K27. To d...
Integration of multimodal RNA-seq data for prediction of kidney cancer survival
Schwartzi, Matt; Parkl, Martin; Phanl, John H.; Wang., May D.
2016-01-01
Kidney cancer is of prominent concern in modern medicine. Predicting patient survival is critical to patient awareness and developing a proper treatment regimens. Previous prediction models built upon molecular feature analysis are limited to just gene expression data. In this study we investigate the difference in predicting five year survival between unimodal and multimodal analysis of RNA-seq data from gene, exon, junction, and isoform modalities. Our preliminary findings report higher predictive accuracy-as measured by area under the ROC curve (AUC)-for multimodal learning when compared to unimodal learning with both support vector machine (SVM) and k-nearest neighbor (KNN) methods. The results of this study justify further research on the use of multimodal RNA-seq data to predict survival for other cancer types using a larger sample size and additional machine learning methods. PMID:27532026
The Sim-SEQ Project: Comparison of Selected Flow Models for the S-3 Site
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mukhopadhyay, Sumit; Doughty, Christine A.; Bacon, Diana H.
Sim-SEQ is an international initiative on model comparison for geologic carbon sequestration, with an objective to understand and, if possible, quantify model uncertainties. Model comparison efforts in Sim-SEQ are at present focusing on one specific field test site, hereafter referred to as the Sim-SEQ Study site (or S-3 site). Within Sim-SEQ, different modeling teams are developing conceptual models of CO2 injection at the S-3 site. In this paper, we select five flow models of the S-3 site and provide a qualitative comparison of their attributes and predictions. These models are based on five different simulators or modeling approaches: TOUGH2/EOS7C, STOMP-CO2e,more » MoReS, TOUGH2-MP/ECO2N, and VESA. In addition to model-to-model comparison, we perform a limited model-to-data comparison, and illustrate how model choices impact model predictions. We conclude the paper by making recommendations for model refinement that are likely to result in less uncertainty in model predictions.« less
cChIP-seq: a robust small-scale method for investigation of histone modifications.
Valensisi, Cristina; Liao, Jo Ling; Andrus, Colin; Battle, Stephanie L; Hawkins, R David
2015-12-21
ChIP-seq is highly utilized for mapping histone modifications that are informative about gene regulation and genome annotations. For example, applying ChIP-seq to histone modifications such as H3K4me1 has facilitated generating epigenomic maps of putative enhancers. This powerful technology, however, is limited in its application by the large number of cells required. ChIP-seq involves extensive manipulation of sample material and multiple reactions with limited quality control at each step, therefore, scaling down the number of cells required has proven challenging. Recently, several methods have been proposed to overcome this limit but most of these methods require extensive optimization to tailor the protocol to the specific antibody used or number of cells being profiled. Here we describe a robust, yet facile method, which we named carrier ChIP-seq (cChIP-seq), for use on limited cell amounts. cChIP-seq employs a DNA-free histone carrier in order to maintain the working ChIP reaction scale, removing the need to tailor reactions to specific amounts of cells or histone modifications to be assayed. We have applied our method to three different histone modifications, H3K4me3, H3K4me1 and H3K27me3 in the K562 cell line, and H3K4me1 in H1 hESCs. We successfully obtained epigenomic maps for these histone modifications starting with as few as 10,000 cells. We compared cChIP-seq data to data generated as part of the ENCODE project. ENCODE data are the reference standard in the field and have been generated starting from tens of million of cells. Our results show that cChIP-seq successfully recapitulates bulk data. Furthermore, we showed that the differences observed between small-scale ChIP-seq data and ENCODE data are largely to be due to lab-to-lab variability rather than operating on a reduced scale. Data generated using cChIP-seq are equivalent to reference epigenomic maps from three orders of magnitude more cells. Our method offers a robust and straightforward approach to scale down ChIP-seq to as low as 10,000 cells. The underlying principle of our strategy makes it suitable for being applied to a vast range of chromatin modifications without requiring expensive optimization. Furthermore, our strategy of a DNA-free carrier can be adapted to most ChIP-seq protocols.
Thermal modeling and analysis of thin-walled structures in micro milling
NASA Astrophysics Data System (ADS)
Zhang, J. F.; Ma, Y. H.; Feng, C.; Tang, W.; Wang, S.
2017-11-01
The numerical analytical model has been developed to predict the thermal effect with respect to thin walled structures by micro-milling. In order to investigate the temperature distribution around micro-edge of cutter, it is necessary to considering the friction power, the shearing power, the shear area between the tool micro-edge and materials. Due to the micro-cutting area is more difficult to be measured accurately, the minimum chip thickness as one of critical factors is also introduced. Finite element-based simulation was employed by the Advantedge, which was determined from the machining of Ti-6Al-4V over a range of the uncut chip thicknesses. Results from the proposed model have been successfully accounted for the effects of thermal softening for material.
GC-Content Normalization for RNA-Seq Data
2011-01-01
Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
A multi-organ chip co-culture of neurospheres and liver equivalents for long-term substance testing.
Materne, Eva-Maria; Ramme, Anja Patricia; Terrasso, Ana Paula; Serra, Margarida; Alves, Paula Marques; Brito, Catarina; Sakharov, Dmitry A; Tonevitsky, Alexander G; Lauster, Roland; Marx, Uwe
2015-07-10
Current in vitro and animal tests for drug development are failing to emulate the systemic organ complexity of the human body and, therefore, often do not accurately predict drug toxicity, leading to high attrition rates in clinical studies (Paul et al., 2010). The phylogenetic distance between humans and laboratory animals is enormous, this affects the transferability of animal data on the efficacy of neuroprotective drugs. Therefore, many neuroprotective treatments that have shown promise in animals have not been successful when transferred to humans (Dragunow, 2008; Gibbons and Dragunow, 2010). We present a multi-organ chip capable of maintaining 3D tissues derived from various cell sources in a combined media circuit which bridges the gap in systemic and human tests. A steady state co-culture of human artificial liver microtissues and human neurospheres exposed to fluid flow over two weeks in the multi-organ chip has successfully proven its long-term performance. Daily lactate dehydrogenase activity measurements of the medium and immunofluorescence end-point staining proved the viability of the tissues and the maintenance of differentiated cellular phenotypes. Moreover, the lactate production and glucose consumption values of the tissues cultured indicated that a stable steady-state was achieved after 6 days of co-cultivation. The neurospheres remained differentiated neurons over the two-week cultivation in the multi-organ chip, proven by qPCR and immunofluorescence of the neuronal markers βIII-tubulin and microtubule-associated protein-2. Additionally, a two-week toxicity assay with a repeated substance exposure to the neurotoxic 2,5-hexanedione in two different concentrations induced high apoptosis within the neurospheres and liver microtissues, as shown by a strong increase of lactate dehydrogenase activity in the medium. The principal finding of the exposure of the co-culture to 2,5-hexanedione was that not only toxicity profiles of two different doses could be discriminated, but also that the co-cultures were more sensitive to the substance compared to respective single-tissue cultures in the multi-organ-chip. Thus, we provide here a new in vitro tool which might be utilized to predict the safety and efficacy of substances in clinical studies more accurately in the future. Copyright © 2015 Elsevier B.V. All rights reserved.
Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich
2015-12-16
Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications led by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthew's correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.
Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D
2017-01-01
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
The Role of Genome Accessibility in Transcription Factor Binding in Bacteria.
Gomes, Antonio L C; Wang, Harris H
2016-04-01
ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often poorly correlate with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF)-binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF-binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF-binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility led to a model that explains 63% of the ChIP-seq profile variance, while a model based in motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions by reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive description of TF-binding in vivo from first principles towards a better representation of gene regulation in silico, with promising applications in systems biology.
2010-01-01
Background Global profiling of in vivo protein-DNA interactions using ChIP-based technologies has evolved rapidly in recent years. Although many genome-wide studies have identified thousands of ERα binding sites and have revealed the associated transcription factor (TF) partners, such as AP1, FOXA1 and CEBP, little is known about ERα associated hierarchical transcriptional regulatory networks. Results In this study, we applied computational approaches to analyze three public available ChIP-based datasets: ChIP-seq, ChIP-PET and ChIP-chip, and to investigate the hierarchical regulatory network for ERα and ERα partner TFs regulation in estrogen-dependent breast cancer MCF7 cells. 16 common TFs and two common new TF partners (RORA and PITX2) were found among ChIP-seq, ChIP-chip and ChIP-PET datasets. The regulatory networks were constructed by scanning the ChIP-peak region with TF specific position weight matrix (PWM). A permutation test was performed to test the reliability of each connection of the network. We then used DREM software to perform gene ontology function analysis on the common genes. We found that FOS, PITX2, RORA and FOXA1 were involved in the up-regulated genes. We also conducted the ERα and Pol-II ChIP-seq experiments in tamoxifen resistance MCF7 cells (denoted as MCF7-T in this study) and compared the difference between MCF7 and MCF7-T cells. The result showed very little overlap between these two cells in terms of targeted genes (21.2% of common genes) and targeted TFs (25% of common TFs). The significant dissimilarity may indicate totally different transcriptional regulatory mechanisms between these two cancer cells. Conclusions Our study uncovers new estrogen-mediated regulatory networks by mining three ChIP-based data in MCF7 cells and ChIP-seq data in MCF7-T cells. We compared the different ChIP-based technologies as well as different breast cancer cells. Our computational analytical approach may guide biologists to further study the underlying mechanisms in breast cancer cells or other human diseases. PMID:21167036
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression network is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detecting and quantifying of the novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression network have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed a RNA-seq-based GCN in one species of monocot rice. The quality of network obtained by our method was verified and evaluated by the curated gene functional association data sets, which obviously outperformed each single method. In addition, the powerful capability of network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provided a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Petukhov, Viktor; Guo, Jimin; Baryawno, Ninib; Severe, Nicolas; Scadden, David T; Samsonova, Maria G; Kharchenko, Peter V
2018-06-19
Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.
Chromatin Immunoprecipitation in Early Mouse Embryos.
García-González, Estela G; Roque-Ramirez, Bladimir; Palma-Flores, Carlos; Hernández-Hernández, J Manuel
2018-01-01
Epigenetic regulation is achieved at many levels by different factors such as tissue-specific transcription factors, members of the basal transcriptional apparatus, chromatin-binding proteins, and noncoding RNAs. Importantly, chromatin structure dictates the availability of a specific genomic locus for transcriptional activation as well as the efficiency with which transcription can occur. Chromatin immunoprecipitation (ChIP) is a method that allows elucidating gene regulation at the molecular level by assessing if chromatin modifications or proteins are present at a specific locus. Initially, the majority of ChIP experiments were performed on cultured cell lines and more recently this technique has been adapted to a variety of tissues in different model organisms. Using ChIP on mouse embryos, it is possible to document the presence or absence of specific proteins and chromatin modifications at genomic loci in vivo during mammalian development and to get biological meaning from observations made on tissue culture analyses. We describe here a ChIP protocol on freshly isolated mouse embryonic somites for in vivo analysis of muscle specific transcription factor binding on chromatin. This protocol has been easily adapted to other mouse embryonic tissues and has also been successfully scaled up to perform ChIP-Seq.
Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.
Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin
2013-09-22
High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.
Gillotin, Sébastien; Guillemot, François
2016-06-20
Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is an important strategy to study gene regulation. When availability of cells is limited, however, it can be useful to focus on specific genes to investigate in depth the role of transcription factors or histone marks. Unfortunately, performing ChIP experiments to study transcription factors' binding to DNA can be difficult when biological material is restricted. This protocol describes a robust method to perform μChIP for over-expressed or endogenous transcription factors using ~100,000 cells per ChIP experiment (Masserdotti et al ., 2015). We also describe optimization steps, which we think are critical for this protocol to work and which can be used to further reduce the number of cells.
Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data.
Bao, Yanchun; Vinciotti, Veronica; Wit, Ernst; 't Hoen, Peter A C
2013-05-30
ImmunoPrecipitation (IP) efficiencies may vary largely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily lead to a higher signal to background ratio, and therefore to an apparent larger number of enriched regions, compared to a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data. We fit a latent mixture model to eight experiments on two proteins, from two laboratories where different antibodies are used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these are clearly different for the different laboratories, and amongst technical replicates from the same lab. When we account for ChIP efficiency, we find more regions bound in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge of the same number of binding sites across experiments can also be included in the model for a more robust detection of differentially bound regions among two different proteins. We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows to model jointly, rather than individually, replicates and experiments from different proteins, leading to more robust biological conclusions.
Oluwadare, Oluwatosin; Cheng, Jianlin
2017-11-14
With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bounded together by intra chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts
Paraskevopoulou, Maria D.; Vlachos, Ioannis S.; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G.
2016-01-01
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. PMID:26612864
NASA Astrophysics Data System (ADS)
Lin, Jieqiong; Guan, Liang; Lu, Mingming; Han, Jinguo; Kan, Yudi
2017-12-01
In traditional diamond cutting, the cutting force is usually large and it will affect tool life and machining quality. Elliptical vibration cutting (EVC) as one of the ultra-precision machining technologies has a lot of advantages, such as reduces cutting force, extend tool life and so on. It's difficult to predict the transient cutting force of EVC due to its unique elliptical motion trajectory. Study on chip formation will helpfully to predict cutting force. The geometric feature of chip has important effects on cutting force, however, few scholars have studied the chip formation. In order to investigate the time-varying cutting force of EVC, the geometric feature model of chip is established based on analysis of chip formation, and the effects of cutting parameters on the geometric feature of chip are analyzed. To predict transient force quickly and effectively, the geometric feature of chip is introduced into the cutting force model. The calculated results show that the error between the predicted cutting force in this paper and that in the literature is less than 2%, which proves its feasibility.
Detecting circular RNAs: bioinformatic and experimental challenges
Szabo, Linda; Salzman, Julia
2017-01-01
The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic. PMID:27739534
Yu, Kyeong-Nam; Nadanaciva, Sashi; Rana, Payal; Lee, Dong Woo; Ku, Bosung; Roth, Alexander D; Dordick, Jonathan S; Will, Yvonne; Lee, Moo-Yeal
2018-03-01
Human liver contains various oxidative and conjugative enzymes that can convert nontoxic parent compounds to toxic metabolites or, conversely, toxic parent compounds to nontoxic metabolites. Unlike primary hepatocytes, which contain myriad drug-metabolizing enzymes (DMEs), but are difficult to culture and maintain physiological levels of DMEs, immortalized hepatic cell lines used in predictive toxicity assays are easy to culture, but lack the ability to metabolize compounds. To address this limitation and predict metabolism-induced hepatotoxicity in high-throughput, we developed an advanced miniaturized three-dimensional (3D) cell culture array (DataChip 2.0) and an advanced metabolizing enzyme microarray (MetaChip 2.0). The DataChip is a functionalized micropillar chip that supports the Hep3B human hepatoma cell line in a 3D microarray format. The MetaChip is a microwell chip containing immobilized DMEs found in the human liver. As a proof of concept for generating compound metabolites in situ on the chip and rapidly assessing their toxicity, 22 model compounds were dispensed into the MetaChip and sandwiched with the DataChip. The IC 50 values obtained from the chip platform were correlated with rat LD 50 values, human C max values, and drug-induced liver injury categories to predict adverse drug reactions in vivo. As a result, the platform had 100% sensitivity, 86% specificity, and 93% overall predictivity at optimum cutoffs of IC 50 and C max values. Therefore, the DataChip/MetaChip platform could be used as a high-throughput, early stage, microscale alternative to conventional in vitro multi-well plate platforms and provide a rapid and inexpensive assessment of metabolism-induced toxicity at early phases of drug development.
Fast and accurate enzyme activity measurements using a chip-based microfluidic calorimeter.
van Schie, Morten M C H; Ebrahimi, Kourosh Honarmand; Hagen, Wilfred R; Hagedoorn, Peter-Leon
2018-03-01
Recent developments in microfluidic and nanofluidic technologies have resulted in development of new chip-based microfluidic calorimeters with potential use in different fields. One application would be the accurate high-throughput measurement of enzyme activity. Calorimetry is a generic way to measure activity of enzymes, but unlike conventional calorimeters, chip-based calorimeters can be easily automated and implemented in high-throughput screening platforms. However, application of chip-based microfluidic calorimeters to measure enzyme activity has been limited due to problems associated with miniaturization such as incomplete mixing and a decrease in volumetric heat generated. To address these problems we introduced a calibration method and devised a convenient protocol for using a chip-based microfluidic calorimeter. Using the new calibration method, the progress curve of alkaline phosphatase, which has product inhibition for phosphate, measured by the calorimeter was the same as that recorded by UV-visible spectroscopy. Our results may enable use of current chip-based microfluidic calorimeters in a simple manner as a tool for high-throughput screening of enzyme activity with potential applications in drug discovery and enzyme engineering. Copyright © 2017. Published by Elsevier Inc.
Kamps-Hughes, Nick; McUsic, Andrew; Kurihara, Laurie; Harkins, Timothy T.; Pal, Prithwish; Ray, Claire
2018-01-01
The accurate detection of ultralow allele frequency variants in DNA samples is of interest in both research and medical settings, particularly in liquid biopsies where cancer mutational status is monitored from circulating DNA. Next-generation sequencing (NGS) technologies employing molecular barcoding have shown promise but significant sensitivity and specificity improvements are still needed to detect mutations in a majority of patients before the metastatic stage. To address this we present analytical validation data for ERASE-Seq (Elimination of Recurrent Artifacts and Stochastic Errors), a method for accurate and sensitive detection of ultralow frequency DNA variants in NGS data. ERASE-Seq differs from previous methods by creating a robust statistical framework to utilize technical replicates in conjunction with background error modeling, providing a 10 to 100-fold reduction in false positive rates compared to published molecular barcoding methods. ERASE-Seq was tested using spiked human DNA mixtures with clinically realistic DNA input quantities to detect SNVs and indels between 0.05% and 1% allele frequency, the range commonly found in liquid biopsy samples. Variants were detected with greater than 90% sensitivity and a false positive rate below 0.1 calls per 10,000 possible variants. The approach represents a significant performance improvement compared to molecular barcoding methods and does not require changing molecular reagents. PMID:29630678
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Ching, Travers; Zhu, Xun
2018-01-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and mimimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
Talpur, M Younis; Kara, Huseyin; Sherazi, S T H; Ayyildiz, H Filiz; Topkafa, Mustafa; Arslan, Fatma Nur; Naz, Saba; Durmaz, Fatih; Sirajuddin
2014-11-01
Single bounce attenuated total reflectance (SB-ATR) Fourier transform infrared (FTIR) spectroscopy in conjunction with chemometrics was used for accurate determination of free fatty acid (FFA), peroxide value (PV), iodine value (IV), conjugated diene (CD) and conjugated triene (CT) of cottonseed oil (CSO) during potato chips frying. Partial least square (PLS), stepwise multiple linear regression (SMLR), principal component regression (PCR) and simple Beer׳s law (SBL) were applied to develop the calibrations for simultaneous evaluation of five stated parameters of cottonseed oil (CSO) during frying of French frozen potato chips at 170°C. Good regression coefficients (R(2)) were achieved for FFA, PV, IV, CD and CT with value of >0.992 by PLS, SMLR, PCR, and SBL. Root mean square error of prediction (RMSEP) was found to be less than 1.95% for all determinations. Result of the study indicated that SB-ATR FTIR in combination with multivariate chemometrics could be used for accurate and simultaneous determination of different parameters during the frying process without using any toxic organic solvent. Copyright © 2014 Elsevier B.V. All rights reserved.
Karnik, Rahul; Beer, Michael A.
2015-01-01
The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
Karnik, Rahul; Beer, Michael A
2015-01-01
The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
Comtet-Marre, Sophie; Chaucheyras-Durand, Frédérique; Bouzid, Ourdia; Mosoni, Pascale; Bayat, Ali R.; Peyret, Pierre; Forano, Evelyne
2018-01-01
Ruminants fulfill their energy needs for growth primarily through microbial breakdown of plant biomass in the rumen. Several biotic and abiotic factors influence the efficiency of fiber degradation, which can ultimately impact animal productivity and health. To provide more insight into mechanisms involved in the modulation of fibrolytic activity, a functional DNA microarray targeting genes encoding key enzymes involved in cellulose and hemicellulose degradation by rumen microbiota was designed. Eight carbohydrate-active enzyme (CAZyme) families (GH5, GH9, GH10, GH11, GH43, GH48, CE1, and CE6) were selected which represented 392 genes from bacteria, protozoa, and fungi. The DNA microarray, designated as FibroChip, was validated using targets of increasing complexity and demonstrated sensitivity and specificity. In addition, FibroChip was evaluated for its explorative and semi-quantitative potential. Differential expression of CAZyme genes was evidenced in the rumen bacterium Fibrobacter succinogenes S85 grown on wheat straw or cellobiose. FibroChip was used to identify the expressed CAZyme genes from the targeted families in the rumen of a cow fed a mixed diet based on grass silage. Among expressed genes, those encoding GH43, GH5, and GH10 families were the most represented. Most of the F. succinogenes genes detected by the FibroChip were also detected following RNA-seq analysis of RNA transcripts obtained from the rumen fluid sample. Use of the FibroChip also indicated that transcripts of fiber degrading enzymes derived from eukaryotes (protozoa and anaerobic fungi) represented a significant proportion of the total microbial mRNA pool. FibroChip represents a reliable and high-throughput tool that enables researchers to monitor active members of fiber degradation in the rumen. PMID:29487591
Comtet-Marre, Sophie; Chaucheyras-Durand, Frédérique; Bouzid, Ourdia; Mosoni, Pascale; Bayat, Ali R; Peyret, Pierre; Forano, Evelyne
2018-01-01
Ruminants fulfill their energy needs for growth primarily through microbial breakdown of plant biomass in the rumen. Several biotic and abiotic factors influence the efficiency of fiber degradation, which can ultimately impact animal productivity and health. To provide more insight into mechanisms involved in the modulation of fibrolytic activity, a functional DNA microarray targeting genes encoding key enzymes involved in cellulose and hemicellulose degradation by rumen microbiota was designed. Eight carbohydrate-active enzyme (CAZyme) families (GH5, GH9, GH10, GH11, GH43, GH48, CE1, and CE6) were selected which represented 392 genes from bacteria, protozoa, and fungi. The DNA microarray, designated as FibroChip, was validated using targets of increasing complexity and demonstrated sensitivity and specificity. In addition, FibroChip was evaluated for its explorative and semi-quantitative potential. Differential expression of CAZyme genes was evidenced in the rumen bacterium Fibrobacter succinogenes S85 grown on wheat straw or cellobiose. FibroChip was used to identify the expressed CAZyme genes from the targeted families in the rumen of a cow fed a mixed diet based on grass silage. Among expressed genes, those encoding GH43, GH5, and GH10 families were the most represented. Most of the F. succinogenes genes detected by the FibroChip were also detected following RNA-seq analysis of RNA transcripts obtained from the rumen fluid sample. Use of the FibroChip also indicated that transcripts of fiber degrading enzymes derived from eukaryotes (protozoa and anaerobic fungi) represented a significant proportion of the total microbial mRNA pool. FibroChip represents a reliable and high-throughput tool that enables researchers to monitor active members of fiber degradation in the rumen.
SeqMule: automated pipeline for analysis of human exome/genome sequencing data.
Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai
2015-09-18
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo
Ritchey, Laura E.; Su, Zhao; Tang, Yin; Tack, David C.
2017-01-01
Abstract RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. PMID:28637286
A MBD-seq protocol for large-scale methylome-wide studies with (very) low amounts of DNA.
Aberg, Karolina A; Chan, Robin F; Shabalin, Andrey A; Zhao, Min; Turecki, Gustavo; Staunstrup, Nicklas Heine; Starnawska, Anna; Mors, Ole; Xie, Lin Y; van den Oord, Edwin Jcg
2017-09-01
We recently showed that, after optimization, our methyl-CpG binding domain sequencing (MBD-seq) application approximates the methylome-wide coverage obtained with whole-genome bisulfite sequencing (WGB-seq), but at a cost that enables adequately powered large-scale association studies. A prior drawback of MBD-seq is the relatively large amount of genomic DNA (ideally >1 µg) required to obtain high-quality data. Biomaterials are typically expensive to collect, provide a finite amount of DNA, and may simply not yield sufficient starting material. The ability to use low amounts of DNA will increase the breadth and number of studies that can be conducted. Therefore, we further optimized the enrichment step. With this low starting material protocol, MBD-seq performed equally well, or better, than the protocol requiring ample starting material (>1 µg). Using only 15 ng of DNA as input, there is minimal loss in data quality, achieving 93% of the coverage of WGB-seq (with standard amounts of input DNA) at similar false/positive rates. Furthermore, across a large number of genomic features, the MBD-seq methylation profiles closely tracked those observed for WGB-seq with even slightly larger effect sizes. This suggests that MBD-seq provides similar information about the methylome and classifies methylation status somewhat more accurately. Performance decreases with <15 ng DNA as starting material but, even with as little as 5 ng, MBD-seq still achieves 90% of the coverage of WGB-seq with comparable genome-wide methylation profiles. Thus, the proposed protocol is an attractive option for adequately powered and cost-effective methylome-wide investigations using (very) low amounts of DNA.
Development of a cell microarray chip for detection of circulating tumor cells
NASA Astrophysics Data System (ADS)
Yamamura, S.; Yatsushiro, S.; Abe, K.; Baba, Y.; Kataoka, M.
2012-03-01
Detection of circulating tumor cells (CTCs) in the peripheral blood of metastatic cancer patients has clinical significance in earlier diagnosis of metastases. In this study, a novel cell microarray chip for accurate and rapid detection of tumor cells from human leukocytes was developed. The chip with 20,944 microchambers (105 μm diameter and 50 μm depth) was made from polystyrene, and the surface was rendered to hydrophilic by means of reactive-ion etching, which led to the formation of mono-layers of leukocytes on the microchambers. As the model of CTCs detection, we spiked human bronchioalveolar carcinoma (H1650) cells into human T lymphoblastoid leukemia (CEM) cells suspension and detected H1650 cells using the chip. A CEM suspension contained with H1650 cells was dispersed on the chip surface, followed by 10 min standing to allow the cells to settle down into the microchambers. About 30 CEM cells were accommodated in each microchamber, over 600,000 CEM cells in total being on a chip. We could detect 1 H1650 cell per 106 CEM cells on the microarray by staining with fluorescence-conjugated antibody (Anti-Cytokeratin) and cell membrane marker (DiD). Thus, this cell microarray chip has highly potential to be a novel tool of accurate and rapid detection of CTCs.
Oguz, Cihan; Sen, Shurjo K; Davis, Adam R; Fu, Yi-Ping; O'Donnell, Christopher J; Gibbons, Gary H
2017-10-26
One goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), which is an intermediate endo-phenotype of coronary artery disease (CAD). Model inputs were derived from advanced cases in the ClinSeq®; discovery cohort (n=16) and the FHS replication cohort (n=36) from 89 th -99 th CAC score percentile range, and age-matched controls (ClinSeq®; n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables and genotypes of 56 single nucleotide polymorphisms (SNPs) ranked highest in terms of their nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the areas under receiver operating characteristic curves (ROC-AUC). RF models trained and tested with clinical variables generated ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort were highly predictive (ROC-AUC ≥0.85) with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq®; data and tested with FHS data and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several CAD and "vascular aging" related biological processes were enriched in the network of genes constructed from the predictive SNPs. We identified a molecular network predictive of advanced coronary calcium using genotype data from ClinSeq®; and FHS cohorts. Our results illustrate that machine learning tools, which utilize complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders, hold promise for deriving predictive disease models and networks.
Bady, Pierre; Sciuscio, Davide; Diserens, Annie-Claire; Bloch, Jocelyne; van den Bent, Martin J; Marosi, Christine; Dietrich, Pierre-Yves; Weller, Michael; Mariani, Luigi; Heppner, Frank L; Mcdonald, David R; Lacombe, Denis; Stupp, Roger; Delorenzi, Mauro; Hegi, Monika E
2012-10-01
The methylation status of the O(6)-methylguanine-DNA methyltransferase (MGMT) gene is an important predictive biomarker for benefit from alkylating agent therapy in glioblastoma. Recent studies in anaplastic glioma suggest a prognostic value for MGMT methylation. Investigation of pathogenetic and epigenetic features of this intriguingly distinct behavior requires accurate MGMT classification to assess high throughput molecular databases. Promoter methylation-mediated gene silencing is strongly dependent on the location of the methylated CpGs, complicating classification. Using the HumanMethylation450 (HM-450K) BeadChip interrogating 176 CpGs annotated for the MGMT gene, with 14 located in the promoter, two distinct regions in the CpG island of the promoter were identified with high importance for gene silencing and outcome prediction. A logistic regression model (MGMT-STP27) comprising probes cg12434587 [corrected] and cg12981137 provided good classification properties and prognostic value (kappa = 0.85; log-rank p < 0.001) using a training-set of 63 glioblastomas from homogenously treated patients, for whom MGMT methylation was previously shown to be predictive for outcome based on classification by methylation-specific PCR. MGMT-STP27 was successfully validated in an independent cohort of chemo-radiotherapy-treated glioblastoma patients (n = 50; kappa = 0.88; outcome, log-rank p < 0.001). Lower prevalence of MGMT methylation among CpG island methylator phenotype (CIMP) positive tumors was found in glioblastomas from The Cancer Genome Atlas than in low grade and anaplastic glioma cohorts, while in CIMP-negative gliomas MGMT was classified as methylated in approximately 50 % regardless of tumor grade. The proposed MGMT-STP27 prediction model allows mining of datasets derived on the HM-450K or HM-27K BeadChip to explore effects of distinct epigenetic context of MGMT methylation suspected to modulate treatment resistance in different tumor types.
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.
Song, Li; Florea, Liliana
2015-01-01
Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method.
Honda, Shozo; Morichika, Keisuke; Kirino, Yohei
2016-03-01
RNA digestions catalyzed by many ribonucleases generate RNA fragments that contain a 2',3'-cyclic phosphate (cP) at their 3' termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named cP-RNA-seq that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3' termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment; hence, subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ∼6 d, excluding the time required for sequencing and bioinformatics analyses, which are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ∼3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5'-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes.
Predicting enhancer activity and variant impact using gkm-SVM.
Beer, Michael A
2017-09-01
We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with massively parallel reporter assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other models used the MPRA data for model training, and gkm-SVM did not. In addition, we compare gkm-SVM with other MPRA datasets and show that gkm-SVM is a reliable predictor of expression and that deltaSVM is a reliable predictor of variant impact in K562 cells and mouse retina. We further show that DHS (DNase-I hypersensitive sites) and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data are equally predictive substrates for training gkm-SVM, and that DHS regions flanked by H3K27Ac and H3K4me1 marks are more predictive than DHS regions alone. © 2017 Wiley Periodicals, Inc.
Nonnegative Matrix Factorization for Efficient Hyperspectral Image Projection
NASA Technical Reports Server (NTRS)
Iacchetta, Alexander S.; Fienup, James R.; Leisawitz, David T.; Bolcar, Matthew R.
2015-01-01
Hyperspectral imaging for remote sensing has prompted development of hyperspectral image projectors that can be used to characterize hyperspectral imaging cameras and techniques in the lab. One such emerging astronomical hyperspectral imaging technique is wide-field double-Fourier interferometry. NASA's current, state-of-the-art, Wide-field Imaging Interferometry Testbed (WIIT) uses a Calibrated Hyperspectral Image Projector (CHIP) to generate test scenes and provide a more complete understanding of wide-field double-Fourier interferometry. Given enough time, the CHIP is capable of projecting scenes with astronomically realistic spatial and spectral complexity. However, this would require a very lengthy data collection process. For accurate but time-efficient projection of complicated hyperspectral images with the CHIP, the field must be decomposed both spectrally and spatially in a way that provides a favorable trade-off between accurately projecting the hyperspectral image and the time required for data collection. We apply nonnegative matrix factorization (NMF) to decompose hyperspectral astronomical datacubes into eigenspectra and eigenimages that allow time-efficient projection with the CHIP. Included is a brief analysis of NMF parameters that affect accuracy, including the number of eigenspectra and eigenimages used to approximate the hyperspectral image to be projected. For the chosen field, the normalized mean squared synthesis error is under 0.01 with just 8 eigenspectra. NMF of hyperspectral astronomical fields better utilizes the CHIP's capabilities, providing time-efficient and accurate representations of astronomical scenes to be imaged with the WIIT.
Mastering multi-depth bio-chip patterns with DVD LBRs
NASA Astrophysics Data System (ADS)
Carson, Doug
2017-08-01
Bio chip and bio disc are rapidly growing technologies used in medical, health and other industries. While there are numerous unique designs and features, these products all rely on precise three-dimensional micro-fluidic channels or arrays to move, separate and combine samples under test. These bio chip and bio disc consumables are typically manufactured by molding these parts to a precise three-dimensional pattern on a negative metal stamper, or they can be made in smaller quantities using an appropriate curable resin and a negative mold/stamper. Stampers required for bio chips have been traditionally made using either micro machining or XY stepping lithography. Both of these technologies have their advantages as well as limitations when it comes to creating micro-fluidic patterns. Significant breakthroughs in continuous maskless lithography have enabled accurate and efficient manufacturing of micro-fluidic masters using LBRs (Laser Beam Recorders) and DRIE (Deep Reactive Ion Etching). The important advantages of LBR continuous lithography vs. XY stepping lithography and micro machining are speed and cost. LBR based continuous lithography is >100x faster than XY stepping lithography and more accurate than micro machining. Several innovations were required in order to create multi-depth patterns with sub micron accuracy. By combining proven industrial LBRs with DCA's G3-VIA pattern generator and DRIE, three-dimensional bio chip masters and stampers are being manufactured efficiently and accurately.
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs
LeGault, Laura H.; Dewey, Colin N.
2013-01-01
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23846746
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
Wolf, Timo; Schneiker-Bekel, Susanne; Neshat, Armin; Ortseifen, Vera; Wibberg, Daniel; Zemke, Till; Pühler, Alfred; Kalinowski, Jörn
2017-06-10
Actinoplanes sp. SE50/110 is the natural producer of acarbose, which is used in the treatment of diabetes mellitus type II. However, until now the transcriptional organization and regulation of the acarbose biosynthesis are only understood rudimentarily. The genome sequence of Actinoplanes sp. SE50/110 was known before, but was resequenced in this study to remove assembly artifacts and incorrect base callings. The annotation of the genome was refined in a multi-step approach, including modern bioinformatic pipelines, transcriptome and proteome data. A whole transcriptome RNA-seq library as well as an RNA-seq library enriched for primary 5'-ends were used for the detection of transcription start sites, to correct tRNA predictions, to identify novel transcripts like small RNAs and to improve the annotation through the correction of falsely annotated translation start sites. The transcriptome data sets were also applied to identify 31 cis-regulatory RNA structures, such as riboswitches or RNA thermometers as well as three leaderless transcribed short peptides found in putative attenuators upstream of genes for amino acid biosynthesis. The transcriptional organization of the acarbose biosynthetic gene cluster was elucidated in detail and fourteen novel biosynthetic gene clusters were suggested. The accurate genome sequence and precise annotation of the Actinoplanes sp. SE50/110 genome will be the foundation for future genetic engineering and systems biology studies. Copyright © 2017 Elsevier B.V. All rights reserved.
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts.
Paraskevopoulou, Maria D; Vlachos, Ioannis S; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G
2016-01-04
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation
SeqAPASS is a software application facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...
Determining Light Transmittance Characteristics of Wood and Bark Chips
Douglas B. Brumm; Robert C. Radcliffe; John A. Sturos
1983-01-01
Describes compter-assisted testing for measuring light transmittance of wood and bark chips. Electronic interface permitted the computer to collect physical data accurately and efficiently and to analyze and present the data in several tabular and grapical formats
Yin, Dandan; Lu, Xiyi; Su, Jun; He, Xuezhi; De, Wei; Yang, Jinsong; Li, Wei; Han, Liang; Zhang, Erbao
2018-05-24
Mounting evidence indicates that long noncoding RNAs (lncRNAs) could play a pivotal role in cancer biology. However, the role and molecular mechanism and global genes that were mediated by lncRNA AFAP1-AS1 in non-small cell lung cancer (NSCLC) remain largely unknown. Expression of AFAP1-AS1 was analyzed in 92 NSCLC tissues and cell lines by Quantitative real time polymerase chain reaction (qRT-PCR). The effect of AFAP1-AS1 on proliferation was evaluated by function assays both in in vitro and in vivo. RNA-seq assays were performed after knockdown AFAP1-AS1. RNA immunoprecipitation (RIP) was performed to confirm the interaction between AFAP1-AS1 and EZH2. Chromatin immunoprecipitation (ChIP) was used to study the promoter region of p21. AFAP1-AS1 expression was increased in NSCLC tissues and was correlated with clinical outcomes of NSCLC. Further experiments revealed that inhibition of its expression in NSCLC cells resulted in diminished cell growth in vitro and in vivo. RNA-seq revealed that knockdown of AFAP1-AS1 could induce the expression of p21. Mechanistic investigations found that AFAP1-AS1 could interact with EZH2 and recruit EZH2 to the promoter regions of p21, thus epigenetically repressing p21 expression. Together, these results suggest that lncRNA AFAP1-AS1 may serve as a candidate prognostic biomarker and target for new therapies in human NSCLC.
Predicting segregation of wood and bark chips by differences in terminal velocities.
John A. Sturos
1973-01-01
Presents data on length, width, thickness, moisture content, specific gravity, and terminal velocities of wood and bark chips for eight important pulpwood species. Gives differences in terminal velocities of the wood and bark chips used to predict the degree of segregation possible by air flotation.
Petegrosso, Raphael; Tolar, Jakub
2018-01-01
Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC. PMID:29630593
Computational Prediction of miRNA Genes from Small RNA Sequencing Data
Kang, Wenjing; Friedländer, Marc R.
2015-01-01
Next-generation sequencing now for the first time allows researchers to gage the depth and variation of entire transcriptomes. However, now as rare transcripts can be detected that are present in cells at single copies, more advanced computational tools are needed to accurately annotate and profile them. microRNAs (miRNAs) are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remain a non-trivial computational problem. Here, we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field. PMID:25674563
De Novo Chromosome Structure Prediction
NASA Astrophysics Data System (ADS)
di Pierro, Michele; Cheng, Ryan R.; Lieberman-Aiden, Erez; Wolynes, Peter G.; Onuchic, Jose'n.
Chromatin consists of DNA and hundreds of proteins that interact with the genetic material. In vivo, chromatin folds into nonrandom structures. The physical mechanism leading to these characteristic conformations, however, remains poorly understood. We recently introduced MiChroM, a model that generates chromosome conformations by using the idea that chromatin can be subdivided into types based on its biochemical interactions. Here we extend and complete our previous finding by showing that structural chromatin types can be inferred from ChIP-Seq data. Chromatin types, which are distinct from DNA sequence, are partially epigenetically controlled and change during cell differentiation, thus constituting a link between epigenetics, chromosomal organization, and cell development. We show that, for GM12878 lymphoblastoid cells we are able to predict accurate chromosome structures with the only input of genomic data. The degree of accuracy achieved by our prediction supports the viability of the proposed physical mechanism of chromatin folding and makes the computational model a powerful tool for future investigations.
Schwartz, Cindy L.; Chen, Lu; McCarten, Kathleen; Wolden, Suzanne; Constine, Louis S.; Hutchison, Robert E.; de Alarcon, Pedro A.; Keller, Frank G.; Kelly, Kara M.; Trippet, Tanya A.; Voss, Stephan D.; Friedman, Debra L.
2017-01-01
Background Early response to initial chemotherapy in Hodgkin lymphoma (HL) measured by computed tomography (CT) and/or positron emission tomography (PET) after two to three cycles of chemotherapy may inform therapeutic decisions. Risk stratification at diagnosis could, however, allow earlier and potentially more efficacious treatment modifications. Patients and Methods We developed a predictive model for event-free survival (EFS) in pediatric/adolescent HL using clinical data known at diagnosis from 1103 intermediate-risk HL patients treated on Children’s Oncology Group protocol AHOD0031 with doxorubicin, bleomycin, vincristine, etoposide, prednisone, cyclophosphamide (ABVE-PC) chemotherapy and radiation. Independent predictors of EFS were identified and used to develop and validate a prognostic score (Childhood Hodgkin International Prognostic Score [CHIPS]). A training cohort was randomly selected to include approximately half of the overall cohort, with the remainder forming the validation cohort. Results Stage 4 disease, large mediastinal mass, albumin (<3.5), and fever were independent predictors of EFS that were each assigned one point in the CHIPS. Four-year EFS was 93.1% for patients with CHIPS = 0, 88.5% for patients with CHIPS = 1, 77.6% for patients with CHIPS = 2, and 69.2% for patients with CHIPS = 3. Conclusions CHIPS was highly predictive of EFS, identifying a subset (with CHIPS 2 or 3) that comprises 27% of intermediate-risk patients who have a 4-year EFS of <80% and who may benefit from early therapeutic augmentation. Furthermore, CHIPS identified higher risk patients who were not identified by early PET or CT response. CHIPS is a robust and inexpensive approach to predicting risk in patients with intermediate-risk HL that may improve ability to tailor therapy to risk factors known at diagnosis. PMID:27786406
Schwartz, Cindy L; Chen, Lu; McCarten, Kathleen; Wolden, Suzanne; Constine, Louis S; Hutchison, Robert E; de Alarcon, Pedro A; Keller, Frank G; Kelly, Kara M; Trippet, Tanya A; Voss, Stephan D; Friedman, Debra L
2017-04-01
Early response to initial chemotherapy in Hodgkin lymphoma (HL) measured by computed tomography (CT) and/or positron emission tomography (PET) after two to three cycles of chemotherapy may inform therapeutic decisions. Risk stratification at diagnosis could, however, allow earlier and potentially more efficacious treatment modifications. We developed a predictive model for event-free survival (EFS) in pediatric/adolescent HL using clinical data known at diagnosis from 1103 intermediate-risk HL patients treated on Children's Oncology Group protocol AHOD0031 with doxorubicin, bleomycin, vincristine, etoposide, prednisone, cyclophosphamide (ABVE-PC) chemotherapy and radiation. Independent predictors of EFS were identified and used to develop and validate a prognostic score (Childhood Hodgkin International Prognostic Score [CHIPS]). A training cohort was randomly selected to include approximately half of the overall cohort, with the remainder forming the validation cohort. Stage 4 disease, large mediastinal mass, albumin (<3.5), and fever were independent predictors of EFS that were each assigned one point in the CHIPS. Four-year EFS was 93.1% for patients with CHIPS = 0, 88.5% for patients with CHIPS = 1, 77.6% for patients with CHIPS = 2, and 69.2% for patients with CHIPS = 3. CHIPS was highly predictive of EFS, identifying a subset (with CHIPS 2 or 3) that comprises 27% of intermediate-risk patients who have a 4-year EFS of <80% and who may benefit from early therapeutic augmentation. Furthermore, CHIPS identified higher risk patients who were not identified by early PET or CT response. CHIPS is a robust and inexpensive approach to predicting risk in patients with intermediate-risk HL that may improve ability to tailor therapy to risk factors known at diagnosis. © 2016 Wiley Periodicals, Inc.
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
A Guide for Designing and Analyzing RNA-Seq Data.
Chatterjee, Aniruddha; Ahn, Antonio; Rodger, Euan J; Stockwell, Peter A; Eccles, Michael R
2018-01-01
The identity of a cell or an organism is at least in part defined by its gene expression and therefore analyzing gene expression remains one of the most frequently performed experimental techniques in molecular biology. The development of the RNA-Sequencing (RNA-Seq) method allows an unprecedented opportunity to analyze expression of protein-coding, noncoding RNA and also de novo transcript assembly of a new species or organism. However, the planning and design of RNA-Seq experiments has important implications for addressing the desired biological question and maximizing the value of the data obtained. In addition, RNA-Seq generates a huge volume of data and accurate analysis of this data involves several different steps and choices of tools. This can be challenging and overwhelming, especially for bench scientists. In this chapter, we describe an entire workflow for performing RNA-Seq experiments. We describe critical aspects of wet lab experiments such as RNA isolation, library preparation and the initial design of an experiment. Further, we provide a step-by-step description of the bioinformatics workflow for different steps involved in RNA-Seq data analysis. This includes power calculations, setting up a computational environment, acquisition and processing of publicly available data if desired, quality control measures, preprocessing steps for the raw data, differential expression analysis, and data visualization. We particularly mention important considerations for each step to provide a guide for designing and analyzing RNA-Seq data.
Prediction of 3D chip formation in the facing cutting with lathe machine using FEM
NASA Astrophysics Data System (ADS)
Prasetyo, Yudhi; Tauviqirrahman, Mohamad; Rusnaldy
2016-04-01
This paper presents the prediction of the chip formation at the machining process using a lathe machine in a more specific way focusing on facing cutting (face turning). The main purpose is to propose a new approach to predict the chip formation with the variation of the cutting directions i.e., the backward and forward direction. In addition, the interaction between stress analysis and chip formation on cutting process was also investigated. The simulations were conducted using three dimensional (3D) finite element method based on ABAQUS software with aluminum and high speed steel (HSS) as the workpiece and the tool materials, respectively. The simulation result showed that the chip resulted using a backward direction depicts a better formation than that using a conventional (forward) direction.
Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter
2017-01-01
Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer. PMID:28852586
Carré, Clément; Mas, André; Krouk, Gabriel
2017-01-01
Inferring transcriptional gene regulatory networks from transcriptomic datasets is a key challenge of systems biology, with potential impacts ranging from medicine to agronomy. There are several techniques used presently to experimentally assay transcription factors to target relationships, defining important information about real gene regulatory networks connections. These techniques include classical ChIP-seq, yeast one-hybrid, or more recently, DAP-seq or target technologies. These techniques are usually used to validate algorithm predictions. Here, we developed a reverse engineering approach based on mathematical and computer simulation to evaluate the impact that this prior knowledge on gene regulatory networks may have on training machine learning algorithms. First, we developed a gene regulatory networks-simulating engine called FRANK (Fast Randomizing Algorithm for Network Knowledge) that is able to simulate large gene regulatory networks (containing 10 4 genes) with characteristics of gene regulatory networks observed in vivo. FRANK also generates stable or oscillatory gene expression directly produced by the simulated gene regulatory networks. The development of FRANK leads to important general conclusions concerning the design of large and stable gene regulatory networks harboring scale free properties (built ex nihilo). In combination with supervised (accepting prior knowledge) support vector machine algorithm we (i) address biologically oriented questions concerning our capacity to accurately reconstruct gene regulatory networks and in particular we demonstrate that prior-knowledge structure is crucial for accurate learning, and (ii) draw conclusions to inform experimental design to performed learning able to solve gene regulatory networks in the future. By demonstrating that our predictions concerning the influence of the prior-knowledge structure on support vector machine learning capacity holds true on real data ( Escherichia coli K14 network reconstruction using network and transcriptomic data), we show that the formalism used to build FRANK can to some extent be a reasonable model for gene regulatory networks in real cells.
Delaney, Nigel F.; Marx, Christopher J.
2012-01-01
Understanding evolutionary dynamics within microbial populations requires the ability to accurately follow allele frequencies through time. Here we present a rapid, cost-effective method (FREQ-Seq) that leverages Illumina next-generation sequencing for localized, quantitative allele frequency detection. Analogous to RNA-Seq, FREQ-Seq relies upon counts from the >105 reads generated per locus per time-point to determine allele frequencies. Loci of interest are directly amplified from a mixed population via two rounds of PCR using inexpensive, user-designed oligonucleotides and a bar-coded bridging primer system that can be regenerated in-house. The resulting bar-coded PCR products contain the adapters needed for Illumina sequencing, eliminating further library preparation. We demonstrate the utility of FREQ-Seq by determining the order and dynamics of beneficial alleles that arose as a microbial population, founded with an engineered strain of Methylobacterium, evolved to grow on methanol. Quantifying allele frequencies with minimal bias down to 1% abundance allowed effective analysis of SNPs, small in-dels and insertions of transposable elements. Our data reveal large-scale clonal interference during the early stages of adaptation and illustrate the utility of FREQ-Seq as a cost-effective tool for tracking allele frequencies in populations. PMID:23118913
SeqAPASS v3.0 for Extrapolation of Toxicity Knowledge Across Species
The U.S. Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; https://seqapass.epa.gov/seqapass/) tool was initially released to the public in 2016, providing a novel means to begin to address challenges in extrapolating toxicity ...
Coplanar electrode microfluidic chip enabling accurate sheathless impedance cytometry.
De Ninno, Adele; Errico, Vito; Bertani, Francesca Romana; Businaro, Luca; Bisegna, Paolo; Caselli, Federica
2017-03-14
Microfluidic impedance cytometry offers a simple non-invasive method for single-cell analysis. Coplanar electrode chips are especially attractive due to ease of fabrication, yielding miniaturized, reproducible, and ultimately low-cost devices. However, their accuracy is challenged by the dependence of the measured signal on particle trajectory within the interrogation volume, that manifests itself as an error in the estimated particle size, unless any kind of focusing system is used. In this paper, we present an original five-electrode coplanar chip enabling accurate particle sizing without the need for focusing. The chip layout is designed to provide a peculiar signal shape from which a new metric correlating with particle trajectory can be extracted. This metric is exploited to correct the estimated size of polystyrene beads of 5.2, 6 and 7 μm nominal diameter, reaching coefficient of variations lower than the manufacturers' quoted values. The potential impact of the proposed device in the field of life sciences is demonstrated with an application to Saccharomyces cerevisiae yeast.
Fish swarm intelligent to optimize real time monitoring of chips drying using machine vision
NASA Astrophysics Data System (ADS)
Hendrawan, Y.; Hawa, L. C.; Damayanti, R.
2018-03-01
This study attempted to apply machine vision-based chips drying monitoring system which is able to optimise the drying process of cassava chips. The objective of this study is to propose fish swarm intelligent (FSI) optimization algorithms to find the most significant set of image features suitable for predicting water content of cassava chips during drying process using artificial neural network model (ANN). Feature selection entails choosing the feature subset that maximizes the prediction accuracy of ANN. Multi-Objective Optimization (MOO) was used in this study which consisted of prediction accuracy maximization and feature-subset size minimization. The results showed that the best feature subset i.e. grey mean, L(Lab) Mean, a(Lab) energy, red entropy, hue contrast, and grey homogeneity. The best feature subset has been tested successfully in ANN model to describe the relationship between image features and water content of cassava chips during drying process with R2 of real and predicted data was equal to 0.9.
ePIANNO: ePIgenomics ANNOtation tool.
Liu, Chia-Hsin; Ho, Bing-Ching; Chen, Chun-Ling; Chang, Ya-Hsuan; Hsu, Yi-Chiung; Li, Yu-Cheng; Yuan, Shin-Sheng; Huang, Yi-Huan; Chang, Chi-Sheng; Li, Ker-Chau; Chen, Hsuan-Yu
2016-01-01
Recently, with the development of next generation sequencing (NGS), the combination of chromatin immunoprecipitation (ChIP) and NGS, namely ChIP-seq, has become a powerful technique to capture potential genomic binding sites of regulatory factors, histone modifications and chromatin accessible regions. For most researchers, additional information including genomic variations on the TF binding site, allele frequency of variation between different populations, variation associated disease, and other neighbour TF binding sites are essential to generate a proper hypothesis or a meaningful conclusion. Many ChIP-seq datasets had been deposited on the public domain to help researchers make new discoveries. However, researches are often intimidated by the complexity of data structure and largeness of data volume. Such information would be more useful if they could be combined or downloaded with ChIP-seq data. To meet such demands, we built a webtool: ePIgenomic ANNOtation tool (ePIANNO, http://epianno.stat.sinica.edu.tw/index.html). ePIANNO is a web server that combines SNP information of populations (1000 Genomes Project) and gene-disease association information of GWAS (NHGRI) with ChIP-seq (hmChIP, ENCODE, and ROADMAP epigenomics) data. ePIANNO has a user-friendly website interface allowing researchers to explore, navigate, and extract data quickly. We use two examples to demonstrate how users could use functions of ePIANNO webserver to explore useful information about TF related genomic variants. Users could use our query functions to search target regions, transcription factors, or annotations. ePIANNO may help users to generate hypothesis or explore potential biological functions for their studies.
Yu, Chuanhe; Gan, Haiyun; Zhang, Zhiguo
2018-01-01
DNA replication initiates at DNA replication origins after unwinding of double-strand DNA(dsDNA) by replicative helicase to generate single-stranded DNA (ssDNA) templates for the continuous synthesis of leading-strand and the discontinuous synthesis of lagging-strand. Therefore, methods capable of detecting strand-specific information will likely yield insight into the association of proteins at leading and lagging strand of DNA replication forks and the regulation of leading and lagging strand synthesis during DNA replication. The enrichment and Sequencing of Protein-Associated Nascent DNA (eSPAN), which measure the relative amounts of proteins at nascent leading and lagging strands of DNA replication forks, is a step-wise procedure involving the chromatin immunoprecipitation (ChIP) of a protein of interest followed by the enrichment of protein-associated nascent DNA through BrdU immunoprecipitation. The isolated ssDNA is then subjected to strand-specific sequencing. This method can detect whether a protein is enriched at leading or lagging strand of DNA replication forks. In addition to eSPAN, two other strand-specific methods, (ChIP-ssSeq), which detects potential protein-ssDNA binding and BrdU-IP-ssSeq, which can measure synthesis of both leading and lagging strand, were developed along the way. These methods can provide strand-specific and complementary information about the association of the target protein with DNA replication forks as well as synthesis of leading and lagging strands genome wide. Below, we describe the detailed eSPAN, ChIP-ssSeq, and BrdU-IP-ssSeq protocols.
Evaluation of microRNA alignment techniques
Kaspi, Antony; El-Osta, Assam
2016-01-01
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164
He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L
2018-04-01
SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.
Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar
2017-06-22
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features, Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features e.g. Shift, Slide and Opening cannot be evaluated using DNAshape. Here, we report a novel method DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
Hong, Jungeui; Gresham, David
2017-11-01
Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
Partial bisulfite conversion for unique template sequencing
Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael
2018-01-01
Abstract We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
2006-06-14
Robert Graybill . A Raw hoard for the use of this project was provided by the Computer Architecture Croup at the Massachusetts Institute of Technology...simulator is presented by MIT as being an accurate model of the Raw chip, we have found that it does not accurately model the board. Our comparison...G4 processor, model 7410. with a 32 kbyte level-1 cache on-chip and a 2 Mbyte L2 cache connected through a 250 MH/ bus [12]. Each node has 256 Mbyte
Tamplin, Owen J; Cox, Brian J; Rossant, Janet
2011-12-15
The node and notochord are key tissues required for patterning of the vertebrate body plan. Understanding the gene regulatory network that drives their formation and function is therefore important. Foxa2 is a key transcription factor at the top of this genetic hierarchy and finding its targets will help us to better understand node and notochord development. We performed an extensive microarray-based gene expression screen using sorted embryonic notochord cells to identify early notochord-enriched genes. We validated their specificity to the node and notochord by whole mount in situ hybridization. This provides the largest available resource of notochord-expressed genes, and therefore candidate Foxa2 target genes in the notochord. Using existing Foxa2 ChIP-seq data from adult liver, we were able to identify a set of genes expressed in the notochord that had associated regions of Foxa2-bound chromatin. Given that Foxa2 is a pioneer transcription factor, we reasoned that these sites might represent notochord-specific enhancers. Candidate Foxa2-bound regions were tested for notochord specific enhancer function in a zebrafish reporter assay and 7 novel notochord enhancers were identified. Importantly, sequence conservation or predictive models could not have readily identified these regions. Mutation of putative Foxa2 binding elements in two of these novel enhancers abrogated reporter expression and confirmed their Foxa2 dependence. The combination of highly specific gene expression profiling and genome-wide ChIP analysis is a powerful means of understanding developmental pathways, even for small cell populations such as the notochord. Copyright © 2011 Elsevier Inc. All rights reserved.
Development of a Plastic-Based Microfluidic Immunosensor Chip for Detection of H1N1 Influenza
Lee, Kyoung G.; Lee, Tae Jae; Jeong, Soon Woo; Choi, Ho Woon; Heo, Nam Su; Park, Jung Youn; Park, Tae Jung; Lee, Seok Jae
2012-01-01
Lab-on-a-chip can provide convenient and accurate diagnosis tools. In this paper, a plastic-based microfluidic immunosensor chip for the diagnosis of swine flu (H1N1) was developed by immobilizing hemagglutinin antigen on a gold surface using a genetically engineered polypeptide. A fluorescent dye-labeled antibody (Ab) was used for quantifying the concentration of Ab in the immunosensor chip using a fluorescent technique. For increasing the detection efficiency and reducing the errors, three chambers and three microchannels were designed in one microfluidic chip. This protocol could be applied to the diagnosis of other infectious diseases in a microfluidic device. PMID:23112630
The FDA's Experience with Emerging Genomics Technologies-Past, Present, and Future.
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-07-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing.
The FDA’s Experience with Emerging Genomics Technologies—Past, Present, and Future
Xu, Joshua; Thakkar, Shraddha; Gong, Binsheng; Tong, Weida
2016-01-01
The rapid advancement of emerging genomics technologies and their application for assessing safety and efficacy of FDA-regulated products require a high standard of reliability and robustness supporting regulatory decision-making in the FDA. To facilitate the regulatory application, the FDA implemented a novel data submission program, Voluntary Genomics Data Submission (VGDS), and also to engage the stakeholders. As part of the endeavor, for the past 10 years, the FDA has led an international consortium of regulatory agencies, academia, pharmaceutical companies, and genomics platform providers, which was named MicroArray Quality Control Consortium (MAQC), to address issues such as reproducibility, precision, specificity/sensitivity, and data interpretation. Three projects have been completed so far assessing these genomics technologies: gene expression microarrays, whole genome genotyping arrays, and whole transcriptome sequencing (i.e., RNA-seq). The resultant studies provide the basic parameters for fit-for-purpose application of these new data streams in regulatory environments, and the solutions have been made available to the public through peer-reviewed publications. The latest MAQC project is also called the SEquencing Quality Control (SEQC) project focused on next-generation sequencing. Using reference samples with built-in controls, SEQC studies have demonstrated that relative gene expression can be measured accurately and reliably across laboratories and RNA-seq platforms. Besides prediction performance comparable to microarrays in clinical settings and safety assessments, RNA-seq is shown to have better sensitivity for low expression and reveal novel transcriptomic features. Future effort of MAQC will be focused on quality control of whole genome sequencing and targeted sequencing. PMID:27116022
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
NASA Astrophysics Data System (ADS)
Haqiqi, M. T.; Yuliansyah; Suwinarti, W.; Amirta, R.
2018-04-01
Short Rotation Coppice (SRC) system is an option to provide renewable and sustainable feedstock in generating electricity for rural area. Here in this study, we focussed on application of Response Surface Methodology (RSM) to simplify calculation protocols to point out wood chip production and energy potency from some tropical SRC species identified as Bauhinia purpurea, Bridelia tomentosa, Calliandra calothyrsus, Fagraea racemosa, Gliricidia sepium, Melastoma malabathricum, Piper aduncum, Vernonia amygdalina, Vernonia arborea and Vitex pinnata. The result showed that the highest calorific value was obtained from V. pinnata wood (19.97 MJ kg-1) due to its high lignin content (29.84 %, w/w). Our findings also indicated that the use of RSM for estimating energy-electricity of SRC wood had significant term regarding to the quadratic model (R2 = 0.953), whereas the solid-chip ratio prediction was accurate (R2 = 1.000). In the near future, the simple formula will be promising to calculate energy production easily from woody biomass, especially from SRC species.
NASA Astrophysics Data System (ADS)
Kempf, S.; Wegner, M.; Deeg, L.; Fleischmann, A.; Gastaldo, L.; Herrmann, F.; Richter, D.; Enss, C.
2017-06-01
We report on the design, fabrication and characterization of a 64 pixel metallic magnetic calorimeter array that is read out by an integrated, on-chip microwave SQUID multiplexer. Based on the results of our comprehensive device characterization we refined the state-of-the-art multiplexer model which assumes each associated non-hysteretic rf-SQUID to purely behave as a flux-dependent inductor. In particular, we include the capacitance and the subgap resistance of the Josephson junction as well as screening effects and parasitic mutual couplings between different coils that show up only when a superconducting flux transformer is attached to the SQUID input. Thanks to these modifications, we are able to explain the occurrence of a magnetic flux dependence of the internal quality factor of the microwave resonators as well as to accurately calculate the characteristic multiplexer parameters. When combining the refined multiplexer model with the thermodynamical description of a metallic magnetic calorimeter, we find a reasonable agreement between our measurements and predictions.
Xuan, Shi-Hai; Zhou, Yu-Gui; Shao, Bo; Cui, Ya-Lin; Li, Jian; Yin, Hong-Bo; Song, Xiao-Ping; Cong, Hui; Jing, Feng-Xiang; Jin, Qing-Hui; Wang, Hui-Min; Zhou, Jie
2009-11-01
Macrolide drugs, such as clarithromycin (CAM), are a key component of many combination therapies used to eradicate Helicobacter pylori. However, resistance to CAM is increasing in H. pylori and is becoming a serious problem in H. pylori eradication therapy. CAM resistance in H. pylori is mostly due to point mutations (A2142G/C, A2143G) in the peptidyltransferase-encoding region of the 23S rRNA gene. In this study an enzymic colorimetry-based DNA chip was developed to analyse single-nucleotide polymorphisms of the 23S rRNA gene to determine the prevalence of mutations in CAM-related resistance in H. pylori-positive patients. The results of the colorimetric DNA chip were confirmed by direct DNA sequencing. In 63 samples, the incidence of the A2143G mutation was 17.46 % (11/63). The results of the colorimetric DNA chip were concordant with DNA sequencing in 96.83 % of results (61/63). The colorimetric DNA chip could detect wild-type and mutant signals at every site, even at a DNA concentration of 1.53 x 10(2) copies microl(-1). Thus, the colorimetric DNA chip is a reliable assay for rapid and accurate detection of mutations in the 23S rRNA gene of H. pylori that lead to CAM-related resistance, directly from gastric tissues.
On-chip particle trapping and manipulation
NASA Astrophysics Data System (ADS)
Leake, Kaelyn Danielle
The ability to control and manipulate the world around us is human nature. Humans and our ancestors have used tools for millions of years. Only in recent years have we been able to control objects at such small levels. In order to understand the world around us it is frequently necessary to interact with the biological world. Optical trapping and manipulation offer a non-invasive way to move, sort and interact with particles and cells to see how they react to the world around them. Optical tweezers are ideal in their abilities but they require large, non-portable, and expensive setups limiting how and where we can use them. A cheap portable platform is required in order to have optical manipulation reach its full potential. On-chip technology offers a great solution to this challenge. We focused on the Liquid-Core Anti-Resonant Reflecting Optical Waveguide (liquid-core ARROW) for our work. The ARROW is an ideal platform, which has anti-resonant layers which allow light to be guided in liquids, allowing for particles to easily be manipulated. It is manufactured using standard silicon manufacturing techniques making it easy to produce. The planner design makes it easy to integrate with other technologies. Initially I worked to improve the ARROW chip by reducing the intersection losses and by reducing the fluorescence and background on the ARROW chip. The ARROW chip has already been used to trap and push particles along its channel but here I introduce several new methods of particle trapping and manipulation on the ARROW chip. Traditional two beam traps use two counter propagating beams. A trapping scheme that uses two orthogonal beams which counter to first instinct allow for trapping at their intersection is introduced. This scheme is thoroughly predicted and analyzed using realistic conditions. Simulations of this method were done using a program which looks at both the fluidics and optical sources to model complex situations. These simulations were also used to model and predict a sorting method which combines fluid flow with a single optical source to automatically sort dielectric particles by size in waveguide networks. These simulations were shown to be accurate when repeated on-chip. Lastly I introduce a particle trapping technique that uses Multimode Interference(MMI) patterns in order to trap multiple particles at once. The location of the traps can be adjusted as can the number of trapping location by changing the input wavelength. By changing the wavelength back and forth between two values this MMI can be used to pass a particle down the channel like a conveyor belt.
Comparison of alternative approaches for analysing multi-level RNA-seq data
Mohorianu, Irina; Bretman, Amanda; Smith, Damian T.; Fowler, Emily K.; Dalmay, Tamas
2017-01-01
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples, The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. PMID:28792517
Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq
Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng
2011-01-01
Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevents comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than what was found in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTR) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387
Modeling the radiation pattern of LEDs.
Moreno, Ivan; Sun, Ching-Cherng
2008-02-04
Light-emitting diodes (LEDs) come in many varieties and with a wide range of radiation patterns. We propose a general, simple but accurate analytic representation for the radiation pattern of the light emitted from an LED. To accurately render both the angular intensity distribution and the irradiance spatial pattern, a simple phenomenological model takes into account the emitting surfaces (chip, chip array, or phosphor surface), and the light redirected by both the reflecting cup and the encapsulating lens. Mathematically, the pattern is described as the sum of a maximum of two or three Gaussian or cosine-power functions. The resulting equation is widely applicable for any kind of LED of practical interest. We accurately model a wide variety of radiation patterns from several world-class manufacturers.
Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.
Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing
2015-08-05
To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3% in four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific (89.6%) were higher than those of TargetSeq-Proton-specific (15.8%). In the sequencing of exonic regions, a combination of using of two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the sequencing of platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, specifically for the InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing.
Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.
Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun
2018-02-27
Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of the multiple-beads-in-droplet and insufficient throughput because of the low concentration of beads for avoiding multiple-beads-in-a-droplet remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl -1 , which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.
Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang
2015-01-01
RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes, however with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outweighs others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.
Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang
2018-01-01
Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.
SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells
Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang
2018-01-01
Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. PMID:29208629
Epigenetic modulation by TFII-I during embryonic stem cell differentiation.
Bayarsaihan, Dashzeveg; Makeyev, Aleksandr V; Enkhmandakh, Badam
2012-10-01
TFII-I transcription factors play an essential role during early vertebrate embryogenesis. Genome-wide mapping studies by ChIP-seq and ChIP-chip revealed that TFII-I primes multiple genomic loci in mouse embryonic stem cells and embryonic tissues. Moreover, many TFII-I-bound regions co-localize with H3K4me3/K27me3 bivalent chromatin within the promoters of lineage-specific genes. This minireview provides a summary of current knowledge regarding the function of TFII-I in epigenetic control of stem cell differentiation. Copyright © 2012 Wiley Periodicals, Inc.
Properties of different density genotypes used in dairy cattle evaluation
USDA-ARS?s Scientific Manuscript database
Dairy cattle breeders have used a 50K chip since April 2008 and a less expensive, lower density (3K) chip since September 2010 in genomic selection. Evaluations from 3K are less reliable because genotype calls are less accurate and missing markers are imputed. After excluding genotypes with < 90% ca...
Indel detection from DNA and RNA sequencing data with transIndel.
Yang, Rendong; Van Etten, Jamie L; Dehm, Scott M
2018-04-19
Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up- To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of med-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
Repliscan: a tool for classifying replication timing regions.
Zynda, Gregory J; Song, Jawon; Concia, Lorenzo; Wear, Emily E; Hanley-Bowdoin, Linda; Thompson, William F; Vaughn, Matthew W
2017-08-07
Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replication timing. To accurately detect and classify regions of replication across the genome, we present Repliscan. Repliscan robustly normalizes, automatically removes outlying and uninformative data points, and classifies Repli-seq signals into discrete combinations of replication signatures. The quality control steps and self-fitting methods make Repliscan generally applicable and more robust than previous methods that classify regions based on thresholds. Repliscan is simple and effective to use on organisms with different genome sizes. Even with analysis window sizes as small as 1 kilobase, reliable profiles can be generated with as little as 2.4x coverage.
2013-01-01
Background The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. Results We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense stands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Conclusions Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455
Liu, S; Song, L; Cram, D S; Xiong, L; Wang, K; Wu, R; Liu, J; Deng, K; Jia, B; Zhong, M; Yang, F
2015-10-01
To compare the performance of traditional G-banding karyotyping with that of copy number variation sequencing (CNV-Seq) for detection of chromosomal abnormalities associated with miscarriage. Products of conception (POC) were collected from spontaneous miscarriages. Chromosomal abnormalities were detected using high-resolution G-banding karyotyping and CNV sequencing. Quantitative fluorescent polymerase chain reaction analysis of maternal and POC DNA for short tandem repeat (STR) markers was used to both monitor maternal cell contamination and confirm the chromosomal status and sex of the miscarriage tissue. A total of 64 samples of POC, comprising 16 with an abnormal and 48 with a normal karyotype, were selected and coded for analysis by CNV-Seq. CNV-Seq results were concordant for 14 (87.5%) of the 16 gross chromosomal abnormalities identified by karyotyping, including 11 autosomal trisomies and three sex chromosomal aneuploidies (45,X). Of the two discordant results, a 69,XXX polyploidy was missed by CNV-Seq, although supporting STR marker analysis confirmed the triploidy. In contrast, CNV-Seq identified a sample with 45,X karyotype as a 45,X/46,XY mosaic. In the remaining 48 samples of POC with a normal karyotype, CNV-Seq detected a 2.58-Mb 22q deletion associated with DiGeorge syndrome and nine different smaller CNVs of no apparent clinical significance. CNV-Seq used in parallel with STR profiling is a reliable and accurate alternative to karyotyping for identifying chromosome copy number abnormalities associated with spontaneous miscarriage. Copyright © 2015 ISUOG. Published by John Wiley & Sons Ltd.
Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian
2013-11-09
The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense stands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.
Ultra-dense magnetoresistive mass memory
NASA Technical Reports Server (NTRS)
Daughton, J. M.; Sinclair, R.; Dupuis, T.; Brown, J.
1992-01-01
This report details the progress and accomplishments of Nonvolatile Electronics (NVE), Inc., on the design of the wafer scale MRAM mass memory system during the fifth quarter of the project. NVE has made significant progress this quarter on the one megabit design in several different areas. A test chip, which will verify a working GMR bit with the dimensions required by the 1 Meg chip, has been designed, laid out, and is currently being processed in the NVE labs. This test chip will allow electrical specifications, tolerances, and processing issues to be finalized before construction of the actual chip, thus providing a greater assurance of success of the final 1 Meg design. A model has been developed to accurately simulate the parasitic effects of unselected sense lines. This model gives NVE the ability to perform accurate simulations of the array electronic and test different design concepts. Much of the circuit design for the 1 Meg chip has been completed and simulated and these designs are included. Progress has been made in the wafer scale design area to verify the reliable operation of the 16 K macrocell. This is currently being accomplished with the design and construction of two stand alone test systems which will perform life tests and gather data on reliabiliy and wearout mechanisms for analysis.
Terahertz microfluidic chips for detection of amino acids in aqueous solutions
NASA Astrophysics Data System (ADS)
Su, Bo; Zhang, Cong; Fan, Ning; Zhang, Cunlin
2016-11-01
Microfluidic technology can control the fluidic thickness accurately in less than 100 micrometers. So the combination of terahertz (THz) and microfluidic technology becomes one of the most interesting directions towards biological detection. We designed microfluidic chips for terahertz spectroscopy of biological samples in aqueous solutions. Using the terahertz time-domain spectroscopy (THz-TDS) system, we experimentally measured the transmittance of the chips and the THz absorption spectra of L-threonine and L-arginine, respectively. The results indicated the feasibility of performing high sensitivity THz spectroscopy of amino acids solutions. Therefore, the microfluidic chips can realize real-time and label-free measurement for biochemistry samples in THz-TDS system.
Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun
2016-01-01
Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
Grünberg, Sebastian; Henikoff, Steven; Hahn, Steven; Zentner, Gabriel E
2016-11-15
Mediator is a conserved, essential transcriptional coactivator complex, but its in vivo functions have remained unclear due to conflicting data regarding its genome-wide binding pattern obtained by genome-wide ChIP Here, we used ChEC-seq, a method orthogonal to ChIP, to generate a high-resolution map of Mediator binding to the yeast genome. We find that Mediator associates with upstream activating sequences (UASs) rather than the core promoter or gene body under all conditions tested. Mediator occupancy is surprisingly correlated with transcription levels at only a small fraction of genes. Using the same approach to map TFIID, we find that TFIID is associated with both TFIID- and SAGA-dependent genes and that TFIID and Mediator occupancy is cooperative. Our results clarify Mediator recruitment and binding to the genome, showing that Mediator binding to UASs is widespread, partially uncoupled from transcription, and mediated in part by TFIID. © 2016 The Authors.
Methods and kits for predicting a response to an erythropoietic agent
Merchant, Michael L.; Klein, Jon B.; Brier, Michael E.; Gaweda, Adam E.
2015-06-16
Methods for predicting a response to an erythropoietic agent in a subject include providing a biological sample from the subject, and determining an amount in the sample of at least one peptide selected from the group consisting of SEQ ID NOS: 1-17. If there is a measurable difference in the amount of the at least one peptide in the sample, when compared to a control level of the same peptide, the subject is then predicted to have a good response or a poor response to the erythropoietic agent. Kits for predicting a response to an erythropoietic agent are further provided and include one or more antibodies, or fragments thereof, that specifically recognize a peptide of SEQ ID NOS: 1-17.
McKenzie, Brittney A.
2017-01-01
Measuring the temperature of a sample is a fundamental need in many biological and chemical processes. When the volume of the sample is on the microliter or nanoliter scale (e.g., cells, microorganisms, precious samples, or samples in microfluidic devices), accurate measurement of the sample temperature becomes challenging. In this work, we demonstrate a technique for accurately determining the temperature of microliter volumes using a simple 3D-printed microfluidic chip. We accomplish this by first filling “microfluidic thermometer” channels on the chip with substances with precisely known freezing/melting points. We then use a thermoelectric cooler to create a stable and linear temperature gradient along these channels within a measurement region on the chip. A custom software tool (available as online Supporting Information) is then used to find the locations of solid-liquid interfaces in the thermometer channels; these locations have known temperatures equal to the freezing/melting points of the substances in the channels. The software then uses the locations of these interfaces to calculate the temperature at any desired point within the measurement region. Using this approach, the temperature of any microliter-scale on-chip sample can be measured with an uncertainty of about a quarter of a degree Celsius. As a proof-of-concept, we use this technique to measure the unknown freezing point of a 50 microliter volume of solution and demonstrate its feasibility on a 400 nanoliter sample. Additionally, this technique can be used to measure the temperature of any on-chip sample, not just near-zero-Celsius freezing points. We demonstrate this by using an oil that solidifies near room temperature (coconut oil) in a microfluidic thermometer to measure on-chip temperatures well above zero Celsius. By providing a low-cost and simple way to accurately measure temperatures in small volumes, this technique should find applications in both research and educational laboratories. PMID:29284028
NASA Astrophysics Data System (ADS)
Jiang, Ning
Accurately measuring the immune repertoire sequence composition, diversity, and abundance is important in studying repertoire response in infections, vaccinations, and cancer immunology. Using molecular identifiers (MIDs) to tag mRNA molecules is an effective method in improving the accuracy of immune repertoire sequencing (IR-seq). However, it is still difficult to use IR-seq on small amount of clinical samples to achieve a high coverage of the repertoire diversities. This is especially challenging in studying infections and vaccinations where B cell subpopulations with fewer cells, such as memory B cells or plasmablasts, are often of great interest to study somatic mutation patterns and diversity changes. Here, we describe an approach of IR-seq based on the use of MIDs in combination with a clustering method that can reveal more than 80% of the antibody diversity in a sample and can be applied to as few as 1,000 B cells. We applied this to study the antibody repertoires of young children before and during an acute malaria infection. We discovered unexpectedly high levels of somatic hypermutation (SHM) in infants and revealed characteristics of antibody repertoire development in young children that would have a profound impact on immunization in children.
Determining ERβ Binding Affinity to Singly Mutant ERE Using Dual Polarization Interferometry
NASA Astrophysics Data System (ADS)
Song, Hong Yan; Su, Xiaodi
In a classic mode of estrogen action, estrogen receptors (ERs) bind to estrogen responsive element (ERE) to activate gene transcription. A perfect ERE contains a 13-base pair sequence of a palindromic repeat separated by a three-base spacer, 5‧-GGTCAnnnTGACC-3‧. In addition to the consensus or wild-type ERE (wtERE), naturally occurring EREs often have one or two base pairs’ alternation. Based on the newly constructed Thermodynamic Modeling of ChIP-seq (TherMos) model, binding energy between ERβ and a series of 34-bp mutant EREs (mutERE) was simulated to predict the binding affinity between ERs and EREs with single base pair deviation at different sites of the 13-bp inverted sequence. Experimentally, dual polarization interferometry (DPI) method was developed to measure ERβ-mutEREs binding affinity. On a biotin-NeutrAvidin (NA)-biotin treated DPI chip, wtERE is immobilized. In a direct binding assay, ERβ-wtERE binding affinity is determined. In a competition assay, ERβ was preincubated with mutant EREs before being added for competitive binding to the immobilized wtERE. This competition strategy provided a successful platform to evaluate the binding affinity variation among large number of ERE with different base mutations. The experimental result correlates well with the mathematically predicted binding energy with a Spearman correlation coefficient of 0.97.
A Streaming Language Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Barth, Timothy; Knight, Timothy
2005-01-01
We present a Brook streaming language implementation of the 3-D discontinuous Galerkin method for compressible fluid flow on tetrahedral meshes. Efficient implementation of the discontinuous Galerkin method using the streaming model of computation introduces several algorithmic design challenges. Using a cycle-accurate simulator, performance characteristics have been obtained for the Stanford Merrimac stream processor. The current Merrimac design achieves 128 Gflops per chip and the desktop board is populated with 16 chips yielding a peak performance of 2 Teraflops. Total parts cost for the desktop board is less than $20K. Current cycle-accurate simulations for discretizations of the 3-D compressible flow equations yield approximately 40-50% of the peak performance of the Merrimac streaming processor chip. Ongoing work includes the assessment of the performance of the same algorithm on the 2 Teraflop desktop board with a target goal of achieving 1 Teraflop performance.
Solution-based circuits enable rapid and multiplexed pathogen detection.
Lam, Brian; Das, Jagotamoy; Holmes, Richard D; Live, Ludovic; Sage, Andrew; Sargent, Edward H; Kelley, Shana O
2013-01-01
Electronic readout of markers of disease provides compelling simplicity, sensitivity and specificity in the detection of small panels of biomarkers in clinical samples; however, the most important emerging tests for disease, such as infectious disease speciation and antibiotic-resistance profiling, will need to interrogate samples for many dozens of biomarkers. Electronic readout of large panels of markers has been hampered by the difficulty of addressing large arrays of electrode-based sensors on inexpensive platforms. Here we report a new concept--solution-based circuits formed on chip--that makes highly multiplexed electrochemical sensing feasible on passive chips. The solution-based circuits switch the information-carrying signal readout channels and eliminate all measurable crosstalk from adjacent, biomolecule-specific microsensors. We build chips that feature this advance and prove that they analyse unpurified samples successfully, and accurately classify pathogens at clinically relevant concentrations. We also show that signature molecules can be accurately read 2 minutes after sample introduction.
Modeling of heat transfer in compacted machining chips during friction consolidation process
NASA Astrophysics Data System (ADS)
Abbas, Naseer; Deng, Xiaomin; Li, Xiao; Reynolds, Anthony
2018-04-01
The current study aims to provide an understanding of the heat transfer process in compacted aluminum alloy AA6061 machining chips during the friction consolidation process (FCP) through experimental investigations and mathematical modelling and numerical simulation. Compaction and friction consolidation of machining chips is the first stage of the Friction Extrusion Process (FEP), which is a novel method for recycling machining chips to produce useful products such as wires. In this study, compacted machining chips are modelled as a continuum whose material properties vary with density during friction consolidation. Based on density and temperature dependent thermal properties, the temperature field in the chip material and process chamber caused by frictional heating during the friction consolidation process is predicted. The predicted temperature field is found to compare well with temperature measurements at select points where such measurements can be made using thermocouples.
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
Partial bisulfite conversion for unique template sequencing.
Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan
2018-01-25
We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method
Honda, Shozo; Morichika, Keisuke; Kirino, Yohei
2016-01-01
RNA digestions catalyzed by many ribonucleases generate RNA fragments containing a 2′,3′-cyclic phosphate (cP) at their 3′-termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named “cP-RNA-seq” that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3′-termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment, hence subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ~6 d, excluding the time required for sequencing and bioinformatics analyses, such downstream assays are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ~3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5′-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes. PMID:26866791
ChIP-PaM: an algorithm to identify protein-DNA interaction using ChIP-Seq data.
Wu, Song; Wang, Jianmin; Zhao, Wei; Pounds, Stanley; Cheng, Cheng
2010-06-03
ChIP-Seq is a powerful tool for identifying the interaction between genomic regulators and their bound DNAs, especially for locating transcription factor binding sites. However, high cost and high rate of false discovery of transcription factor binding sites identified from ChIP-Seq data significantly limit its application. Here we report a new algorithm, ChIP-PaM, for identifying transcription factor target regions in ChIP-Seq datasets. This algorithm makes full use of a protein-DNA binding pattern by capitalizing on three lines of evidence: 1) the tag count modelling at the peak position, 2) pattern matching of a specific tag count distribution, and 3) motif searching along the genome. A novel data-based two-step eFDR procedure is proposed to integrate the three lines of evidence to determine significantly enriched regions. Our algorithm requires no technical controls and efficiently discriminates falsely enriched regions from regions enriched by true transcription factor (TF) binding on the basis of ChIP-Seq data only. An analysis of real genomic data is presented to demonstrate our method. In a comparison with other existing methods, we found that our algorithm provides more accurate binding site discovery while maintaining comparable statistical power.
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publically available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 millions (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Classifying lower grade glioma cases according to whole genome gene expression.
Chen, Baoshi; Liang, Tingyu; Yang, Pei; Wang, Haoyuan; Liu, Yanwei; Yang, Fan; You, Gan
2016-11-08
To identify a gene-based signature as a novel prognostic model in lower grade gliomas. A gene signature developed from HOXA7, SLC2A4RG and MN1 could segregate patients into low and high risk score groups with different overall survival (OS), and was validated in TCGA RNA-seq and GSE16011 mRNA array datasets. Receiver operating characteristic (ROC) was performed to show that the three-gene signature was more sensitive and specific than histology, grade, age, IDH1 mutation and 1p/19q co-deletion. Gene Set Enrichment Analysis (GSEA) and GO analysis showed high-risk samples were associated with tumor associated macrophages (TAMs) and highly invasive phenotypes. Moreover, HOXA7-siRNA inhibited migration and invasion in vitro, and downregulated MMP9 at the protein level in U251 glioma cells. A cohort of 164 glioma specimens from the Chinese Glioma Genome Atlas (CGGA) array database were assessed as the training group. TCGA RNA-seq and GSE16011 mRNA array datasets were used for validation. Regression analyses and linear risk score assessment were performed for the identification of the three-gene signature comprising HOXA7, SLC2A4RG and MN1. We established a three-gene signature for lower grade gliomas, which could independently predict overall survival (OS) of lower grade glioma patients with higher sensitivity and specificity compared with other clinical characteristics. These findings indicate that the three-gene signature is a new prognostic model that could provide improved OS prediction and accurate therapies for lower grade glioma patients.
Genomic prediction in a nuclear population of layers using single-step models.
Yan, Yiyuan; Wu, Guiqin; Liu, Aiqiao; Sun, Congjiao; Han, Wenpeng; Li, Guangqi; Yang, Ning
2018-02-01
Single-step genomic prediction method has been proposed to improve the accuracy of genomic prediction by incorporating information of both genotyped and ungenotyped animals. The objective of this study is to compare the prediction performance of single-step model with a 2-step models and the pedigree-based models in a nuclear population of layers. A total of 1,344 chickens across 4 generations were genotyped by a 600 K SNP chip. Four traits were analyzed, i.e., body weight at 28 wk (BW28), egg weight at 28 wk (EW28), laying rate at 38 wk (LR38), and Haugh unit at 36 wk (HU36). In predicting offsprings, individuals from generation 1 to 3 were used as training data and females from generation 4 were used as validation set. The accuracies of predicted breeding values by pedigree BLUP (PBLUP), genomic BLUP (GBLUP), SSGBLUP and single-step blending (SSBlending) were compared for both genotyped and ungenotyped individuals. For genotyped females, GBLUP performed no better than PBLUP because of the small size of training data, while the 2 single-step models predicted more accurately than the PBLUP model. The average predictive ability of SSGBLUP and SSBlending were 16.0% and 10.8% higher than the PBLUP model across traits, respectively. Furthermore, the predictive abilities for ungenotyped individuals were also enhanced. The average improvements of prediction abilities were 5.9% and 1.5% for SSGBLUP and SSBlending model, respectively. It was concluded that single-step models, especially the SSGBLUP model, can yield more accurate prediction of genetic merits and are preferable for practical implementation of genomic selection in layers. © 2017 Poultry Science Association Inc.
Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin
2013-03-01
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveals potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Thomsen, Martin Christen Frølund; Nielsen, Morten
2012-01-01
Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
The Nuclear Receptor, RORγ, Regulates Pathways Necessary for Breast Cancer Metastasis
Oh, Tae Gyu; Wang, Shu-Ching M.; Acharya, Bipul R.; Goode, Joel M.; Graham, J. Dinny; Clarke, Christine L.; Yap, Alpha S.; Muscat, George E.O.
2016-01-01
We have previously reported that RORγ expression was decreased in ER − ve breast cancer, and increased expression improves clinical outcomes. However, the underlying RORγ dependent mechanisms that repress breast carcinogenesis have not been elucidated. Here we report that RORγ negatively regulates the oncogenic TGF-β/EMT and mammary stem cell (MaSC) pathways, whereas RORγ positively regulates DNA-repair. We demonstrate that RORγ expression is: (i) decreased in basal-like subtype cancers, and (ii) inversely correlated with histological grade and drivers of carcinogenesis in breast cancer cohorts. Furthermore, integration of RNA-seq and ChIP-chip data reveals that RORγ regulates the expression of many genes involved in TGF-β/EMT-signaling, DNA-repair and MaSC pathways (including the non-coding RNA, LINC00511). In accordance, pharmacological studies demonstrate that an RORγ agonist suppresses breast cancer cell viability, migration, the EMT transition (microsphere outgrowth) and mammosphere-growth. In contrast, RNA-seq demonstrates an RORγ inverse agonist induces TGF-β/EMT-signaling. These findings suggest pharmacological modulation of RORγ activity may have utility in breast cancer. PMID:27211549
Zhu, Lin; Guo, Wei-Li; Deng, Su-Ping; Huang, De-Shuang
2016-01-01
In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data have been accumulated. Instead of investigating them independently, several recent studies have convincingly demonstrated that a wealth of scientific insights can be gained by integrative analysis of these ChIP-seq data. However, when used for the purpose of integrative analysis, a serious drawback of current ChIP-seq technique is that it is still expensive and time-consuming to generate ChIP-seq datasets of high standard. Most researchers are therefore unable to obtain complete ChIP-seq data for several TFs in a wide variety of cell lines, which considerably limits the understanding of transcriptional regulation pattern. In this paper, we propose a novel method called ChIP-PIT to overcome the aforementioned limitation. In ChIP-PIT, ChIP-seq data corresponding to a diverse collection of cell types, TFs and genes are fused together using the three-mode pair-wise interaction tensor (PIT) model, and the prediction of unperformed ChIP-seq experimental results is formulated as a tensor completion problem. Computationally, we propose efficient first-order method based on extensions of coordinate descent method to learn the optimal solution of ChIP-PIT, which makes it particularly suitable for the analysis of massive scale ChIP-seq data. Experimental evaluation the ENCODE data illustrate the usefulness of the proposed model.
A Lab-on-Chip Design for Miniature Autonomous Bio-Chemoprospecting Planetary Rovers
NASA Astrophysics Data System (ADS)
Santoli, S.
The performance of the so-called ` Lab-on-Chip ' devices, featuring micrometre size components and employed at present for carrying out in a very fast and economic way the extremely high number of sequence determinations required in genomic analyses, can be largely improved as to further size reduction, decrease of power consumption and reaction efficiency through development of nanofluidics and of nano-to-micro inte- grated systems. As is shown, such new technologies would lead to robotic, fully autonomous, microwatt consumption and complete ` laboratory on a chip ' units for accurate, fast and cost-effective astrobiological and planetary exploration missions. The theory and the manufacturing technologies for the ` active chip ' of a miniature bio/chemoprospecting planetary rover working on micro- and nanofluidics are investigated. The chip would include micro- and nanoreactors, integrated MEMS (MicroElectroMechanical System) components, nanoelectronics and an intracavity nanolaser for highly accurate and fast chemical analysis as an application of such recently introduced solid state devices. Nano-reactors would be able to strongly speed up reaction kinetics as a result of increased frequency of reactive collisions. The reaction dynamics may also be altered with respect to standard macroscopic reactors. A built-in miniature telemetering unit would connect a network of other similar rovers and a central, ground-based or orbiting control unit for data collection and transmission to an Earth-based unit through a powerful antenna. The development of the ` Lab-on-Chip ' concept for space applications would affect the economy of space exploration missions, as the rover's ` Lab-on-Chip ' development would link space missions with the ever growing terrestrial market and business concerning such devices, largely employed in modern genomics and bioinformatics, so that it would allow the recoupment of space mission costs.
A CMOS Imager with Focal Plane Compression using Predictive Coding
NASA Technical Reports Server (NTRS)
Leon-Salas, Walter D.; Balkir, Sina; Sayood, Khalid; Schemm, Nathan; Hoffman, Michael W.
2007-01-01
This paper presents a CMOS image sensor with focal-plane compression. The design has a column-level architecture and it is based on predictive coding techniques for image decorrelation. The prediction operations are performed in the analog domain to avoid quantization noise and to decrease the area complexity of the circuit, The prediction residuals are quantized and encoded by a joint quantizer/coder circuit. To save area resources, the joint quantizerlcoder circuit exploits common circuitry between a single-slope analog-to-digital converter (ADC) and a Golomb-Rice entropy coder. This combination of ADC and encoder allows the integration of the entropy coder at the column level. A prototype chip was fabricated in a 0.35 pm CMOS process. The output of the chip is a compressed bit stream. The test chip occupies a silicon area of 2.60 mm x 5.96 mm which includes an 80 X 44 APS array. Tests of the fabricated chip demonstrate the validity of the design.
How to Combine ChIP with qPCR.
Asp, Patrik
2018-01-01
Chromatin immunoprecipitation (ChIP) coupled with quantitative PCR (qPCR) has in the last 15 years become a basic mainstream tool in genomic research. Numerous commercially available ChIP kits, qPCR kits, and real-time PCR systems allow for quick and easy analysis of virtually anything chromatin-related as long as there is an available antibody. However, the highly accurate quantitative dimension added by using qPCR to analyze ChIP samples significantly raises the bar in terms of experimental accuracy, appropriate controls, data analysis, and data presentation. This chapter will address these potential pitfalls by providing protocols and procedures that address the difficulties inherent in ChIP-qPCR assays.
SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
Nguyen, Chuong; Nguyen, Van Duy
2016-01-01
Azurin from Pseudomonas aeruginosa is known anticancer bacteriocin, which can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long term residence in human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins have been screened from complete genomes of 66 dominant bacteria species in human gut microbiota and subsequently characterized by subjecting them as functional annotation algorithms with azurin as control. We have qualitatively predicted 14 putative bacteriocins that possessed functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses including hydrophobic percentage calculation, structural modeling, and molecular docking study of bacteriocins of interest against protein p53, a cancer target. Finally, we have identified 8 putative bacteriocins that bind p53 in a same manner as p28-azurin and azurin, in which 3 peptides (p1seq16, p2seq20, and p3seq24) shared with our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) discovered in the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic line cells.
Nguyen, Chuong; Nguyen, Van Duy
2016-01-01
Azurin from Pseudomonas aeruginosa is known anticancer bacteriocin, which can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long term residence in human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins have been screened from complete genomes of 66 dominant bacteria species in human gut microbiota and subsequently characterized by subjecting them as functional annotation algorithms with azurin as control. We have qualitatively predicted 14 putative bacteriocins that possessed functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses including hydrophobic percentage calculation, structural modeling, and molecular docking study of bacteriocins of interest against protein p53, a cancer target. Finally, we have identified 8 putative bacteriocins that bind p53 in a same manner as p28-azurin and azurin, in which 3 peptides (p1seq16, p2seq20, and p3seq24) shared with our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) discovered in the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic line cells. PMID:27239476
Mcclintock, Andrew S; Stiles, William B; Himawan, Lina; Anderson, Timothy; Barkham, Michael; Hardy, Gillian E
2016-01-01
Our aim was to examine client mood in the initial and final sessions of cognitive-behavioral therapy (CBT) and psychodynamic-interpersonal therapy (PIT) and to determine how client mood is related to therapy outcomes. Hierarchical linear modeling was applied to data from a clinical trial comparing CBT with PIT. In this trial, client mood was assessed before and after sessions with the Session Evaluation Questionnaire-Positivity Subscale (SEQ-P). In the initial sessions, CBT clients had higher pre-session and post-session SEQ-P ratings and greater pre-to-post session mood change than did clients in PIT. In the final sessions, these pre, post, and change scores were generally equivalent across CBT and PIT. CBT outcome was predicted by pre- and post-session SEQ-P ratings from both the initial sessions and the final sessions of CBT. However, PIT outcome was predicted by pre- and post-session SEQ-P ratings from the final sessions only. Pre-to-post session mood change was unrelated to outcome in both treatments. These results suggest different change processes are at work in CBT and PIT.
Hysteretic energy prediction method for mainshock-aftershock sequences
NASA Astrophysics Data System (ADS)
Zhai, Changhai; Ji, Duofa; Wen, Weiping; Li, Cuihua; Lei, Weidong; Xie, Lili
2018-04-01
Structures located in seismically active regions may be subjected to mainshock-aftershock (MSAS) sequences. Strong aftershocks significantly affect the hysteretic energy demand of structures. The hysteretic energy, E H,seq, is normalized by mass m and expressed in terms of the equivalent velocity, V D,seq, to quantitatively investigate aftershock effects on the hysteretic energy of structures. The equivalent velocity, V D,seq, is computed by analyzing the response time-history of an inelastic single-degree-of-freedom (SDOF) system with a varying vibration period subjected to 309 MSAS sequences. The present study selected two kinds of MSAS sequences, with one aftershock and two aftershocks, respectively. The aftershocks are scaled to maintain different relative intensities. The variation of the equivalent velocity, V D,seq, is studied for consideration of the ductility values, site conditions, relative intensities, number of aftershocks, hysteretic models, and damping ratios. The MSAS sequence with one aftershock exhibited a 10% to 30% hysteretic energy increase, whereas the MSAS sequence with two aftershocks presented a 20% to 40% hysteretic energy increase. Finally, a hysteretic energy prediction equation is proposed as a function of the vibration period, ductility value, and damping ratio to estimate hysteretic energy for mainshock-aftershock sequences.
Comparison of the performance of Ion Torrent chips in noninvasive prenatal trisomy detection.
Wang, Yanlin; Wen, Zujia; Shen, Jiawei; Cheng, Weiwei; Li, Jun; Qin, Xiaolan; Ma, Duan; Shi, Yongyong
2014-07-01
Semiconductor high-throughput sequencing, represented by Ion Torrent PGM/Proton, proves to be feasible in the noninvasive prenatal diagnosis of fetal aneuploidies. It is commendable that, with less data and relevant cost also, an accurate result can be achieved owing to the high sensitivity and specificity of such kind of technology. We conducted a comparative analysis of the performance of four different Ion chips in detecting fetal chromosomal aneuploidies. Eight maternal plasma DNA samples, including four pregnancies with normal fetuses and four with trisomy 21 fetuses, were sequenced on Ion Torrent 314/316/318/PI chips, respectively. Results such as read mapped ratio, correlation coefficient and phred quality score were calculated and parallelly compared. All samples were correctly classified even with low-throughput chip, and, among the four chips, the 316 chip had the highest read mapped ratio, correlation coefficient, mean read length and phred quality score. All chips were well consistent with each other. Our results showed that all Ion chips are applicable in noninvasive prenatal fetal aneuploidy diagnosis. We recommend researchers or clinicians to use the appropriate chip with barcoding technology on the basis of the sample number.
Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.
van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F
2015-09-17
In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.
Harrison, Christine J; Griffiths, Mike; Moorman, Fìona; Schnittger, Susanne; Cayuela, Jean-Michel; Shurtleff, Sheila; Gottardi, Enrico; Mitterbauer, Gerlinde; Colomer, Dolores; Delabesse, Eric; Castéras, Vincent; Maroc, Nicolas
2007-02-01
Rearrangements of the MLL gene are significant in acute leukemia. Among the most frequent translocations are t(4;11)(q21;q23) and t(9;11)(p22;q23), which give rise to the MLL-AFF1 and MLL-MLLT3 fusion genes (alias MLL-AF4 and MLL-AF9) in acute lymphoblastic and acute myeloid leukemia, respectively. Current evidence suggests that determining the MLL status of acute leukemia, including precise identification of the partner gene, is important in defining appropriate treatment. This underscores the need for accurate detection methods. A novel molecular diagnostic device, the MLL FusionChip, has been successfully used to identify MLL fusion gene translocations in acute leukemia, including the precise breakpoint location. This study evaluated the performance of the MLL FusionChip within a routine clinical environment, comprising nine centers worldwide, in the analysis of 21 control and 136 patient samples. It was shown that the assay allowed accurate detection of the MLL fusion gene, regardless of the breakpoint location, and confirmed that this multiplex approach was robust in a global multicenter trial. The MLL FusionChip was shown to be superior to other detection methods. The type of molecular information provided by MLL FusionChip gave an indication of the appropriate primers to design for disease monitoring of MLL patients following treatment.
RNA-Seq of Circulating Tumor Cells in Stage II-III Breast Cancer.
Lang, Julie E; Ring, Alexander; Porras, Tania; Kaur, Pushpinder; Forte, Victoria A; Mineyev, Neal; Tripathy, Debu; Press, Michael F; Campo, Daniel
2018-06-04
We characterized the whole transcriptome of circulating tumor cells (CTCs) in stage II-III breast cancer to evaluate correlations with primary tumor biology. CTCs were isolated from peripheral blood (PB) via immunomagnetic enrichment followed by fluorescence-activated cell sorting (IE/FACS). CTCs, PB, and fresh tumors were profiled using RNA-seq. Formalin-fixed, paraffin-embedded (FFPE) tumors were subjected to RNA-seq and NanoString PAM50 assays with risk of recurrence (ROR) scores. CTCs were detected in 29/33 (88%) patients. We selected 21 cases to attempt RNA-seq (median number of CTCs = 9). Sixteen CTC samples yielded results that passed quality-control metrics, and these samples had a median of 4,311,255 uniquely mapped reads (less than PB or tumors). Intrinsic subtype predicted by comparing estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) versus PAM50 for FFPE tumors was 85% concordant. However, CTC RNA-seq subtype assessed by the PAM50 classification genes was highly discordant, both with the subtype predicted by ER/PR/HER2 and by PAM50 tumors. Two patients died of metastatic disease, both of whom had high ROR scores and high CTC counts. We identified significant genes, canonical pathways, upstream regulators, and molecular interaction networks comparing CTCs by various clinical factors. We also identified a 75-gene signature with highest expression in CTCs and tumors taken together that was prognostic in The Cancer Genome Atlas and Molecular Taxonomy of Breast Cancer International Consortium datasets. It is feasible to use RNA-seq of CTCs in non-metastatic patients to discover novel tumor biology characteristics.
Wang, Zichen; Ma'ayan, Avi
2016-01-01
RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that when knocked out in mice induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.
Software for rapid time dependent ChIP-sequencing analysis (TDCA).
Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David
2017-11-25
Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically relevant data analysis, and the uncovering of variations in the time-dependent behavior of chromatin. TDCA automates the analysis of TC ChIP-seq experiments, permitting researchers to easily obtain raw and modeled data for specific loci or groups of loci with similar behavior while also enhancing consistency of data analysis of TC data within the genomics field.
Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun
2015-01-01
Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.
miR-MaGiC improves quantification accuracy for small RNA-seq.
Russell, Pamela H; Vestal, Brian; Shi, Wen; Rudra, Pratyaydipta D; Dowell, Robin; Radcliffe, Richard; Saba, Laura; Kechris, Katerina
2018-05-15
Many tools have been developed to profile microRNA (miRNA) expression from small RNA-seq data. These tools must contend with several issues: the small size of miRNAs, the small number of unique miRNAs, the fact that similar miRNAs can be transcribed from multiple loci, and the presence of miRNA isoforms known as isomiRs. Methods failing to address these issues can return misleading information. We propose a novel quantification method designed to address these concerns. We present miR-MaGiC, a novel miRNA quantification method, implemented as a cross-platform tool in Java. miR-MaGiC performs stringent mapping to a core region of each miRNA and defines a meaningful set of target miRNA sequences by collapsing the miRNA space to "functional groups". We hypothesize that these two features, mapping stringency and collapsing, provide more optimal quantification to a more meaningful unit (i.e., miRNA family). We test miR-MaGiC and several published methods on 210 small RNA-seq libraries, evaluating each method's ability to accurately reflect global miRNA expression profiles. We define accuracy as total counts close to the total number of input reads originating from miRNAs. We find that miR-MaGiC, which incorporates both stringency and collapsing, provides the most accurate counts.
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie
2018-01-01
Abstract Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. PMID:29106630
Sequence Alignment to Predict Across Species Susceptibility ...
Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
Integrative Analysis of Many RNA-Seq Datasets to Study Alternative Splicing
Li, Wenyuan; Dai, Chao; Kang, Shuli; Zhou, Xianghong Jasmine
2014-01-01
Alternative splicing is an important gene regulatory mechanism that dramatically increases the complexity of the proteome. However, how alternative splicing is regulated and how transcription and splicing are coordinated are still poorly understood, and functions of transcript isoforms have been studied only in a few limited cases. Nowadays, RNA-seq technology provides an exceptional opportunity to study alternative splicing on genome-wide scales and in an unbiased manner. With the rapid accumulation of data in public repositories, new challenges arise from the urgent need to effectively integrate many different RNA-seq datasets for study alterative splicing. This paper discusses a set of advanced computational methods that can integrate and analyze many RNA-seq datasets to systematically identify splicing modules, unravel the coupling of transcription and splicing, and predict the functions of splicing isoforms on a genome-wide scale. PMID:24583115
Revisiting lab-on-a-chip technology for drug discovery.
Neuži, Pavel; Giselbrecht, Stefan; Länge, Kerstin; Huang, Tony Jun; Manz, Andreas
2012-08-01
The field of microfluidics or lab-on-a-chip technology aims to improve and extend the possibilities of bioassays, cell biology and biomedical research based on the idea of miniaturization. Microfluidic systems allow more accurate modelling of physiological situations for both fundamental research and drug development, and enable systematic high-volume testing for various aspects of drug discovery. Microfluidic systems are in development that not only model biological environments but also physically mimic biological tissues and organs; such 'organs on a chip' could have an important role in expediting early stages of drug discovery and help reduce reliance on animal testing. This Review highlights the latest lab-on-a-chip technologies for drug discovery and discusses the potential for future developments in this field.
Statistical modeling of isoform splicing dynamics from RNA-seq time series data.
Huang, Yuanhua; Sanguinetti, Guido
2016-10-01
Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq permits to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net G.Sanguinetti@ed.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.
Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping
2016-08-26
Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.
Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data
Ikegami, Tsutomu; Inatsugi, Toyohiro; Kojima, Isao; Umemura, Myco; Hagiwara, Hiroko; Machida, Masayuki; Asai, Kiyoshi
2015-01-01
A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both of them could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of the reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose gene is known to have high-GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data. PMID:25919614
Normalization of RNA-seq data using factor analysis of control genes or samples
Risso, Davide; Ngai, John; Speed, Terence P.; Dudoit, Sandrine
2015-01-01
Normalization of RNA-seq data has proven essential to ensure accurate inference of expression levels. Here we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more-complex unwanted effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more-accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple labs, technicians, and/or platforms. PMID:25150836
Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E
2013-04-08
Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 119 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.
Yamamura, Shohei; Yamada, Eriko; Kimura, Fukiko; Miyajima, Kumiko; Shigeto, Hajime
2017-10-21
A new single-cell microarray chip was designed and developed to separate and analyze single adherent and non-adherent cancer cells. The single-cell microarray chip is made of polystyrene with over 60,000 microchambers of 10 different size patterns (31-40 µm upper diameter, 11-20 µm lower diameter). A drop of suspension of adherent carcinoma (NCI-H1650) and non-adherent leukocyte (CCRF-CEM) cells was placed onto the chip, and single-cell occupancy of NCI-H1650 and CCRF-CEM was determined to be 79% and 84%, respectively. This was achieved by controlling the chip design and surface treatment. Analysis of protein expression in single NCI-H1650 and CCRF-CEM cells was performed on the single-cell microarray chip by multi-antibody staining. Additionally, with this system, we retrieved positive single cells from the microchambers by a micromanipulator. Thus, this system demonstrates the potential for easy and accurate separation and analysis of various types of single cells.
Stress analysis of ultra-thin silicon chip-on-foil electronic assembly under bending
NASA Astrophysics Data System (ADS)
Wacker, Nicoleta; Richter, Harald; Hoang, Tu; Gazdzicki, Pawel; Schulze, Mathias; Angelopoulos, Evangelos A.; Hassan, Mahadi-Ul; Burghartz, Joachim N.
2014-09-01
In this paper we investigate the bending-induced uniaxial stress at the top of ultra-thin (thickness \\leqslant 20 μm) single-crystal silicon (Si) chips adhesively attached with the aid of an epoxy glue to soft polymeric substrate through combined theoretical and experimental methods. Stress is first determined analytically and numerically using dedicated models. The theoretical results are validated experimentally through piezoresistive measurements performed on complementary metal-oxide-semiconductor (CMOS) transistors built on specially designed chips, and through micro-Raman spectroscopy investigation. Stress analysis of strained ultra-thin chips with CMOS circuitry is crucial, not only for the accurate evaluation of the piezoresistive behavior of the built-in devices and circuits, but also for reliability and deformability analysis. The results reveal an uneven bending-induced stress distribution at the top of the Si-chip that decreases from the central area towards the chip's edges along the bending direction, and increases towards the other edges. Near these edges, stress can reach very high values, facilitating the emergence of cracks causing ultimate chip failure.
Bayesian Networks Predict Neuronal Transdifferentiation.
Ainsworth, Richard I; Ai, Rizi; Ding, Bo; Li, Nan; Zhang, Kai; Wang, Wei
2018-05-30
We employ the language of Bayesian networks to systematically construct gene-regulation topologies from deep-sequencing single-nucleus RNA-Seq data for human neurons. From the perspective of the cell-state potential landscape, we identify attractors that correspond closely to different neuron subtypes. Attractors are also recovered for cell states from an independent data set confirming our models accurate description of global genetic regulations across differing cell types of the neocortex (not included in the training data). Our model recovers experimentally confirmed genetic regulations and community analysis reveals genetic associations in common pathways. Via a comprehensive scan of all theoretical three-gene perturbations of gene knockout and overexpression, we discover novel neuronal trans-differrentiation recipes (including perturbations of SATB2, GAD1, POU6F2 and ADARB2) for excitatory projection neuron and inhibitory interneuron subtypes. Copyright © 2018, G3: Genes, Genomes, Genetics.
NASA Astrophysics Data System (ADS)
Choi, Jin-Ha; Lee, Jaewon; Shin, Woojung; Choi, Jeong-Woo; Kim, Hyun Jung
2016-10-01
Nanotechnology and bioengineering have converged over the past decades, by which the application of multi-functional nanoparticles (NPs) has been emerged in clinical and biomedical fields. The NPs primed to detect disease-specific biomarkers or to deliver biopharmaceutical compounds have beena validated in conventional in vitro culture models including two dimensional (2D) cell cultures or 3D organoid models. However, a lack of experimental models that have strong human physiological relevance has hampered accurate validation of the safety and functionality of NPs. Alternatively, biomimetic human "Organs-on-Chips" microphysiological systems have recapitulated the mechanically dynamic 3D tissue interface of human organ microenvironment, in which the transport, cytotoxicity, biocompatibility, and therapeutic efficacy of NPs and their conjugates may be more accurately validated. Finally, integration of NP-guided diagnostic detection and targeted nanotherapeutics in conjunction with human organs-on-chips can provide a novel avenue to accelerate the NP-based drug development process as well as the rapid detection of cellular secretomes associated with pathophysiological processes.
Organs-on-chips at the frontiers of drug discovery
Esch, Eric W.; Bahinski, Anthony; Huh, Dongeun
2016-01-01
Improving the effectiveness of preclinical predictions of human drug responses is critical to reducing costly failures in clinical trials. Recent advances in cell biology, microfabrication and microfluidics have enabled the development of microengineered models of the functional units of human organs — known as organs-on-chips — that could provide the basis for preclinical assays with greater predictive power. Here, we examine the new opportunities for the application of organ-on-chip technologies in a range of areas in preclinical drug discovery, such as target identification and validation, target-based screening, and phenotypic screening. We also discuss emerging drug discovery opportunities enabled by organs-on-chips, as well as important challenges in realizing the full potential of this technology. PMID:25792263
Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium
Grove, Megan L.; Yu, Bing; Cochran, Barbara J.; Haritunians, Talin; Bis, Joshua C.; Taylor, Kent D.; Hansen, Mark; Borecki, Ingrid B.; Cupples, L. Adrienne; Fornage, Myriam; Gudnason, Vilmundur; Harris, Tamara B.; Kathiresan, Sekar; Kraaij, Robert; Launer, Lenore J.; Levy, Daniel; Liu, Yongmei; Mosley, Thomas; Peloso, Gina M.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Siscovick, David S.; Smith, Albert V.; Uitterlinden, Andre; van Duijn, Cornelia M.; Wilson, James G.; O’Donnell, Christopher J.; Rotter, Jerome I.; Boerwinkle, Eric
2013-01-01
Genotyping arrays are a cost effective approach when typing previously-identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty that automated clustering algorithms have to accurately detect and assign genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling allows the ability to accurately genotype rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip. PMID:23874508
Lee, Dong Woo; Oh, Woo-Yeon; Yi, Sang Hyun; Ku, Bosung; Lee, Moo-Yeal; Cho, Yoon Hee; Yang, Mihi
2016-09-30
Bisphenol A (BPA) has been widely used for manufacturing polycarbonate plastics and epoxy resins and has been extensively tested in animals to predict human toxicity. In order to reduce the use of animals for toxicity assessment and provide further accurate information on BPA toxicity in humans, we encapsulated Hep3B human hepatoma cells in alginate and cultured them in three dimensions (3D) on a micropillar chip coupled to a panel of metabolic enzymes on a microwell chip. As a result, we were able to assess the toxicity of BPA under various metabolic enzyme conditions using a high-throughput and micro assay; sample volumes were nearly 2,000 times less than that required for a 96-well plate. We applied a total of 28 different enzymes to each chip, including 10 cytochrome P450s (CYP450s), 10 UDP-glycosyltransferases (UGTs), 3 sulfotransferases (SULTs), alcohol dehydrogenase (ADH), and aldehyde dehydrogenase 2 (ALDH2). Phase I enzyme mixtures, phase II enzyme mixtures, and a combination of phase I and phase II enzymes were also applied to the chip. BPA toxicity was higher in samples containing CYP2E1 than controls, which contained no enzymes (IC50, 184±16μM and 270±25.8μM, respectively, p<0.01). However, BPA-induced toxicity was alleviated in the presence of ADH (IC50, 337±17.9μM), ALDH2 (335±13.9μM), and SULT1E1 (318±17.7μM) (p<0.05). CYP2E1-mediated cytotoxicity was confirmed by quantifying unmetabolized BPA using HPLC/FD. Therefore, we suggest the present micropillar/microwell chip platform as an effective alternative to animal testing for estimating BPA toxicity via human metabolic systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M
2015-05-15
The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar unicellular opisthokont genomes than to other animal genomes.
Gardeux, Vincent; Achour, Ikbel; Li, Jianrong; Maienschein-Cline, Mark; Li, Haiquan; Pesce, Lorenzo; Parinandi, Gurunadh; Bahroos, Neil; Winn, Robert; Foster, Ian; Garcia, Joe G N; Lussier, Yves A
2014-01-01
Background The emergence of precision medicine allowed the incorporation of individual molecular data into patient care. Indeed, DNA sequencing predicts somatic mutations in individual patients. However, these genetic features overlook dynamic epigenetic and phenotypic response to therapy. Meanwhile, accurate personal transcriptome interpretation remains an unmet challenge. Further, N-of-1 (single-subject) efficacy trials are increasingly pursued, but are underpowered for molecular marker discovery. Method ‘N-of-1-pathways’ is a global framework relying on three principles: (i) the statistical universe is a single patient; (ii) significance is derived from geneset/biomodules powered by paired samples from the same patient; and (iii) similarity between genesets/biomodules assesses commonality and differences, within-study and cross-studies. Thus, patient gene-level profiles are transformed into deregulated pathways. From RNA-Seq of 55 lung adenocarcinoma patients, N-of-1-pathways predicts the deregulated pathways of each patient. Results Cross-patient N-of-1-pathways obtains comparable results with conventional genesets enrichment analysis (GSEA) and differentially expressed gene (DEG) enrichment, validated in three external evaluations. Moreover, heatmap and star plots highlight both individual and shared mechanisms ranging from molecular to organ-systems levels (eg, DNA repair, signaling, immune response). Patients were ranked based on the similarity of their deregulated mechanisms to those of an independent gold standard, generating unsupervised clusters of diametric extreme survival phenotypes (p=0.03). Conclusions The N-of-1-pathways framework provides a robust statistical and relevant biological interpretation of individual disease-free survival that is often overlooked in conventional cross-patient studies. It enables mechanism-level classifiers with smaller cohorts as well as N-of-1 studies. Software http://lussierlab.org/publications/N-of-1-pathways PMID:25301808
2013-01-01
Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardeux, Vincent; Achour, Ikbel; Li, Jianrong
Background: The emergence of precision medicine allowed the incorporation of individual molecular data into patient care. This research entails, DNA sequencing predicts somatic mutations in individual patients. However, these genetic features overlook dynamic epigenetic and phenotypic response to therapy. Meanwhile, accurate personal transcriptome interpretation remains an unmet challenge. Further, N-of-1 (single-subject) efficacy trials are increasingly pursued, but are underpowered for molecular marker discovery. Method: ‘N-of-1- pathways’ is a global framework relying on three principles: (i) the statistical universe is a single patient; (ii) significance is derived from geneset/biomodules powered by paired samples from the same patient; and (iii) similarity betweenmore » genesets/biomodules assesses commonality and differences, within-study and cross-studies. Thus, patient gene-level profiles are transformed into deregulated pathways. From RNA-Seq of 55 lung adenocarcinoma patients, N-of-1- pathways predicts the deregulated pathways of each patient. Results: Cross-patient N-of-1- pathways obtains comparable results with conventional genesets enrichment analysis (GSEA) and differentially expressed gene (DEG) enrichment, validated in three external evaluations. Moreover, heatmap and star plots highlight both individual and shared mechanisms ranging from molecular to organ-systems levels (eg, DNA repair, signaling, immune response). Patients were ranked based on the similarity of their deregulated mechanisms to those of an independent gold standard, generating unsupervised clusters of diametric extreme survival phenotypes (p=0.03). Conclusions: The N-of-1- pathways framework provides a robust statistical and relevant biological interpretation of individual disease-free survival that is often overlooked in conventional cross-patient studies. It enables mechanism-level classifiers with smaller cohorts as well as N-of-1 studies.« less
Gardeux, Vincent; Achour, Ikbel; Li, Jianrong; ...
2014-11-01
Background: The emergence of precision medicine allowed the incorporation of individual molecular data into patient care. This research entails, DNA sequencing predicts somatic mutations in individual patients. However, these genetic features overlook dynamic epigenetic and phenotypic response to therapy. Meanwhile, accurate personal transcriptome interpretation remains an unmet challenge. Further, N-of-1 (single-subject) efficacy trials are increasingly pursued, but are underpowered for molecular marker discovery. Method: ‘N-of-1- pathways’ is a global framework relying on three principles: (i) the statistical universe is a single patient; (ii) significance is derived from geneset/biomodules powered by paired samples from the same patient; and (iii) similarity betweenmore » genesets/biomodules assesses commonality and differences, within-study and cross-studies. Thus, patient gene-level profiles are transformed into deregulated pathways. From RNA-Seq of 55 lung adenocarcinoma patients, N-of-1- pathways predicts the deregulated pathways of each patient. Results: Cross-patient N-of-1- pathways obtains comparable results with conventional genesets enrichment analysis (GSEA) and differentially expressed gene (DEG) enrichment, validated in three external evaluations. Moreover, heatmap and star plots highlight both individual and shared mechanisms ranging from molecular to organ-systems levels (eg, DNA repair, signaling, immune response). Patients were ranked based on the similarity of their deregulated mechanisms to those of an independent gold standard, generating unsupervised clusters of diametric extreme survival phenotypes (p=0.03). Conclusions: The N-of-1- pathways framework provides a robust statistical and relevant biological interpretation of individual disease-free survival that is often overlooked in conventional cross-patient studies. It enables mechanism-level classifiers with smaller cohorts as well as N-of-1 studies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naimi, Ladan J.; Collard, Flavien; Bi, Xiaotao
Size reduction is an unavoidable operation for preparing biomass for biofuels and bioproduct conversion. Yet, there is considerable uncertainty in power input requirement and the uniformity of ground biomass. Considerable gains are possible if the required power input for a size reduction ratio is estimated accurately. In this research three well-known mechanistic equations attributed to Rittinger, Kick, and Bond available for predicting energy input for grinding pine wood chips were tested against experimental grinding data. Prior to testing, samples of pine wood chips were conditioned to 11.7% wb, moisture content. The wood chips were successively ground in a hammer millmore » using screen sizes of 25.4 mm, 10 mm, 6.4 mm, and 3.2 mm. The input power and the flow of material into the grinder were recorded continuously. The recorded power input vs. mean particle size showed that the Rittinger equation had the best fit to the experimental data. The ground particle sizes were 4 to 7 times smaller than the size of installed screen. Geometric mean size of particles were calculated using two methods (1) Tyler sieves and using particle size analysis and (2) Sauter mean diameter calculated from the ratio of volume to surface that were estimated from measured length and width. The two mean diameters agreed well, pointing to the fact that either mechanical sieving or particle imaging can be used to characterize particle size. In conclusion, specific energy input to the hammer mill increased from 1.4 kWh t –1 (5.2 J g –1) for large 25.1-mm screen to 25 kWh t –1 (90.4 J g –1) for small 3.2-mm screen.« less
Naimi, Ladan J.; Collard, Flavien; Bi, Xiaotao; ...
2016-01-05
Size reduction is an unavoidable operation for preparing biomass for biofuels and bioproduct conversion. Yet, there is considerable uncertainty in power input requirement and the uniformity of ground biomass. Considerable gains are possible if the required power input for a size reduction ratio is estimated accurately. In this research three well-known mechanistic equations attributed to Rittinger, Kick, and Bond available for predicting energy input for grinding pine wood chips were tested against experimental grinding data. Prior to testing, samples of pine wood chips were conditioned to 11.7% wb, moisture content. The wood chips were successively ground in a hammer millmore » using screen sizes of 25.4 mm, 10 mm, 6.4 mm, and 3.2 mm. The input power and the flow of material into the grinder were recorded continuously. The recorded power input vs. mean particle size showed that the Rittinger equation had the best fit to the experimental data. The ground particle sizes were 4 to 7 times smaller than the size of installed screen. Geometric mean size of particles were calculated using two methods (1) Tyler sieves and using particle size analysis and (2) Sauter mean diameter calculated from the ratio of volume to surface that were estimated from measured length and width. The two mean diameters agreed well, pointing to the fact that either mechanical sieving or particle imaging can be used to characterize particle size. In conclusion, specific energy input to the hammer mill increased from 1.4 kWh t –1 (5.2 J g –1) for large 25.1-mm screen to 25 kWh t –1 (90.4 J g –1) for small 3.2-mm screen.« less
Wei, Li; Xu, Jian
2018-06-01
Epigenetic factors such as histone modifications play integral roles in plant development and stress response, yet their implications in algae remain poorly understood. In the industrial oleaginous microalgae Nannochloropsis spp., the lack of an efficient methodology for chromatin immunoprecipitation (ChIP), which determines the specific genomic location of various histone modifications, has hindered probing the epigenetic basis of their photosynthetic carbon conversion and storage as oil. Here, a detailed ChIP protocol was developed for Nannochloropsis oceanica, which represents a reliable approach for the analysis of histone modifications, chromatin state, and transcription factor-binding sites at the epigenetic level. Using ChIP-qPCR, genes related to photosynthetic carbon fixation in this microalga were systematically assessed. Furthermore, a ChIP-Seq protocol was established and optimized, which generated a genome-wide profile of histone modification events, using histone mark H3K9Ac as an example. These results are the first step for appreciation of the chromatin landscape in industrial oleaginous microalgae and for epigenetics-based microalgal feedstock development. © 2018 Phycological Society of America.
Makeyev, Aleksandr V; Bayarsaihan, Dashzeveg
2013-05-01
Objectives : GTF2I and GTF2IRD1 genes located in Williams-Beuren syndrome (WBS) critical region encode TFII-I family transcription factors. The aim of this study was to map genomic sites bound by these proteins across promoter regions of developmental regulators associated with craniofacial development. Design : Chromatin was isolated from human neural crest progenitor cells and the DNA-binding profile was generated using the human RefSeq tiling promoter ChIP-chip arrays. Results : TFII-I transcription factors are recruited to the promoters of SEC23A, CFDP1, and NSD1 previously defined as TFII-I target genes. Moreover, our analysis revealed additional binding elements that contain E-boxes and initiator-like motifs. Conclusions : Genome-wide promoter binding studies revealed SEC23A, CFDP1, and NSD1 linked to craniofacial or dental development as direct TFII-I targets. Developmental regulation of these genes by TFII-I factors could contribute to the WBS-specific facial dysmorphism.
Engagement and communication among participants in the ClinSeq Genomic Sequencing Study.
Hooker, Gillian W; Umstead, Kendall L; Lewis, Katie L; Koehly, Laura K; Biesecker, Leslie G; Biesecker, Barbara B
2017-01-01
As clinical genome sequencing expand its reach, understanding how individuals engage with this process are of critical importance. In this study, we aimed to describe internal engagement and its correlates among a ClinSeq cohort of adults consented to genome sequencing and receipt of results. This study was framed using the precaution adoption process model (PAPM), in which knowledge predicts engagement and engagement predicts subsequent behaviors. Prior to receipt of sequencing results, 630 participants in the study completed a baseline survey. Engagement was assessed as the frequency with which participants thought about their participation in ClinSeq since enrollment. Results were consistent with the PAPM: those with higher genomics knowledge reported higher engagement (r = 0.13, P = 0.001) and those who were more engaged reported more frequent communication with their physicians (r = 0.28, P < 0.001) and family members (r = 0.35, P < 0.001) about ClinSeq. Characteristics of those with higher engagement included poorer overall health (r = -0.13, P = 0.002), greater seeking of health information (r = 0.16, P < 0.001), and more recent study enrollment (r = -0.21, P < 0.001). These data support the importance of internal engagement in communication related to genomic sequencing.Genet Med 19 1, 98-103.
Application of LogitBoost Classifier for Traceability Using SNP Chip Data
Kang, Hyunsung; Cho, Seoae; Kim, Heebal; Seo, Kang-Seok
2015-01-01
Consumer attention to food safety has increased rapidly due to animal-related diseases; therefore, it is important to identify their places of origin (POO) for safety purposes. However, only a few studies have addressed this issue and focused on machine learning-based approaches. In the present study, classification analyses were performed using a customized SNP chip for POO prediction. To accomplish this, 4,122 pigs originating from 104 farms were genotyped using the SNP chip. Several factors were considered to establish the best prediction model based on these data. We also assessed the applicability of the suggested model using a kinship coefficient-filtering approach. Our results showed that the LogitBoost-based prediction model outperformed other classifiers in terms of classification performance under most conditions. Specifically, a greater level of accuracy was observed when a higher kinship-based cutoff was employed. These results demonstrated the applicability of a machine learning-based approach using SNP chip data for practical traceability. PMID:26436917
Application of LogitBoost Classifier for Traceability Using SNP Chip Data.
Kim, Kwondo; Seo, Minseok; Kang, Hyunsung; Cho, Seoae; Kim, Heebal; Seo, Kang-Seok
2015-01-01
Consumer attention to food safety has increased rapidly due to animal-related diseases; therefore, it is important to identify their places of origin (POO) for safety purposes. However, only a few studies have addressed this issue and focused on machine learning-based approaches. In the present study, classification analyses were performed using a customized SNP chip for POO prediction. To accomplish this, 4,122 pigs originating from 104 farms were genotyped using the SNP chip. Several factors were considered to establish the best prediction model based on these data. We also assessed the applicability of the suggested model using a kinship coefficient-filtering approach. Our results showed that the LogitBoost-based prediction model outperformed other classifiers in terms of classification performance under most conditions. Specifically, a greater level of accuracy was observed when a higher kinship-based cutoff was employed. These results demonstrated the applicability of a machine learning-based approach using SNP chip data for practical traceability.
A Concise Atlas of Thyroid Cancer Next-Generation Sequencing Panel ThyroSeq v.2
Alsina, Jorge; Alsina, Raul; Gulec, Seza
2017-01-01
The next-generation sequencing technology allows high out-put genomic analysis. An innovative assay in thyroid cancer, ThyroSeq® was developed for targeted mutation detection by next generation sequencing technology in fine needle aspiration and tissue samples. ThyroSeq v.2 next generation sequencing panel offers simultaneous sequencing and detection in >1000 hotspots of 14 thyroid cancer-related genes and for 42 types of gene fusions known to occur in thyroid cancer. ThyroSeq is being increasingly used to further narrow the indeterminate category defined by cytology for thyroid nodules. From a surgical perspective, genomic profiling also provides prognostic and predictive information and closely relates to determination of surgical strategy. Both the genomic analysis technology and the informatics for the cancer genome data base are rapidly developing. In this paper, we have gathered existing information on the thyroid cancer-related genes involved in the initiation and progression of thyroid cancer. Our goal is to assemble a glossary for the current ThyroSeq genomic panel that can help elucidate the role genomics play in thyroid cancer oncogenesis. PMID:28117295
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...
2015-10-28
We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai
We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less
HSA: a heuristic splice alignment tool.
Bu, Jingde; Chi, Xuebin; Jin, Zhong
2013-01-01
RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.
Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv
2010-01-01
RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu
2017-01-04
The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) is contributed to various biological processes and linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represent about 20 times expansion when comparing to the previous released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulations of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Quantifying circular RNA expression from RNA-seq data using model-based framework.
Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun-A; Zhou, Tong; Gu, Wanjun
2017-07-15
Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, the cell type-and tissue-specific circRNA expression has implicated their crucial functions in many biological processes. Hence, the quantification of circRNA expression from high-throughput RNA-seq data is becoming important to ascertain. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate transcript expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, had impacts on quantification performance of circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. On the other hand, the consideration of circular transcripts in expression quantification from rRNA-depleted RNA-seq data showed substantial increased accuracy of linear transcript expression. Our proposed strategy was implemented in a program named Sailfish-cir. Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir . tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein
2013-01-01
RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
Grape RNA-Seq analysis pipeline environment
Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic
2013-01-01
Motivation: The avalanche of data arriving since the development of NGS technologies have prompted the need for developing fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides similar or superior dynamic range than microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data, that we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413
The Nuclear Receptor, RORγ, Regulates Pathways Necessary for Breast Cancer Metastasis.
Oh, Tae Gyu; Wang, Shu-Ching M; Acharya, Bipul R; Goode, Joel M; Graham, J Dinny; Clarke, Christine L; Yap, Alpha S; Muscat, George E O
2016-04-01
We have previously reported that RORγ expression was decreased in ER-ve breast cancer, and increased expression improves clinical outcomes. However, the underlying RORγ dependent mechanisms that repress breast carcinogenesis have not been elucidated. Here we report that RORγ negatively regulates the oncogenic TGF-β/EMT and mammary stem cell (MaSC) pathways, whereas RORγ positively regulates DNA-repair. We demonstrate that RORγ expression is: (i) decreased in basal-like subtype cancers, and (ii) inversely correlated with histological grade and drivers of carcinogenesis in breast cancer cohorts. Furthermore, integration of RNA-seq and ChIP-chip data reveals that RORγ regulates the expression of many genes involved in TGF-β/EMT-signaling, DNA-repair and MaSC pathways (including the non-coding RNA, LINC00511). In accordance, pharmacological studies demonstrate that an RORγ agonist suppresses breast cancer cell viability, migration, the EMT transition (microsphere outgrowth) and mammosphere-growth. In contrast, RNA-seq demonstrates an RORγ inverse agonist induces TGF-β/EMT-signaling. These findings suggest pharmacological modulation of RORγ activity may have utility in breast cancer. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Modrák, Martin; Vohradský, Jiří
2018-04-13
Identifying regulons of sigma factors is a vital subtask of gene network inference. Integrating multiple sources of data is essential for correct identification of regulons and complete gene regulatory networks. Time series of expression data measured with microarrays or RNA-seq combined with static binding experiments (e.g., ChIP-seq) or literature mining may be used for inference of sigma factor regulatory networks. We introduce Genexpi: a tool to identify sigma factors by combining candidates obtained from ChIP experiments or literature mining with time-course gene expression data. While Genexpi can be used to infer other types of regulatory interactions, it was designed and validated on real biological data from bacterial regulons. In this paper, we put primary focus on CyGenexpi: a plugin integrating Genexpi with the Cytoscape software for ease of use. As a part of this effort, a plugin for handling time series data in Cytoscape called CyDataseries has been developed and made available. Genexpi is also available as a standalone command line tool and an R package. Genexpi is a useful part of gene network inference toolbox. It provides meaningful information about the composition of regulons and delivers biologically interpretable results.
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong
2018-01-04
Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.
Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao
2015-06-15
ChIP-seq is a powerful technology to measure the protein binding or histone modification strength in the whole genome scale. Although there are a number of methods available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical method for quantitative comparison of multiple ChIP-seq datasets with the considerations of data from control experiment, signal to noise ratios, biological variations and multiple-factor experimental designs is under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then union them to form a single set of candidate regions. The read counts from IP experiment at the candidate regions are assumed to follow Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through the hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results compared with existing ones. An R software package ChIPComp is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs.
Parekh, Swati; Ziegenhain, Christoph; Vieth, Beate; Enard, Wolfgang; Hellmann, Ines
2018-06-01
Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific bar codes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon mapping reads or for both exon and intron mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function that facilitates dealing with hugely varying library sizes but also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. Also, we show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. zUMIs flexibility makes if possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs and is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
NASA Technical Reports Server (NTRS)
Smith, Edwyn D.
1991-01-01
Two silicon CMOS application specific integrated circuits (ASICs), a data generation chip, and a data checker chip were designed. The conversion of the data generator circuitry into a pair of CMOS ASIC chips using the 1.2 micron standard cell library is documented. The logic design of the data checker is discussed. The functions of the control circuitry is described. An accurate estimate of timing relationships is essential to make sure that the logic design performs correctly under practical conditions. Timing and delay information are examined.
Hu, Beibei; Zhang, Xueqing; Chen, Haopeng; Cui, Daxiang
2011-03-01
We proposed a new algorithm for automatic identification of fluorescent signal. Based on the features of chromatographic chips, mathematic morphology in RGB color space was used to filter and enhance the images, pyramid connection was used to segment the areas of fluorescent signal, and then the method of Gaussian Mixture Model was used to detect the fluorescent signal. Finally we calculated the average fluorescent intensity in obtained fluorescent areas. Our results show that the algorithm has a good efficacy to segment the fluorescent areas, can detect the fluorescent signal quickly and accurately, and finally realize the quantitative detection of fluorescent signal in chromatographic chip.
Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike
2016-07-01
High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based, demultiplexing pipeline Error-Aware Demultiplexer that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10(6) copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
A software pipeline for prediction of allele-specific alternative RNA processing events using single RNA-seq data. The current version focuses on prediction of alternative splicing and alternative polyadenylation modulated by genetic variants.
DEMO: Sequence Alignment to Predict Across Species Susceptibility
The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility tool (SeqAPASS; https://seqapass.epa.gov/seqapass/) was developed to comparatively evaluate protein sequence and structural similarity across species as a means to extrapolate toxic...
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which the genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to gain better insights to a number of biological and biomedical questions, compared to the DNA microarrays. Most importantly, RNA-Seq allows to quantify expression at the gene and alternative splicing isoform levels. However, leveraging the RNA-Seq data requires development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis, and have recently gained attention for their applications to the RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that the isoform-level expression data is more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment is done through utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes and, the pathological tumor stage for the samples from the cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods. The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.
Williams, Philip H; Eyles, Rod; Weiller, Georg
2012-01-01
MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
Calibrated thermal microscopy of the tool-chip interface in machining
NASA Astrophysics Data System (ADS)
Yoon, Howard W.; Davies, Matthew A.; Burns, Timothy J.; Kennedy, M. D.
2000-03-01
A critical parameter in predicting tool wear during machining and in accurate computer simulations of machining is the spatially-resolved temperature at the tool-chip interface. We describe the development and the calibration of a nearly diffraction-limited thermal-imaging microscope to measure the spatially-resolved temperatures during the machining of an AISI 1045 steel with a tungsten-carbide tool bit. The microscope has a target area of 0.5 mm X 0.5 mm square region with a < 5 micrometers spatial resolution and is based on a commercial InSb 128 X 128 focal plane array with an all reflective microscope objective. The minimum frame image acquisition time is < 1 ms. The microscope is calibrated using a standard blackbody source from the radiance temperature calibration laboratory at the National Institute of Standards and Technology, and the emissivity of the machined material is deduced from the infrared reflectivity measurements. The steady-state thermal images from the machining of 1045 steel are compared to previous determinations of tool temperatures from micro-hardness measurements and are found to be in agreement with those studies. The measured average chip temperatures are also in agreement with the temperature rise estimated from energy balance considerations. From these calculations and the agreement between the experimental and the calculated determinations of the emissivity of the 1045 steel, the standard uncertainty of the temperature measurements is estimated to be about 45 degree(s)C at 900 degree(s)C.
Improved accuracies for satellite tracking
NASA Technical Reports Server (NTRS)
Kammeyer, P. C.; Fiala, A. D.; Seidelmann, P. K.
1991-01-01
A charge coupled device (CCD) camera on an optical telescope which follows the stars can be used to provide high accuracy comparisons between the line of sight to a satellite, over a large range of satellite altitudes, and lines of sight to nearby stars. The CCD camera can be rotated so the motion of the satellite is down columns of the CCD chip, and charge can be moved from row to row of the chip at a rate which matches the motion of the optical image of the satellite across the chip. Measurement of satellite and star images, together with accurate timing of charge motion, provides accurate comparisons of lines of sight. Given lines of sight to stars near the satellite, the satellite line of sight may be determined. Initial experiments with this technique, using an 18 cm telescope, have produced TDRS-4 observations which have an rms error of 0.5 arc second, 100 m at synchronous altitude. Use of a mosaic of CCD chips, each having its own rate of charge motion, in the focal place of a telescope would allow point images of a geosynchronous satellite and of stars to be formed simultaneously in the same telescope. The line of sight of such a satellite could be measured relative to nearby star lines of sight with an accuracy of approximately 0.03 arc second. Development of a star catalog with 0.04 arc second rms accuracy and perhaps ten stars per square degree would allow determination of satellite lines of sight with 0.05 arc second rms absolute accuracy, corresponding to 10 m at synchronous altitude. Multiple station time transfers through a communications satellite can provide accurate distances from the satellite to the ground stations. Such observations can, if calibrated for delays, determine satellite orbits to an accuracy approaching 10 m rms.
The role of simulation in the design of a neural network chip
NASA Technical Reports Server (NTRS)
Desai, Utpal; Roppel, Thaddeus A.; Padgett, Mary L.
1993-01-01
An iterative, simulation-based design procedure for a neural network chip is introduced. For this design procedure, the goal is to produce a chip layout for a neural network in which the weights are determined by transistor gate width-to-length ratios. In a given iteration, the current layout is simulated using the circuit simulator SPICE, and layout adjustments are made based on conventional gradient-decent methods. After the iteration converges, the chip is fabricated. Monte Carlo analysis is used to predict the effect of statistical fabrication process variations on the overall performance of the neural network chip.
GHOST: global hepatitis outbreak and surveillance technology.
Longmire, Atkinson G; Sims, Seth; Rytsareva, Inna; Campo, David S; Skums, Pavel; Dimitrova, Zoya; Ramachandran, Sumathi; Medrzycki, Magdalena; Thai, Hong; Ganova-Raeva, Lilia; Lin, Yulin; Punkova, Lili T; Sue, Amanda; Mirabito, Massimo; Wang, Silver; Tracy, Robin; Bolet, Victor; Sukalac, Thom; Lynberg, Chris; Khudyakov, Yury
2017-12-06
Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Effective HCV outbreak investigation requires comprehensive surveillance and robust case investigation. We previously developed and validated a methodology for the rapid and cost-effective identification of HCV transmission clusters. Global Hepatitis Outbreak and Surveillance Technology (GHOST) is a cloud-based system enabling users, regardless of computational expertise, to analyze and visualize transmission clusters in an independent, accurate and reproducible way. We present and explore performance of several GHOST implemented algorithms using next-generation sequencing data experimentally obtained from hypervariable region 1 of genetically related and unrelated HCV strains. GHOST processes data from an entire MiSeq run in approximately 3 h. A panel of seven specimens was used for preparation of six repeats of MiSeq libraries. Testing sequence data from these libraries by GHOST showed a consistent transmission linkage detection, testifying to high reproducibility of the system. Lack of linkage among genetically unrelated HCV strains and constant detection of genetic linkage between HCV strains from known transmission pairs and from follow-up specimens at different levels of MiSeq-read sampling indicate high specificity and sensitivity of GHOST in accurate detection of HCV transmission. GHOST enables automatic extraction of timely and relevant public health information suitable for guiding effective intervention measures. It is designed as a virtual diagnostic system intended for use in molecular surveillance and outbreak investigations rather than in research. The system produces accurate and reproducible information on HCV transmission clusters for all users, irrespective of their level of bioinformatics expertise. Improvement in molecular detection capacity will contribute to increasing the rate of transmission detection, thus providing opportunity for rapid, accurate and effective response to outbreaks of hepatitis C. Although GHOST was originally developed for hepatitis C surveillance, its modular structure is readily applicable to other infectious diseases. Worldwide availability of GHOST for the detection of HCV transmissions will foster deeper involvement of public health researchers and practitioners in hepatitis C outbreak investigation.
Identification of innate lymphoid cells in single-cell RNA-Seq data.
Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David
2017-07-01
Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help exploring their conservation in distant vertebrate species.
Zhang, Ye; Li, Wenjie; Zhou, Yun; Johnson, Amanda; Venable, Amanda; Hassan, Ahmed; Griswold, John; Pappas, Dimitri
2017-12-18
A microfluidic affinity separation device was developed for the detection of sepsis in critical care patients. An affinity capture method was developed to capture cells based on changes in CD64 expression in a single, simple microfluidic chip for sepsis detection. Both sepsis patient samples and a laboratory CD64+ expression model were used to validate the microfluidic assay. Flow cytometry analysis showed that the chip cell capture had a linear relationship with CD64 expression in laboratory models. The Sepsis Chip detected an increase in upregulated neutrophil-like cells when the upregulated cell population is as low as 10% of total cells spiked into commercially available aseptic blood samples. In a proof of concept study, blood samples obtained from sepsis patients within 24 hours of diagnosis were tested on the chip to further validate its performance. On-chip CD64+ cell capture from 10 patient samples (619 ± 340 cells per chip) was significantly different from control samples (32 ± 11 cells per chip) and healthy volunteer samples (228 ± 95 cells per chip). In addition, the on-chip cell capture has a linear relationship with CD64 expression indicating our approach can be used to measure CD64 expression based on total cell capture on Sepsis Chip. Our method has proven to be sensitive, accurate, rapid, and cost-effective. Therefore, this device is a promising detection platform for neutrophil activation and sepsis diagnosis.
Thermal-Aware Test Access Mechanism and Wrapper Design Optimization for System-on-Chips
NASA Astrophysics Data System (ADS)
Yu, Thomas Edison; Yoneda, Tomokazu; Chakrabarty, Krishnendu; Fujiwara, Hideo
Rapid advances in semiconductor manufacturing technology have led to higher chip power densities, which places greater emphasis on packaging and temperature control during testing. For system-on-chips, peak power-based scheduling algorithms have been used to optimize tests under specified power constraints. However, imposing power constraints does not always solve the problem of overheating due to the non-uniform distribution of power across the chip. This paper presents a TAM/Wrapper co-design methodology for system-on-chips that ensures thermal safety while still optimizing the test schedule. The method combines a simplified thermal-cost model with a traditional bin-packing algorithm to minimize test time while satisfying temperature constraints. Furthermore, for temperature checking, thermal simulation is done using cycle-accurate power profiles for more realistic results. Experiments show that even a minimal sacrifice in test time can yield a considerable decrease in test temperature as well as the possibility of further lowering temperatures beyond those achieved using traditional power-based test scheduling.
YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research
Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei
2015-01-01
We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. Here in this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new function ‘Meta-analysis’ is additionally provided for allowing users to identify real-time differentially expressed miRNAs and arm-switching events according to customer-defined sample groups and dozens of clinical criteria tidying up by proficient clinicians. Cancer miRNAs identified hold the potential for both basic research and biotech applications. PMID:25398902
Hosseinkhani, Farideh; Emaneini, Mohammad; van Leeuwen, Willem
2017-07-20
Using Illumina HiSeq and PacBio technologies, we sequenced the genome of the multidrug-resistant bacterium Staphylococcus haemolyticus , originating from a bloodstream infection in a neonate. The sequence data can be used as an accurate reference sequence. Copyright © 2017 Hosseinkhani et al.
Li, Yongping; Wei, Wei; Feng, Jia; Luo, Huifeng; Pi, Mengting; Liu, Zhongchi; Kang, Chunying
2018-01-01
Abstract The genome of the wild diploid strawberry species Fragaria vesca, an ideal model system of cultivated strawberry (Fragaria × ananassa, octoploid) and other Rosaceae family crops, was first published in 2011 and followed by a new assembly (Fvb). However, the annotation for Fvb mainly relied on ab initio predictions and included only predicted coding sequences, therefore an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries. Altogether, 18,641 genes (55.6% out of 33,538 genes) were augmented with information on the 5′ and/or 3′ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation. This new annotation of F. vesca is substantially improved in both accuracy and integrity of gene predictions, beneficial to the gene functional studies in strawberry and to the comparative genomic analysis of other horticultural crops in Rosaceae family. PMID:29036429
Axt, Brant; Hsieh, Yi-Fan; Nalayanda, Divya; Wang, Tza-Huei
2017-09-01
Droplet microfluidics has found use in many biological assay applications as a means of high-throughput sample processing. One of the challenges of the technology, however, is the ability to control and merge droplets on-demand as they flow through the microdevices. It is in the interest of developing lab-on-chip devices to be able to combinatorically program additive mixing steps for more complex multistep and multiplex assays. Existing technologies to merge droplets are either passive in nature or require highly predictable droplet movement for feedforward control, making them vulnerable to errors during high throughput operation. In this paper, we describe and demonstrate a microfluidic valve-based device for the purpose of combinatorial droplet injection at any stage in a multistep assay. Microfluidic valves are used to robustly control fluid flow, droplet generation, and droplet mixing in the device on-demand, while on-chip impedance measurements taken in real time are used as feedback to accurately time the droplet injections. The presented system is contrasted to attempts without feedback, and is shown to be 100% reliable over long durations. Additionally, content detection and discretionary injections are explored and successfully executed.
Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos
2005-01-01
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308
Measurement And Modeling Of Fe VIII To Fe XVI M-shell Emission In The Extreme Ultraviolet
NASA Astrophysics Data System (ADS)
Beiersdorfer, Peter; Lepson, J. K.; Hurwitz, M.
2007-05-01
The solar EUV emission near 200 Å is presently being studied with high resolution with the Cosmic Hot Interstellar Plasma Spectrometer (CHIPS), which focuses on the emission between 90 and 270 Å, and with the EUV Imaging Spectrometer on Hinode, which focuses on the region 180 to 204 Å and 250 to 290 Å. The Solar EUV Experiment on the TIMED spacecraft also observes this spectral band but with greatly reduced resolution. The spectrum in this region is dominated by emission from moderate charge states of iron. The interpretation of the data relies on accurate and complete plasma emission models, notably CHIANTI. We have performed a series of laboratory measurements of the 3-3 emission from M-shell iron ions. The measurements cover the range 170 - 250 Å and are made at an electron density of about 1011 cm-3. Emission from Fe VIII through Fe XVI has been identified. Excellent agreement with CHIANTI predictions is found. A few weak transitions are noted in the laboratory data that are predicted by CHIANTI to be vanishingly small and should not have been observed. These are tentatively attributed to transitions in Fe XV. A comparison with observations from CHIPS is also presented. This work was supported in part by NASA's Solar and Heliospheric Physics Supporting Research and Technology Program. Work at UC-LLNL was performed under the auspices of the DOE by under Contract W-7405-Eng-48.
Lerdrup, Mads; Gomes, Ana-Luisa; Kryh, Hanna; Spigolon, Giada; Caboche, Jocelyne; Fisone, Gilberto; Hansen, Klaus
2014-01-01
Polycomb group (PcG) proteins bind to and repress genes in embryonic stem cells through lineage commitment to the terminal differentiated state. PcG repressed genes are commonly characterized by the presence of the epigenetic histone mark H3K27me3, catalyzed by the Polycomb repressive complex 2. Here, we present in vivo evidence for a previously unrecognized plasticity of PcG-repressed genes in terminally differentiated brain neurons of parkisonian mice. We show that acute administration of the dopamine precursor, L-DOPA, induces a remarkable increase in H3K27me3S28 phosphorylation. The induction of the H3K27me3S28p histone mark specifically occurs in medium spiny neurons expressing dopamine D1 receptors and is dependent on Msk1 kinase activity and DARPP-32-mediated inhibition of protein phosphatase-1. Chromatin immunoprecipitation (ChIP) experiments showed that increased H3K27me3S28p was accompanied by reduced PcG binding to regulatory regions of genes. An analysis of the genome wide distribution of L-DOPA-induced H3K27me3S28 phosphorylation by ChIP sequencing (ChIP-seq) in combination with expression analysis by RNA-sequencing (RNA-seq) showed that the induction of H3K27me3S28p correlated with increased expression of a subset of PcG repressed genes. We found that induction of H3K27me3S28p persisted during chronic L-DOPA administration to parkisonian mice and correlated with aberrant gene expression. We propose that dopaminergic transmission can activate PcG repressed genes in the adult brain and thereby contribute to long-term maladaptive responses including the motor complications, or dyskinesia, caused by prolonged administration of L-DOPA in Parkinson's disease. PMID:25254549
PySeqLab: an open source Python package for sequence labeling and segmentation.
Allam, Ahmed; Krauthammer, Michael
2017-11-01
Text and genomic data are composed of sequential tokens, such as words and nucleotides that give rise to higher order syntactic constructs. In this work, we aim at providing a comprehensive Python library implementing conditional random fields (CRFs), a class of probabilistic graphical models, for robust prediction of these constructs from sequential data. Python Sequence Labeling (PySeqLab) is an open source package for performing supervised learning in structured prediction tasks. It implements CRFs models, that is discriminative models from (i) first-order to higher-order linear-chain CRFs, and from (ii) first-order to higher-order semi-Markov CRFs (semi-CRFs). Moreover, it provides multiple learning algorithms for estimating model parameters such as (i) stochastic gradient descent (SGD) and its multiple variations, (ii) structured perceptron with multiple averaging schemes supporting exact and inexact search using 'violation-fixing' framework, (iii) search-based probabilistic online learning algorithm (SAPO) and (iv) an interface for Broyden-Fletcher-Goldfarb-Shanno (BFGS) and the limited-memory BFGS algorithms. Viterbi and Viterbi A* are used for inference and decoding of sequences. Using PySeqLab, we built models (classifiers) and evaluated their performance in three different domains: (i) biomedical Natural language processing (NLP), (ii) predictive DNA sequence analysis and (iii) Human activity recognition (HAR). State-of-the-art performance comparable to machine-learning based systems was achieved in the three domains without feature engineering or the use of knowledge sources. PySeqLab is available through https://bitbucket.org/A_2/pyseqlab with tutorials and documentation. ahmed.allam@yale.edu or michael.krauthammer@yale.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Gong, Ting; Szustakowski, Joseph D
2013-04-15
For heterogeneous tissues, measurements of gene expression through mRNA-Seq data are confounded by relative proportions of cell types involved. In this note, we introduce an efficient pipeline: DeconRNASeq, an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It adopts a globally optimized non-negative decomposition algorithm through quadratic programming for estimating the mixing proportions of distinctive tissue types in next-generation sequencing data. We demonstrated the feasibility and validity of DeconRNASeq across a range of mixing levels and sources using mRNA-Seq data mixed in silico at known concentrations. We validated our computational approach for various benchmark data, with high correlation between our predicted cell proportions and the real fractions of tissues. Our study provides a rigorous, quantitative and high-resolution tool as a prerequisite to use mRNA-Seq data. The modularity of package design allows an easy deployment of custom analytical pipelines for data from other high-throughput platforms. DeconRNASeq is written in R, and is freely available at http://bioconductor.org/packages. Supplementary data are available at Bioinformatics online.
Brezovský, Jan
2016-01-01
An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools’ predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2. PMID:27224906
Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan
2016-05-01
An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
Blood Biomarkers for the Early Diagnosis of Stroke: The Stroke-Chip Study.
Bustamante, Alejandro; López-Cancio, Elena; Pich, Sara; Penalba, Anna; Giralt, Dolors; García-Berrocoso, Teresa; Ferrer-Costa, Carles; Gasull, Teresa; Hernández-Pérez, María; Millan, Mónica; Rubiera, Marta; Cardona, Pedro; Cano, Luis; Quesada, Helena; Terceño, Mikel; Silva, Yolanda; Castellanos, Mar; Garces, Moisés; Reverté, Silvia; Ustrell, Xavier; Marés, Rafael; Baiges, Joan Josep; Serena, Joaquín; Rubio, Francisco; Salas, Eduardo; Dávalos, Antoni; Montaner, Joan
2017-09-01
Stroke diagnosis could be challenging in the acute phase. We aimed to develop a blood-based diagnostic tool to differentiate between real strokes and stroke mimics and between ischemic and hemorrhagic strokes in the hyperacute phase. The Stroke-Chip was a prospective, observational, multicenter study, conducted at 6 Stroke Centers in Catalonia. Consecutive patients with suspected stroke were enrolled within the first 6 hours after symptom onset, and blood samples were drawn immediately after admission. A 21-biomarker panel selected among previous results and from the literature was measured by immunoassays. Outcomes were differentiation between real strokes and stroke mimics and between ischemic and hemorrhagic strokes. Predictive models were developed by combining biomarkers and clinical variables in logistic regression models. Accuracy was evaluated with receiver operating characteristic curves. From August 2012 to December 2013, 1308 patients were included (71.9% ischemic, 14.8% stroke mimics, and 13.3% hemorrhagic). For stroke versus stroke mimics comparison, no biomarker resulted included in the logistic regression model, but it was only integrated by clinical variables, with a predictive accuracy of 80.8%. For ischemic versus hemorrhagic strokes comparison, NT-proBNP (N-Terminal Pro-B-Type Natriuretic Peptide) >4.9 (odds ratio, 2.40; 95% confidence interval, 1.55-3.71; P <0.0001) and endostatin >4.7 (odds ratio, 2.02; 95% confidence interval, 1.19-3.45; P =0.010), together with age, sex, blood pressure, stroke severity, atrial fibrillation, and hypertension, were included in the model. Predictive accuracy was 80.6%. The studied biomarkers were not sufficient for an accurate differential diagnosis of stroke in the hyperacute setting. Additional discovery of new biomarkers and improvement on laboratory techniques seem necessary for achieving a molecular diagnosis of stroke. © 2017 American Heart Association, Inc.
Lee, Hyungseok; Cho, Dong-Woo
2016-07-05
Although various types of organs-on-chips have been introduced recently as tools for drug discovery, the current studies are limited in terms of fabrication methods. The fabrication methods currently available not only need a secondary cell-seeding process and result in severe protein absorption due to the material used, but also have difficulties in providing various cell types and extracellular matrix (ECM) environments for spatial heterogeneity in the organs-on-chips. Therefore, in this research, we introduce a novel 3D bioprinting method for organ-on-a-chip applications. With our novel 3D bioprinting method, it was possible to prepare an organ-on-a-chip in a simple one-step fabrication process. Furthermore, protein absorption on the printed platform was very low, which will lead to accurate measurement of metabolism and drug sensitivity. Moreover, heterotypic cell types and biomaterials were successfully used and positioned at the desired position for various organ-on-a-chip applications, which will promote full mimicry of the natural conditions of the organs. The liver organ was selected for the evaluation of the developed method, and liver function was shown to be significantly enhanced on the liver-on-a-chip, which was prepared by 3D bioprinting. Consequently, the results demonstrate that the suggested 3D bioprinting method is easier and more versatile for production of organs-on-chips.
Le Guellec, S; Lesluyes, T; Sarot, E; Valle, C; Filleron, T; Rochaix, P; Valentin, T; Pérot, G; Coindre, J-M; Chibon, F
2018-05-31
Prediction of metastatic outcome in sarcomas is challenging for clinical management since they are aggressive and carry a high metastatic risk. A 67-gene expression signature, the Complexity INdex in SARComas (CINSARC), has been identified as a better prognostic factor than the reference pathological grade. Since it cannot be applied easily in standard laboratory practice, we assessed its prognostic value using nanoString on formalin-fixed, paraffin-embedded (FFPE) blocks to evaluate its potential in clinical routine practice and guided therapeutic management. A code set consisting of 67 probes derived from the 67 genes of the CINSARC signature was built and named NanoCind®. To compare the performance of RNA-seq and nanoString (NanoCind®), we used expressions of various sarcomas (n=124, frozen samples) using both techniques and compared predictive values based on CINSARC risk groups and clinical annotations. We also used nanoString on FFPE blocks (n=67) and matching frozen and FFPE samples (n=45) to compare their level of agreement. Metastasis-free survival and agreement values in classification groups were evaluated. CINSARC strongly predicted metastatic outcome using nanoString on frozen samples (HR = 2.9, 95% CI 1.23-6.82) with similar risk-group classifications (86%). While more than 50% of FFPE blocks were not analyzable by RNA-seq owing to poor RNA quality, all samples were analyzable with nanoString. When similar (risk-group) classifications were measured with frozen tumors (RNA-seq) compared to FFPE blocks (84% agreement), the CINSARC signature was still a predictive factor of metastatic outcome with nanoString on FFPE samples (HR = 4.43, 95% CI 1.25-15.72). CINSARC is a material-independent prognostic signature for metastatic outcome in sarcomas and outperforms histological grade. Unlike RNA-seq, nanoString is not influenced by the poor quality of RNA extracted from FFPE blocks. The CINSARC signature can potentially be used in combination with nanoString (NanoCind®) in routine clinical practice on FFPE blocks to predict metastatic outcome.
Fast prediction of RNA-RNA interaction using heuristic algorithm.
Montaseri, Soheila
2015-01-01
Interaction between two RNA molecules plays a crucial role in many medical and biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. Some algorithms have been formed to predict the structure of the RNA-RNA interaction. High computational time is a common challenge in most of the presented algorithms. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and binding sites between two RNAs. Furthermore, a parallel version of this method is presented. We describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been performed on some datasets including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high validity and efficiency, and it is run in low computational time in comparison to other approaches.
Auer, Lucas; Mariadassou, Mahendra; O'Donohue, Michael; Klopp, Christophe; Hernandez-Raquet, Guillermina
2017-11-01
Next-generation sequencing technologies give access to large sets of data, which are extremely useful in the study of microbial diversity based on 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome both computations, storage and noise issues, recent studies and tools suggested removing all single reads and low abundant OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α- and β-diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α- and β-community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates. © 2017 John Wiley & Sons Ltd.
The Single-Nucleotide Resolution Transcriptome of Pseudomonas aeruginosa Grown in Body Temperature
Dandekar, Ajai A.; Edelheit, Sarit; Greenberg, E. Peter; Sorek, Rotem; Lory, Stephen
2012-01-01
One of the hallmarks of opportunistic pathogens is their ability to adjust and respond to a wide range of environmental and host-associated conditions. The human pathogen Pseudomonas aeruginosa has an ability to thrive in a variety of hosts and cause a range of acute and chronic infections in individuals with impaired host defenses or cystic fibrosis. Here we report an in-depth transcriptional profiling of this organism when grown at host-related temperatures. Using RNA-seq of samples from P. aeruginosa grown at 28°C and 37°C we detected genes preferentially expressed at the body temperature of mammalian hosts, suggesting that they play a role during infection. These temperature-induced genes included the type III secretion system (T3SS) genes and effectors, as well as the genes responsible for phenazines biosynthesis. Using genome-wide transcription start site (TSS) mapping by RNA-seq we were able to accurately define the promoters and cis-acting RNA elements of many genes, and uncovered new genes and previously unrecognized non-coding RNAs directly controlled by the LasR quorum sensing regulator. Overall we identified 165 small RNAs and over 380 cis-antisense RNAs, some of which predicted to perform regulatory functions, and found that non-coding RNAs are preferentially localized in pathogenicity islands and horizontally transferred regions. Our work identifies regulatory features of P. aeruginosa genes whose products play a role in environmental adaption during infection and provides a reference transcriptional landscape for this pathogen. PMID:23028334
The UEA sRNA Workbench (version 4.4): a comprehensive suite of tools for analyzing miRNAs and sRNAs.
Stocks, Matthew B; Mohorianu, Irina; Beckers, Matthew; Paicu, Claudia; Moxon, Simon; Thody, Joshua; Dalmay, Tamas; Moulton, Vincent
2018-05-02
RNA interference, a highly conserved regulatory mechanism, is mediated via small RNAs. Recent technical advances enabled the analysis of larger, complex datasets and the investigation of microRNAs and the less known small interfering RNAs. However, the size and intricacy of current data requires a comprehensive set of tools, able to discriminate the patterns from the low-level, noise-like, variation; numerous and varied suggestions from the community represent an invaluable source of ideas for future tools, the ability of the community to contribute to this software is essential. We present a new version of the UEA sRNA Workbench, reconfigured to allow an easy insertion of new tools/workflows. In its released form, it comprises of a suite of tools in a user-friendly environment, with enhanced capabilities for a comprehensive processing of sRNA-seq data e.g. tools for an accurate prediction of sRNA loci (CoLIde) and miRNA loci (miRCat2), as well as workflows to guide the users through common steps such as quality checking of the input data, normalization of abundances or detection of differential expression represent the first step in sRNA-seq analyses. The UEA sRNA Workbench is available at: http://srna-workbench.cmp.uea.ac.uk The source code is available at: https://github.com/sRNAworkbenchuea/UEA_sRNA_Workbench. v.moulton@uea.ac.uk.
Zhang, M Z; Zhang, X F; Chen, X M; Chen, X; Wu, S; Xu, L L
2015-08-10
The enzyme-linked probe hybridization chip utilizes a method based on ligase-hybridizing probe chip technology, with the principle of using thio-primers for protection against enzyme digestion, and using lambda DNA exonuclease to cut multiple PCR products obtained from the sample being tested into single-strand chains for hybridization. The 5'-end amino-labeled probe was fixed onto the aldehyde chip, and hybridized with the single-stranded PCR product, followed by addition of a fluorescent-modified probe that was then enzymatically linked with the adjacent, substrate-bound probe in order to achieve highly specific, parallel, and high-throughput detection. Specificity and sensitivity testing demonstrated that enzyme-linked probe hybridization technology could be applied to the specific detection of eight genetic modification events at the same time, with a sensitivity reaching 0.1% and the achievement of accurate, efficient, and stable results.
Optofluidic analysis system for amplification-free, direct detection of Ebola infection
NASA Astrophysics Data System (ADS)
Cai, H.; Parks, J. W.; Wall, T. A.; Stott, M. A.; Stambaugh, A.; Alfson, K.; Griffiths, A.; Mathies, R. A.; Carrion, R.; Patterson, J. L.; Hawkins, A. R.; Schmidt, H.
2015-09-01
The massive outbreak of highly lethal Ebola hemorrhagic fever in West Africa illustrates the urgent need for diagnostic instruments that can identify and quantify infections rapidly, accurately, and with low complexity. Here, we report on-chip sample preparation, amplification-free detection and quantification of Ebola virus on clinical samples using hybrid optofluidic integration. Sample preparation and target preconcentration are implemented on a PDMS-based microfluidic chip (automaton), followed by single nucleic acid fluorescence detection in liquid-core optical waveguides on a silicon chip in under ten minutes. We demonstrate excellent specificity, a limit of detection of 0.2 pfu/mL and a dynamic range of thirteen orders of magnitude, far outperforming other amplification-free methods. This chip-scale approach and reduced complexity compared to gold standard RT-PCR methods is ideal for portable instruments that can provide immediate diagnosis and continued monitoring of infectious diseases at the point-of-care.
Density determination of nail polishes and paint chips using magnetic levitation
NASA Astrophysics Data System (ADS)
Huang, Peggy P.
Trace evidence is often small, easily overlooked, and difficult to analyze. This study describes a nondestructive method to separate and accurately determine the density of trace evidence samples, specifically nail polish and paint chip using magnetic levitation (MagLev). By determining the levitation height of each sample in the MagLev device, the density of the sample is back extrapolated using a standard density bead linear regression line. The results show that MagLev distinguishes among eight clear nail polishes, including samples from the same manufacturer; separates select colored nail polishes from the same manufacturer; can determine the density range of household paint chips; and shows limited levitation for unknown paint chips. MagLev provides a simple, affordable, and nondestructive means of determining density. The addition of co-solutes to the paramagnetic solution to expand the density range may result in greater discriminatory power and separation and lead to further applications of this technique.
On the distance of genetic relationships and the accuracy of genomic prediction in pig breeding.
Meuwissen, Theo H E; Odegard, Jorgen; Andersen-Ranberg, Ina; Grindflek, Eli
2014-08-01
With the advent of genomic selection, alternative relationship matrices are used in animal breeding, which vary in their coverage of distant relationships due to old common ancestors. Relationships based on pedigree (A) and linkage analysis (GLA) cover only recent relationships because of the limited depth of the known pedigree. Relationships based on identity-by-state (G) include relationships up to the age of the SNP (single nucleotide polymorphism) mutations. We hypothesised that the latter relationships were too old, since QTL (quantitative trait locus) mutations for traits under selection were probably more recent than the SNPs on a chip, which are typically selected for high minor allele frequency. In addition, A and GLA relationships are too recent to cover genetic differences accurately. Thus, we devised a relationship matrix that considered intermediate-aged relationships and compared all these relationship matrices for their accuracy of genomic prediction in a pig breeding situation. Haplotypes were constructed and used to build a haplotype-based relationship matrix (GH), which considers more intermediate-aged relationships, since haplotypes recombine more quickly than SNPs mutate. Dense genotypes (38 453 SNPs) on 3250 elite breeding pigs were combined with phenotypes for growth rate (2668 records), lean meat percentage (2618), weight at three weeks of age (7387) and number of teats (5851) to estimate breeding values for all animals in the pedigree (8187 animals) using the aforementioned relationship matrices. Phenotypes on the youngest 424 to 486 animals were masked and predicted in order to assess the accuracy of the alternative genomic predictions. Correlations between the relationships and regressions of older on younger relationships revealed that the age of the relationships increased in the order A, GLA, GH and G. Use of genomic relationship matrices yielded significantly higher prediction accuracies than A. GH and G, differed not significantly, but were significantly more accurate than GLA. Our hypothesis that intermediate-aged relationships yield more accurate genomic predictions than G was confirmed for two of four traits, but these results were not statistically significant. Use of estimated genotype probabilities for ungenotyped animals proved to be an efficient method to include the phenotypes of ungenotyped animals.
Quantification of differential gene expression by multiplexed targeted resequencing of cDNA
Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.
2017-01-01
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
Somatic Mutations and Neoepitope Homology in Melanomas Treated with CTLA-4 Blockade.
Nathanson, Tavi; Ahuja, Arun; Rubinsteyn, Alexander; Aksoy, Bulent Arman; Hellmann, Matthew D; Miao, Diana; Van Allen, Eliezer; Merghoub, Taha; Wolchok, Jedd D; Snyder, Alexandra; Hammerbacher, Jeff
2017-01-01
Immune checkpoint inhibitors are promising treatments for patients with a variety of malignancies. Toward understanding the determinants of response to immune checkpoint inhibitors, it was previously demonstrated that the presence of somatic mutations is associated with benefit from checkpoint inhibition. A hypothesis was posited that neoantigen homology to pathogens may in part explain the link between somatic mutations and response. To further examine this hypothesis, we reanalyzed cancer exome data obtained from our previously published study of 64 melanoma patients treated with CTLA-4 blockade and a new dataset of RNA-Seq data from 24 of these patients. We found that the ability to accurately predict patient benefit did not increase as the analysis narrowed from somatic mutation burden, to inclusion of only those mutations predicted to be MHC class I neoantigens, to only including those neoantigens that were expressed or that had homology to pathogens. The only association between somatic mutation burden and response was found when examining samples obtained prior to treatment. Neoantigen and expressed neoantigen burden were also associated with response, but neither was more predictive than somatic mutation burden. Neither the previously described tetrapeptide signature nor an updated method to evaluate neoepitope homology to pathogens was more predictive than mutation burden. Cancer Immunol Res; 5(1); 84-91. ©2016 AACR. ©2016 American Association for Cancer Research.
Read clouds uncover variation in complex regions of the human genome
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-01-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Analysis of the resistive network in a bio-inspired CMOS vision chip
NASA Astrophysics Data System (ADS)
Kong, Jae-Sung; Sung, Dong-Kyu; Hyun, Hyo-Young; Shin, Jang-Kyoo
2007-12-01
CMOS vision chips for edge detection based on a resistive circuit have recently been developed. These chips help develop neuromorphic systems with a compact size, high speed of operation, and low power dissipation. The output of the vision chip depends dominantly upon the electrical characteristics of the resistive network which consists of a resistive circuit. In this paper, the body effect of the MOSFET for current distribution in a resistive circuit is discussed with a simple model. In order to evaluate the model, two 160×120 CMOS vision chips have been fabricated by using a standard CMOS technology. The experimental results have been nicely matched with our prediction.
PeakRanger: A cloud-enabled peak caller for ChIP-seq data
2011-01-01
Background Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq) is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcriptional factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real world usages of PeakRanger, including peak-calling in the modENCODE project. Conclusions Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of modENCODE project: http://www.modencode.org/software/ranger/ PMID:21554709
High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).
Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling
2018-01-15
Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.
Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R
2017-11-02
Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong
2017-01-01
Abstract Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. PMID:29036714
Nebula--a web-server for advanced ChIP-seq data analysis.
Boeva, Valentina; Lermine, Alban; Barette, Camille; Guillouf, Christel; Barillot, Emmanuel
2012-10-01
ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3-5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline providing ready-to-publish results. Nebula is available at http://nebula.curie.fr/ Supplementary data are available at Bioinformatics online.
Michel, Audrey M.; Ahern, Anna M.; Donohue, Claire A.
2015-01-01
The boundaries of protein coding sequences are more difficult to define at the 5′ end than at the 3′ end due to potential multiple translation initiation sites (TISs). Even in the presence of phylogenetic data, the use of sequence information only may not be sufficient for the accurate identification of TISs. Traditional proteomics approaches may also fail because the N‐termini of newly synthesized proteins are often processed. Thus ribosome profiling (ribo‐seq), producing a snapshot of the ribosome distribution across the entire transcriptome, is an attractive experimental technique for the purpose of TIS location exploration. The GWIPS‐viz (Genome Wide Information on Protein Synthesis visualized) browser (http://gwips.ucc.ie) provides free access to the genomic alignments of ribo‐seq data and corresponding mRNA‐seq data along with relevant annotation tracks. In this brief, we illustrate how GWIPS‐viz can be used to explore the ribosome occupancy at the 5′ ends of protein coding genes to assess the activity of AUG and non‐AUG TISs responsible for the synthesis of proteoforms with alternative or heterogeneous N‐termini. The presence of ribo‐seq tracks for various organisms allows for cross‐species comparison of orthologous genes and the availability of datasets from multiple laboratories permits the assessment of the technical reproducibility of the ribosome densities. PMID:25736862
Yates, Kathleen B.; Bi, Kevin; Darko, Samuel; Godec, Jernej; Gerdemann, Ulrike; Swadling, Leo; Douek, Daniel C.; Klenerman, Paul; Barnes, Eleanor J.; Sharpe, Arlene H.
2017-01-01
Abstract The T cell compartment must contain diversity in both T cell receptor (TCR) repertoire and cell state to provide effective immunity against pathogens. However, it remains unclear how differences in the TCR contribute to heterogeneity in T cell state. Single cell RNA-sequencing (scRNA-seq) can allow simultaneous measurement of TCR sequence and global transcriptional profile from single cells. However, current methods for TCR inference from scRNA-seq are limited in their sensitivity and require long sequencing reads, thus increasing the cost and decreasing the number of cells that can be feasibly analyzed. Here we present TRAPeS, a publicly available tool that can efficiently extract TCR sequence information from short-read scRNA-seq libraries. We apply it to investigate heterogeneity in the CD8+ T cell response in humans and mice, and show that it is accurate and more sensitive than existing approaches. Coupling TRAPeS with transcriptome analysis of CD8+ T cells specific for a single epitope from Yellow Fever Virus (YFV), we show that the recently described ‘naive-like’ memory population have significantly longer CDR3 regions and greater divergence from germline sequence than do effector-memory phenotype cells. This suggests that TCR usage is associated with the differentiation state of the CD8+ T cell response to YFV. PMID:28934479
YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research.
Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei
2015-01-01
We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. Here in this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new function 'Meta-analysis' is additionally provided for allowing users to identify real-time differentially expressed miRNAs and arm-switching events according to customer-defined sample groups and dozens of clinical criteria tidying up by proficient clinicians. Cancer miRNAs identified hold the potential for both basic research and biotech applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhang, Yu; Zhang, Xiaofeng; Zhang, Jinling; Sun, Bin; Zheng, Lulu; Li, Jun; Liu, Sixiu; Sui, Guodong; Yin, Zhengfeng
2016-11-01
Circulating tumor cells (CTCs) have been proposed to be an active source of metastasis or recurrence of hepatocellular carcinoma (HCC). The enumeration and characterization of CTCs has important clinical significance in recurrence prediction and treatment monitoring in HCC patients. We previously developed a unique method to separate HCC CTCs based on the interaction of the asialoglycoprotein receptor (ASGPR) expressed on their membranes with its ligand. The current study applied the ligand-receptor binding assay to a CTC-chip in a microfluidic device. Efficient capture of HCC CTCs originates from the small dimensions of microfluidic channels and enhanced local topographic interactions between the microfluidic channel and extracellular extensions. With the optimized conditions, a capture yield reached > 85% for artificial CTC blood samples. Clinical utility of the system was further validated. CTCs were detected in all the examined 36 patients with HCC, with an average of 14 ± 10/2 mL. On the contrary, no CTCs were detected in healthy, benign liver disease or non-HCC cancer subjects. The current study also successfully demonstrated that the captured CTCs on our CTC-chip were readily released with ethylene diamine tetraacetic acid (EDTA); released CTCs remained alive and could be expanded to form a spheroid-like structure in a 3-dimensional cell culture assay; furthermore, sensitivity of released CTCs to chemotherapeutic agents (sorafenib or oxaliplatin) could be effectively tested utilizing this culture assay. In conclusion, the methodologies presented here offer great promise for accurate enumeration and easy release of captured CTCs, and released CTCs could be cultured for further functional studies.
Genome wide selection in Citrus breeding.
Gois, I B; Borém, A; Cristofani-Yaly, M; de Resende, M D V; Azevedo, C F; Bastianel, M; Novelli, V M; Machado, M A
2016-10-17
Genome wide selection (GWS) is essential for the genetic improvement of perennial species such as Citrus because of its ability to increase gain per unit time and to enable the efficient selection of characteristics with low heritability. This study assessed GWS efficiency in a population of Citrus and compared it with selection based on phenotypic data. A total of 180 individual trees from a cross between Pera sweet orange (Citrus sinensis Osbeck) and Murcott tangor (Citrus sinensis Osbeck x Citrus reticulata Blanco) were evaluated for 10 characteristics related to fruit quality. The hybrids were genotyped using 5287 DArT_seq TM (diversity arrays technology) molecular markers and their effects on phenotypes were predicted using the random regression - best linear unbiased predictor (rr-BLUP) method. The predictive ability, prediction bias, and accuracy of GWS were estimated to verify its effectiveness for phenotype prediction. The proportion of genetic variance explained by the markers was also computed. The heritability of the traits, as determined by markers, was 16-28%. The predictive ability of these markers ranged from 0.53 to 0.64, and the regression coefficients between predicted and observed phenotypes were close to unity. Over 35% of the genetic variance was accounted for by the markers. Accuracy estimates with GWS were lower than those obtained by phenotypic analysis; however, GWS was superior in terms of genetic gain per unit time. Thus, GWS may be useful for Citrus breeding as it can predict phenotypes early and accurately, and reduce the length of the selection cycle. This study demonstrates the feasibility of genomic selection in Citrus.
Toward Evolvable Hardware Chips: Experiments with a Programmable Transistor Array
NASA Technical Reports Server (NTRS)
Stoica, Adrian
1998-01-01
Evolvable Hardware is reconfigurable hardware that self-configures under the control of an evolutionary algorithm. We search for a hardware configuration can be performed using software models or, faster and more accurate, directly in reconfigurable hardware. Several experiments have demonstrated the possibility to automatically synthesize both digital and analog circuits. The paper introduces an approach to automated synthesis of CMOS circuits, based on evolution on a Programmable Transistor Array (PTA). The approach is illustrated with a software experiment showing evolutionary synthesis of a circuit with a desired DC characteristic. A hardware implementation of a test PTA chip is then described, and the same evolutionary experiment is performed on the chip demonstrating circuit synthesis/self-configuration directly in hardware.
Cloud, Joann L; Conville, Patricia S; Croft, Ann; Harmsen, Dag; Witebsky, Frank G; Carroll, Karen C
2004-02-01
Identification of clinically significant nocardiae to the species level is important in patient diagnosis and treatment. A study was performed to evaluate Nocardia species identification obtained by partial 16S ribosomal DNA (rDNA) sequencing by the MicroSeq 500 system with an expanded database. The expanded portion of the database was developed from partial 5' 16S rDNA sequences derived from 28 reference strains (from the American Type Culture Collection and the Japanese Collection of Microorganisms). The expanded MicroSeq 500 system was compared to (i). conventional identification obtained from a combination of growth characteristics with biochemical and drug susceptibility tests; (ii). molecular techniques involving restriction enzyme analysis (REA) of portions of the 16S rRNA and 65-kDa heat shock protein genes; and (iii). when necessary, sequencing of a 999-bp fragment of the 16S rRNA gene. An unknown isolate was identified as a particular species if the sequence obtained by partial 16S rDNA sequencing by the expanded MicroSeq 500 system was 99.0% similar to that of the reference strain. Ninety-four nocardiae representing 10 separate species were isolated from patient specimens and examined by using the three different methods. Sequencing of partial 16S rDNA by the expanded MicroSeq 500 system resulted in only 72% agreement with conventional methods for species identification and 90% agreement with the alternative molecular methods. Molecular methods for identification of Nocardia species provide more accurate and rapid results than the conventional methods using biochemical and susceptibility testing. With an expanded database, the MicroSeq 500 system for partial 16S rDNA was able to correctly identify the human pathogens N. brasiliensis, N. cyriacigeorgica, N. farcinica, N. nova, N. otitidiscaviarum, and N. veterana.
The molecular topography of silenced chromatin in Saccharomyces cerevisiae
Thurtle, Deborah M.; Rine, Jasper
2014-01-01
Heterochromatin imparts regional, promoter-independent repression of genes and is epigenetically heritable. Understanding how silencing achieves this regional repression is a fundamental problem in genetics and development. Current models of yeast silencing posit that Sir proteins, recruited by transcription factors bound to the silencers, spread throughout the silenced region. To test this model directly at high resolution, we probed the silenced chromatin architecture by chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-seq) of Sir proteins, histones, and a key histone modification, H4K16-acetyl. These analyses revealed that Sir proteins are strikingly concentrated at and immediately adjacent to the silencers, with lower levels of enrichment over the promoters at HML and HMR, the critical targets for transcriptional repression. The telomeres also showed discrete peaks of Sir enrichment yet a continuous domain of hypoacetylated histone H4K16. Surprisingly, ChIP-seq of cross-linked chromatin revealed a distribution of nucleosomes at silenced loci that was similar to Sir proteins, whereas native nucleosome maps showed a regular distribution throughout silenced loci, indicating that cross-linking captured a specialized chromatin organization imposed by Sir proteins. This specialized chromatin architecture observed in yeast informs the importance of a steric contribution to regional repression in other organisms. PMID:24493645
Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N
2017-11-14
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.
Buschmann, Dominik; Haberberger, Anna; Kirchner, Benedikt; Spornraft, Melanie; Riedmaier, Irmgard; Schelling, Gustav; Pfaffl, Michael W.
2016-01-01
Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions. PMID:27317696
Epileptic Seizure Prediction Using Big Data and Deep Learning: Toward a Mobile System.
Kiral-Kornek, Isabell; Roy, Subhrajit; Nurse, Ewan; Mashford, Benjamin; Karoly, Philippa; Carroll, Thomas; Payne, Daniel; Saha, Susmita; Baldassano, Steven; O'Brien, Terence; Grayden, David; Cook, Mark; Freestone, Dean; Harrer, Stefan
2018-01-01
Seizure prediction can increase independence and allow preventative treatment for patients with epilepsy. We present a proof-of-concept for a seizure prediction system that is accurate, fully automated, patient-specific, and tunable to an individual's needs. Intracranial electroencephalography (iEEG) data of ten patients obtained from a seizure advisory system were analyzed as part of a pseudoprospective seizure prediction study. First, a deep learning classifier was trained to distinguish between preictal and interictal signals. Second, classifier performance was tested on held-out iEEG data from all patients and benchmarked against the performance of a random predictor. Third, the prediction system was tuned so sensitivity or time in warning could be prioritized by the patient. Finally, a demonstration of the feasibility of deployment of the prediction system onto an ultra-low power neuromorphic chip for autonomous operation on a wearable device is provided. The prediction system achieved mean sensitivity of 69% and mean time in warning of 27%, significantly surpassing an equivalent random predictor for all patients by 42%. This study demonstrates that deep learning in combination with neuromorphic hardware can provide the basis for a wearable, real-time, always-on, patient-specific seizure warning system with low power consumption and reliable long-term performance. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise
2017-05-01
The ability to estimate the age of the donor from recovered biological material at a crime scene can be of substantial value in forensic investigations. Aging can be complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2-90 years) analysed with Illumina's genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling and multiple regression analysis carried out with these markers provided an accurate prediction of age (R 2 =0.92, mean absolute error (MAE)=4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, the age prediction significantly improved (R 2 =0.96) with a MAE=3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top 3 predictors of age belonged to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE=7.1 years) and a cohort of 1011 disease state individuals (MAE=7.2 years). Furthermore, we highlighted the age markers' potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R 2 =0.96) with a MAE=3.2 years (training set) and 4.0 years (blind test). In an attempt to create a sensitive and accurate age prediction test, a next generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq ® platform. The method was validated using DNA standards of known methylation levels and the age prediction accuracy has been initially assessed in a set of 46 whole blood samples. Although the resulted prediction accuracy using the NGS data was lower compared to the original model (MAE=7.5years), it is expected that future optimization of our strategy to account for technical variation as well as increasing the sample size will improve both the prediction accuracy and reproducibility. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H
2015-02-24
Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Liu, Ruolin; Dickerson, Julie
2017-11-01
We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experiment measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
School Self-Concept in Adolescents With Chronic Pain.
Logan, Deirdre E; Gray, Laura S; Iversen, Christina N; Kim, Susan
2017-09-01
This study investigated school self-efficacy and sense of school membership (collectively "school self-concept") as potential influences on impaired school function among adolescents with chronic pain, including comparison of adolescents with primary pain to those with disease-based pain and pain-free peers. In all, 264 adolescents (12-17 years old) with primary pain conditions, juvenile idiopathic arthritis, or no pain completed measures of functional disability, school functioning, pain characteristics, and school self-concept, the Self-Efficacy Questionnaire for School Situations (SEQ-SS), and Psychological Sense of School Membership (PSSM). Both the SEQ-SS and PSSM demonstrated reliability and some validity, with the SEQ-SS more strongly supported. As a group, adolescents with primary pain conditions reported poorer school self-concept. School self-efficacy, but not school belongingness, predicted school functioning later in the school year. School self-concept, especially as assessed with the SEQ-SS, is relevant and important to assess when addressing school functioning in youth with chronic pain. © The Author 2017. Published by Oxford University Press on behalf of the Society of Pediatric Psychology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Investigation of electromigration behavior in lead-free flip chip solder bumps
NASA Astrophysics Data System (ADS)
Kalkundri, Kaustubh Jayant
Packaging technology has also evolved over time in an effort to keep pace with the demanding requirements. Wirebond and flip chip packaging technologies have become extremely versatile and ubiquitous in catering to myriad applications due to their inherent potential. This research is restricted strictly to flip chip technology. This technology incorporates a process in which the bare chip is turned upside down, i.e., active face down, and is bonded through the I/O to the substrate, hence called flip chip. A solder interconnect that provides electrical connection between the chip and substrate is bumped on a processed silicon wafer prior to dicing for die-attach. The assembly is then reflow-soldered followed by the underfill process to provide the required encapsulation. The demand for smaller and lighter products has increased the number of I/Os without increasing the package sizes, thereby drastically reducing the size of the flip chip solder bumps and their pitch. Reliability assessment and verification of these devices has gained tremendous importance due to their shrinking size. To add to the complexity, changing material sets that are results of recently enacted lead-free solder legislations have raised some compatibility issues that are already being researched. In addition to materials and process related flip chip challenges such as solder-flux compatibility, Coefficient of Thermal Expansion (CTE) mismatch, underfill-flux compatibility and thermal management, flip chip packages are vulnerable to a comparatively newer challenge, namely electromigration observed in solder bumps. It is interesting to note that electromigration has come to the forefront of challenges only recently. It has been exacerbated by the reduction in bump cross-section due to the seemingly continuous shrinking in package size over time. The focus of this research was to understand the overall electromigration behavior in lead-free (SnAg) flip chip solder bumps. The objectives of the research were to comprehend the physics of failure mechanism in electromigration for lead-free solder bumps assembled in a flip chip ceramic package having thick copper under bump metallization and to estimate the unknown critical material parameters from Black's equation that describe failure due to electromigration. In addition, the intent was to verify the 'use condition reliability' by extrapolation from experimental conditions. The methodology adopted for this research was comprised of accelerated electromigration tests on SnAg flip chip solder bumps assembled on ceramic substrate with a thick copper under bump metallization. The experimental approach was comprised of elaborate measurement of the temperature of each sample by separate metallization resistance exhibiting positive resistance characteristics to overcome the variation in Joule heating. After conducting the constant current experiments and analyzing the failed samples, it was found that the primary electromigration failure mode observed was the dissolution of the thick copper under bump metallization in the solder, leading to a change in resistance. The lifetime data obtained from different experiments was solved simultaneously using a multiple regression approach to yield the unknown Black's equation parameters of current density exponent and activation energy. In addition to the implementation of a systematic failure analysis and data analysis procedure, it was also deduced that thermomigration due to the temperature gradient across the chip does impact the overall electromigration behavior. This research and the obtained results were significant in bridging the gap for an overall understanding of this critical failure mode observed in flip chip solder bumps. The measurement of each individual sample temperature instead of an average temperature enabled an accurate analysis for predicting the 'use condition reliability' of a comparable product. The obtained results and the conclusions can be used as potential inputs in future designs and newer generations of flip chip devices that might undergo aggressive scaling. This will enable these devices to retain their functionality during their intended useful life with minimal threat of failure due to the potent issue of electromigration. (Abstract shortened by UMI.)
Cutting process simulation of flat drill
NASA Astrophysics Data System (ADS)
Tamura, Shoichi; Matsumura, Takashi
2018-05-01
Flat drills at a point angle of 180 deg. have recently been developed for drilling of automobile parts with the inclination of the workpiece surfaces. The paper studies the cutting processes of the flat drills in the analytical simulation. A predictive force model is applied to simulation of the cutting force with the chip flow direction. The chip flow model is piled up with orthogonal cuttings in the plane containing the cutting velocities and the chip flow velocities, in which the chip flow direction is determined to minimize the cutting energy. Then, the cutting force is predicted in the determined in the chip flow model. The typical cutting force of the flat drill is discussed with comparing to that of the standard drill. The typical differences are confirmed in the cutting force change during the tool engagement and disengagement. The cutting force, then, is simulated in drilling for an inclined workpiece with a flat drill. The horizontal components in the cutting forces are simulated with changing the inclination angle of the plate. The horizontal force component in the flat drilling is stable to be controlled in terms of the machining accuracy and the tool breakage.
Edge chipping and flexural resistance of monolithic ceramics☆
Zhang, Yu; Lee, James J.-W.; Srikanth, Ramanathan; Lawn, Brian R.
2014-01-01
Objective Test the hypothesis that monolithic ceramics can be developed with combined esthetics and superior fracture resistance to circumvent processing and performance drawbacks of traditional all-ceramic crowns and fixed-dental-prostheses consisting of a hard and strong core with an esthetic porcelain veneer. Specifically, to demonstrate that monolithic prostheses can be produced with a much reduced susceptibility to fracture. Methods Protocols were applied for quantifying resistance to chipping as well as resistance to flexural failure in two classes of dental ceramic, microstructurally-modified zirconias and lithium disilicate glass–ceramics. A sharp indenter was used to induce chips near the edges of flat-layer specimens, and the results compared with predictions from a critical load equation. The critical loads required to produce cementation surface failure in monolithic specimens bonded to dentin were computed from established flexural strength relations and the predictions validated with experimental data. Results Monolithic zirconias have superior chipping and flexural fracture resistance relative to their veneered counterparts. While they have superior esthetics, glass–ceramics exhibit lower strength but higher chip fracture resistance relative to porcelain-veneered zirconias. Significance The study suggests a promising future for new and improved monolithic ceramic restorations, with combined durability and acceptable esthetics. PMID:24139756
Wu, Chan-Han; Huang, Chun-Ming; Chung, Fu-Yen; Huang, Ching-Wen; Tsai, Hsiang-Lin; Chen, Chin-Fan; Wang, Jaw-Yuan
2013-01-01
This study is to investigate multiple chemotherapeutic agent- and radiation-related genetic biomarkers in locally advanced rectal cancer (LARC) patients following fluoropyrimidine-based concurrent chemoradiotherapy (CCRT) for response prediction. We initially selected 6 fluoropyrimidine metabolism-related genes (DPYD, ORPT, TYMS, TYMP, TK1, and TK2) and 3 radiotherapy response-related genes (GLUT1, HIF-1 α, and HIF-2 α) as targets for gene expression identification in 60 LARC cancer specimens. Subsequently, a high-sensitivity weighted enzymatic chip array was designed and constructed to predict responses following CCRT. After CCRT, 39 of 60 (65%) LARC patients were classified as responders (pathological tumor regression grade 2 ~ 4). Using a panel of multiple genetic biomarkers (chip), including DPYD, TYMS, TYMP, TK1, and TK2, at a cutoff value for 3 positive genes, a sensitivity of 89.7% and a specificity of 81% were obtained (AUC: 0.915; 95% CI: 0.840–0.991). Negative chip results were significantly correlated to poor CCRT responses (TRG 0-1) (P = 0.014, hazard ratio: 22.704, 95% CI: 3.055–235.448 in multivariate analysis). Disease-free survival analysis showed significantly better survival rate in patients with positive chip results (P = 0.0001). We suggest that a chip including DPYD, TYMS, TYMP, TK1, and TK2 genes is a potential tool to predict response in LARC following fluoropyrimidine-based CCRT. PMID:24455740
The analysis method of the DRAM cell pattern hotspot
NASA Astrophysics Data System (ADS)
Lee, Kyusun; Lee, Kweonjae; Chang, Jinman; Kim, Taeheon; Han, Daehan; Hong, Aeran; Kim, Yonghyeon; Kang, Jinyoung; Choi, Bumjin; Lee, Joosung; Lee, Jooyoung; Hong, Hyeongsun; Lee, Kyupil; Jin, Gyoyoung
2015-03-01
It is increasingly difficult to determine degree of completion of the patterning and the distribution at the DRAM Cell Patterns. When we research DRAM Device Cell Pattern, there are three big problems currently, it is as follows. First, due to etch loading, it is difficult to predict the potential defect. Second, due to under layer topology, it is impossible to demonstrate the influence of the hotspot. Finally, it is extremely difficult to predict final ACI pattern by the photo simulation, because current patterning process is double patterning technology which means photo pattern is completely different from final etch pattern. Therefore, if the hotspot occurs in wafer, it is very difficult to find it. CD-SEM is the most common pattern measurement tool in semiconductor fabrication site. CD-SEM is used to accurately measure small region of wafer pattern primarily. Therefore, there is no possibility of finding places where unpredictable defect occurs. Even though, "Current Defect detector" can measure a wide area, every chip has same pattern issue, the detector cannot detect critical hotspots. Because defect detecting algorithm of bright field machine is based on image processing, if same problems occur on compared and comparing chip, the machine cannot identify it. Moreover this instrument is not distinguished the difference of distribution about 1nm~3nm. So, "Defect detector" is difficult to handle the data for potential weak point far lower than target CD. In order to solve those problems, another method is needed. In this paper, we introduce the analysis method of the DRAM Cell Pattern Hotspot.
Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun
2014-06-01
Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expressions at multiple time points. However, there are several technical difficulties for analyzing such whole genome expression data. In addition, these days gene expression data is often measured by using RNA-sequencing rather than microarray technologies and then analysis of expression data is much more complicated since the analysis process should start with mapping short reads and produce differentially activated pathways and also possibly interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable for the RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimates statistical values for combined dataset by an advanced metric. TRAP also produces visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods on a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress. The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.
The design of high performance, low power triple-track magnetic sensor chip.
Wu, Xiulong; Li, Minghua; Lin, Zhiting; Xi, Mengyuan; Chen, Junning
2013-07-09
This paper presents a design of a high performance and low power consumption triple-track magnetic sensor chip which was fabricated in TSMC 0.35 μm CMOS process. This chip is able to simultaneously sense, decode and read out the information stored in triple-track magnetic cards. A reference voltage generating circuit, a low-cost filter circuit, a power-on reset circuit, an RC oscillator, and a pre-decoding circuit are utilized as the basic modules. The triple-track magnetic sensor chip has four states, i.e., reset, sleep, swiping card and data read-out. In sleep state, the internal RC oscillator is closed, which means that the digital part does not operate to optimize energy consumption. In order to improve decoding accuracy and expand the sensing range of the signal, two kinds of circuit are put forward, naming offset correction circuit, and tracking circuit. With these two circuits, the sensing function of this chip can be more efficiently and accurately. We simulated these circuit modules with TSMC technology library. The results showed that these modules worked well within wide range input signal. Based on these results, the layout and tape-out were carried out. The measurement results showed that the chip do function well within a wide swipe speed range, which achieved the design target.
The Design of High Performance, Low Power Triple-Track Magnetic Sensor Chip
Wu, Xiulong; Li, Minghua; Lin, Zhiting; Xi, Mengyuan; Chen, Junning
2013-01-01
This paper presents a design of a high performance and low power consumption triple-track magnetic sensor chip which was fabricated in TSMC 0.35 μm CMOS process. This chip is able to simultaneously sense, decode and read out the information stored in triple-track magnetic cards. A reference voltage generating circuit, a low-cost filter circuit, a power-on reset circuit, an RC oscillator, and a pre-decoding circuit are utilized as the basic modules. The triple-track magnetic sensor chip has four states, i.e., reset, sleep, swiping card and data read-out. In sleep state, the internal RC oscillator is closed, which means that the digital part does not operate to optimize energy consumption. In order to improve decoding accuracy and expand the sensing range of the signal, two kinds of circuit are put forward, naming offset correction circuit, and tracking circuit. With these two circuits, the sensing function of this chip can be more efficiently and accurately. We simulated these circuit modules with TSMC technology library. The results showed that these modules worked well within wide range input signal. Based on these results, the layout and tape-out were carried out. The measurement results showed that the chip do function well within a wide swipe speed range, which achieved the design target. PMID:23839231
Sonntag, Frank; Schilling, Niels; Mader, Katja; Gruchow, Mathias; Klotzbach, Udo; Lindner, Gerd; Horland, Reyk; Wagner, Ilka; Lauster, Roland; Howitz, Steffen; Hoffmann, Silke; Marx, Uwe
2010-07-01
Dynamic miniaturized human multi-micro-organ bioreactor systems are envisaged as a possible solution for the embarrassing gap of predictive substance testing prior to human exposure. A rational approach was applied to simulate and design dynamic long-term cultures of the smallest possible functional human organ units, human "micro-organoids", on a chip the shape of a microscope slide. Each chip contains six identical dynamic micro-bioreactors with three different micro-organoid culture segments each, a feed supply and waste reservoirs. A liver, a brain cortex and a bone marrow micro-organoid segment were designed into each bioreactor. This design was translated into a multi-layer chip prototype and a routine manufacturing procedure was established. The first series of microscopable, chemically resistant and sterilizable chip prototypes was tested for matrix compatibility and primary cell culture suitability. Sterility and long-term human cell survival could be shown. Optimizing the applied design approach and prototyping tools resulted in a time period of only 3 months for a single design and prototyping cycle. This rapid prototyping scheme now allows for fast adjustment or redesign of inaccurate architectures. The designed chip platform is thus ready to be evaluated for the establishment and maintenance of the human liver, brain cortex and bone marrow micro-organoids in a systemic microenvironment. Copyright (c) 2010 Elsevier B.V. All rights reserved.
Jiang, Yiyue; Lei, Cheng; Yasumoto, Atsushi; Kobayashi, Hirofumi; Aisaka, Yuri; Ito, Takuro; Guo, Baoshan; Nitta, Nao; Kutsuna, Natsumaro; Ozeki, Yasuyuki; Nakagawa, Atsuhiro; Yatomi, Yutaka; Goda, Keisuke
2017-07-11
According to WHO, about 10 million new cases of thrombotic disorders are diagnosed worldwide every year. Thrombotic disorders, including atherothrombosis (the leading cause of death in the US and Europe), are induced by occlusion of blood vessels, due to the formation of blood clots in which aggregated platelets play an important role. The presence of aggregated platelets in blood may be related to atherothrombosis (especially acute myocardial infarction) and is, hence, useful as a potential biomarker for the disease. However, conventional high-throughput blood analysers fail to accurately identify aggregated platelets in blood. Here we present an in vitro on-chip assay for label-free, single-cell image-based detection of aggregated platelets in human blood. This assay builds on a combination of optofluidic time-stretch microscopy on a microfluidic chip operating at a high throughput of 10 000 blood cells per second with machine learning, enabling morphology-based identification and enumeration of aggregated platelets in a short period of time. By performing cell classification with machine learning, we differentiate aggregated platelets from single platelets and white blood cells with a high specificity and sensitivity of 96.6% for both. Our results indicate that the assay is potentially promising as predictive diagnosis and therapeutic monitoring of thrombotic disorders in clinical settings.
Silicon photonic integrated circuit for fast and precise dual-comb distance metrology.
Weimann, C; Lauermann, M; Hoeller, F; Freude, W; Koos, C
2017-11-27
We demonstrate an optical distance sensor integrated on a silicon photonic chip with a footprint of well below 1 mm 2 . The integrated system comprises a heterodyne receiver structure with tunable power splitting ratio and on-chip photodetectors. The functionality of the device is demonstrated in a synthetic-wavelength interferometry experiment using frequency combs as optical sources. We obtain accurate and fast distance measurements with an unambiguity range of 3.75 mm, a root-mean-square error of 3.4 µm and acquisition times of 14 µs.
Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji
2018-01-01
Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carps were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carps. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database Registration No. : SRP075165. PMID:29692738
Reconfigurable radio-frequency arbitrary waveforms synthesized in a silicon photonic chip.
Wang, Jian; Shen, Hao; Fan, Li; Wu, Rui; Niu, Ben; Varghese, Leo T; Xuan, Yi; Leaird, Daniel E; Wang, Xi; Gan, Fuwan; Weiner, Andrew M; Qi, Minghao
2015-01-12
Photonic methods of radio-frequency waveform generation and processing can provide performance advantages and flexibility over electronic methods due to the ultrawide bandwidth offered by the optical carriers. However, bulk optics implementations suffer from the lack of integration and slow reconfiguration speed. Here we propose an architecture of integrated photonic radio-frequency generation and processing and implement it on a silicon chip fabricated in a semiconductor manufacturing foundry. Our device can generate programmable radio-frequency bursts or continuous waveforms with only the light source, electrical drives/controls and detectors being off-chip. It modulates an individual pulse in a radio-frequency burst within 4 ns, achieving a reconfiguration speed three orders of magnitude faster than thermal tuning. The on-chip optical delay elements offer an integrated approach to accurately manipulating individual radio-frequency waveform features without constraints set by the speed and timing jitter of electronics, and should find applications ranging from high-speed wireless to defence electronics.
Versatile single-chip event sequencer for atomic physics experiments
NASA Astrophysics Data System (ADS)
Eyler, Edward
2010-03-01
A very inexpensive dsPIC microcontroller with internal 32-bit counters is used to produce a flexible timing signal generator with up to 16 TTL-compatible digital outputs, with a time resolution and accuracy of 50 ns. This time resolution is easily sufficient for event sequencing in typical experiments involving cold atoms or laser spectroscopy. This single-chip device is capable of triggered operation and can also function as a sweeping delay generator. With one additional chip it can also concurrently produce accurately timed analog ramps, and another one-chip addition allows real-time control from an external computer. Compared to an FPGA-based digital pattern generator, this design is slower but simpler and more flexible, and it can be reprogrammed using ordinary `C' code without special knowledge. I will also describe the use of the same microcontroller with additional hardware to implement a digital lock-in amplifier and PID controller for laser locking, including a simple graphics-based control unit. This work is supported in part by the NSF.
Reconfigurable radio-frequency arbitrary waveforms synthesized in a silicon photonic chip
Wang, Jian; Shen, Hao; Fan, Li; Wu, Rui; Niu, Ben; Varghese, Leo T.; Xuan, Yi; Leaird, Daniel E.; Wang, Xi; Gan, Fuwan; Weiner, Andrew M.; Qi, Minghao
2015-01-01
Photonic methods of radio-frequency waveform generation and processing can provide performance advantages and flexibility over electronic methods due to the ultrawide bandwidth offered by the optical carriers. However, bulk optics implementations suffer from the lack of integration and slow reconfiguration speed. Here we propose an architecture of integrated photonic radio-frequency generation and processing and implement it on a silicon chip fabricated in a semiconductor manufacturing foundry. Our device can generate programmable radio-frequency bursts or continuous waveforms with only the light source, electrical drives/controls and detectors being off-chip. It modulates an individual pulse in a radio-frequency burst within 4 ns, achieving a reconfiguration speed three orders of magnitude faster than thermal tuning. The on-chip optical delay elements offer an integrated approach to accurately manipulating individual radio-frequency waveform features without constraints set by the speed and timing jitter of electronics, and should find applications ranging from high-speed wireless to defence electronics. PMID:25581847
Genomics Portals: integrative web-platform for mining genomics data.
Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario
2010-01-13
A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.
Genomics Portals: integrative web-platform for mining genomics data
2010-01-01
Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909
[The joint applications of DNA chips and single nucleotide polymorphisms in forensic science].
Bai, Peng; Tian, Li; Zhou, Xue-ping
2005-05-01
DNA chip technology, being a new high-technology, shows its vigorous life and rapid growth. Single Nucleotide Polymorphisms (SNPs) is the most common diversity in the human genome. It provides suitable genetic markers which play a key role in disease linkage study, pharmacogenomics, forensic medicine, population evolution and immigration study. Their advantage such as being analyzed with DNA chips technology, is predicted to play an important role in the field of forensic medicine, especially in paternity test and individual identification. This report mainly reviews the characteristics of DNA chip and SNPs, and their joint applications in the practice of forensic medicine.
An integrated workflow for analysis of ChIP-chip data.
Weigelt, Karin; Moehle, Christoph; Stempfl, Thomas; Weber, Bernhard; Langmann, Thomas
2008-08-01
Although ChIP-chip is a powerful tool for genome-wide discovery of transcription factor target genes, the steps involving raw data analysis, identification of promoters, and correlation with binding sites are still laborious processes. Therefore, we report an integrated workflow for the analysis of promoter tiling arrays with the Genomatix ChipInspector system. We compare this tool with open-source software packages to identify PU.1 regulated genes in mouse macrophages. Our results suggest that ChipInspector data analysis, comparative genomics for binding site prediction, and pathway/network modeling significantly facilitate and enhance whole-genome promoter profiling to reveal in vivo sites of transcription factor-DNA interactions.
Read clouds uncover variation in complex regions of the human genome.
Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim
2015-10-01
Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...
2016-06-24
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novelmore » splice isoforms. Additionally, we uncover APA ofB11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.« less
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.
2016-01-01
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
Predictive model for consumer preference of a dried, chip-style persimmon product
USDA-ARS?s Scientific Manuscript database
The State of California is a major producer of Asian persimmons (Diospyros kaki), however, there is limited availability of persimmons outside of this region and the fruit’s short harvest season. A dried, chip-style product could increase the geographic area and timeframe in which persimmon growers...
Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes and progenitors
Villani, Alexandra-Chloé; Satija, Rahul; Reynolds, Gary; Sarkizova, Siranush; Shekhar, Karthik; Fletcher, James; Griesbeck, Morgane; Butler, Andrew; Zheng, Shiwei; Lazo, Suzan; Jardine, Laura; Dixon, David; Stephenson, Emily; Nilsson, Emil; Grundberg, Ida; McDonald, David; Filby, Andrew; Li, Weibo; De Jager, Philip L.; Rozenblatt-Rosen, Orit; Lane, Andrew A.; Haniffa, Muzlifah; Regev, Aviv; Hacohen, Nir
2017-01-01
Dendritic cells (DCs) and monocytes play a central role in pathogen sensing, phagocytosis and antigen presentation and consist of multiple specialized subtypes. However, their identities and interrelationships are not fully understood. Using unbiased single-cell RNA sequencing (RNA-seq) of ~2400 cells, we identified six human DCs and four monocyte subtypes in human blood. Our study reveals: a new DC subset that shares properties with plasmacytoid DCs (pDCs) but potently activates T cells, thus redefining pDCs; a new subdivision within the CD1C+ subset of DCs; the relationship between blastic plasmacytoid DC neoplasia cells and healthy DCs; and circulating progenitor of conventional DCs (cDCs). Our revised taxonomy will enable more accurate functional and developmental analyses as well as immune monitoring in health and disease. PMID:28428369
Circular RNA profile in gliomas revealed by identification tool UROBORUS.
Song, Xiaofeng; Zhang, Naibo; Han, Ping; Moon, Byoung-San; Lai, Rose K; Wang, Kai; Lu, Wange
2016-05-19
Recent evidence suggests that many endogenous circular RNAs (circRNAs) may play roles in biological processes. However, the expression patterns and functions of circRNAs in human diseases are not well understood. Computationally identifying circRNAs from total RNA-seq data is a primary step in studying their expression pattern and biological roles. In this work, we have developed a computational pipeline named UROBORUS to detect circRNAs in total RNA-seq data. By applying UROBORUS to RNA-seq data from 46 gliomas and normal brain samples, we detected thousands of circRNAs supported by at least two read counts, followed by successful experimental validation on 24 circRNAs from the randomly selected 27 circRNAs. UROBORUS is an efficient tool that can detect circRNAs with low expression levels in total RNA-seq without RNase R treatment. The circRNAs expression profiling revealed more than 476 circular RNAs differentially expressed in control brain tissues and gliomas. Together with parental gene expression, we found that circRNA and its parental gene have diversified expression patterns in gliomas and control brain tissues. This study establishes an efficient and sensitive approach for predicting circRNAs using total RNA-seq data. The UROBORUS pipeline can be accessed freely for non-commercial purposes at http://uroborus.openbioinformatics.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Li, Xiaochun; Yang, Fan; Wong, Jessica X H; Yu, Hua-Zhong
2017-09-05
We demonstrate herein an integrated, smartphone-app-chip (SPAC) system for on-site quantitation of food toxins, as demonstrated with aflatoxin B1 (AFB1), at parts-per-billion (ppb) level in food products. The detection is based on an indirect competitive immunoassay fabricated on a transparent plastic chip with the assistance of a microfluidic channel plate. A 3D-printed optical accessory attached to a smartphone is adapted to align the assay chip and to provide uniform illumination for imaging, with which high-quality images of the assay chip are captured by the smartphone camera and directly processed using a custom-developed Android app. The performance of this smartphone-based detection system was tested using both spiked and moldy corn samples; consistent results with conventional enzyme-linked immunosorbent assay (ELISA) kits were obtained. The achieved detection limit (3 ± 1 ppb, equivalent to μg/kg) and dynamic response range (0.5-250 ppb) meet the requested testing standards set by authorities in China and North America. We envision that the integrated SPAC system promises to be a simple and accurate method of food toxin quantitation, bringing much benefit for rapid on-site screening.
Park, Christopher Y.; Krishnan, Arjun; Zhu, Qian; Wong, Aaron K.; Lee, Young-Suk; Troyanskaya, Olga G.
2015-01-01
Motivation: Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell-lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy accounting for the tissue heterogeneity in the functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses. Results: We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach specifically predicts both the presence of a functional association and also the most likely interaction type among human genes or its protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions, and further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry). Availability and implementation: An interactive website hosting all of our interaction predictions is publically available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25431329
Nguyen, Quan; Lukowski, Samuel; Chiu, Han; Senabouth, Anne; Bruxner, Timothy; Christ, Angelika; Palpant, Nathan; Powell, Joseph
2018-05-11
Heterogeneity of cell states represented in pluripotent cultures have not been described at the transcriptional level. Since gene expression is highly heterogeneous between cells, single-cell RNA sequencing can be used to identify how individual pluripotent cells function. Here, we present results from the analysis of single-cell RNA sequencing data from 18,787 individual WTC CRISPRi human induced pluripotent stem cells. We developed an unsupervised clustering method, and through this identified four subpopulations distinguishable on the basis of their pluripotent state including: a core pluripotent population (48.3%), proliferative (47.8%), early-primed for differentiation (2.8%) and late-primed for differentiation (1.1%). For each subpopulation we were able to identify the genes and pathways that define differences in pluripotent cell states. Our method identified four discrete predictor gene sets comprised of 165 unique genes that denote the specific pluripotency states; and using these sets, we developed a multigenic machine learning prediction method to accurately classify single cells into each of the subpopulations. Compared against a set of established pluripotency markers, our method increases prediction accuracy by 10%, specificity by 20%, and explains a substantially larger proportion of deviance (up to 3-fold) from the prediction model. Finally, we developed an innovative method to predict cells transitioning between subpopulations, and support our conclusions with results from two orthogonal pseudotime trajectory methods. Published by Cold Spring Harbor Laboratory Press.
LING, SHIZHANG; RETTIG, ELENI M.; TAN, MARIETTA; CHANG, XIAOFEI; WANG, ZHIMING; BRAIT, MARIANA; BISHOP, JUSTIN A.; FERTIG, ELANA J.; CONSIDINE, MICHAEL; WICK, MICHAEL J.; HA, PATRICK K.
2016-01-01
Salivary gland adenoid cystic carcinoma (ACC) is a rare head and neck malignancy without molecular biomarkers that can be used to predict the chemotherapeutic response or prognosis of ACC. The regulation of gene expression of oncogenes and tumor suppressor genes (TSGs) through DNA promoter methylation may play a role in the carcinogenesis of ACC. To identify differentially methylated genes in ACC, a global demethylating agent, 5-aza-2′-deoxycytidine (5-AZA) was utilized to unmask putative TSG silencing in ACC xenograft models in mice. Fresh xenografts were passaged, implanted in triplicate in mice that were treated with 5-AZA daily for 28 days. These xenografts were then evaluated for genome-wide DNA methylation patterns using the Illumina Infinium HumanMethylation27 BeadChip array. Validation of the 32 candidate genes was performed by bisulfite sequencing (BS-seq) in a separate cohort of 6 ACC primary tumors and 6 normal control salivary gland tissues. Hypermethylation was identified in the HCN2 gene promoter in all 6 control tissues, but hypomethylation was found in all 6 ACC tumor tissues. Quantitative validation of HCN2 promoter methylation level in the region detected by BS-seq was performed in a larger cohort of primary tumors (n=32) confirming significant HCN2 hypomethylation in ACCs compared with normal samples (n=10; P=0.04). HCN2 immunohistochemical staining was performed on an ACC tissue microarray. HCN2 staining intensity and H-score, but not percentage of the positively stained cells, were significantly stronger in normal tissues than those of ACC tissues. With our novel screening and sequencing methods, we identified several gene candidates that were methylated. The most significant of these genes, HCN2, was actually hypomethylated in tumors. However, promoter methylation status does not appear to be a major determinant of HCN2 expression in normal and ACC tissues. HCN2 hypomethylation is a biomarker of ACC and may play an important role in the carcinogenesis of ACC. PMID:27212063
Utilisation of chip thickness models in grinding
NASA Astrophysics Data System (ADS)
Singleton, Roger
Grinding is now a well established process utilised for both stock removal and finish applications. Although significant research is performed in this field, grinding still experiences problems with burn and high forces which can lead to poor quality components and damage to equipment. This generally occurs in grinding when the process deviates from its safe working conditions. In milling, chip thickness parameters are utilised to predict and maintain process outputs leading to improved control of the process. This thesis looks to further the knowledge of the relationship between chip thickness and the grinding process outputs to provide an increased predictive and maintenance modelling capability. Machining trials were undertaken using different chip thickness parameters to understand how these affect the process outputs. The chip thickness parameters were maintained at different grinding wheel diameters for a constant productivity process to determine the impact of chip thickness at a constant material removal rate.. Additional testing using a modified pin on disc test rig was performed to provide further information on process variables. The different chip thickness parameters provide control of different process outputs in the grinding process. These relationships can be described using contact layer theory and heat flux partitioning. The contact layer is defined as the immediate layer beneath the contact arc at the wheel workpiece interface. The size of the layer governs the force experienced during the process. The rate of contact layer removal directly impacts the net power required from the system. It was also found that the specific grinding energy of a process is more dependent on the productivity of a grinding process rather than the value of chip thickness. Changes in chip thickness at constant material removal rate result in microscale changes in the rate of contact layer removal when compared to changes in process productivity. This is a significant piece of information in relation to specific grinding energy where conventional theory states it is primarily dependent on chip thickness..
Kuo, Pei-Yu; Leshchenko, Violetta V.; Fazzari, Melissa J.; Perumal, Deepak; Gellen, Tobias; He, Tianfang; Iqbal, Javeed; Baumgartner-Wennerholm, Stefanie; Nygren, Lina; Zhang, Fan; Zhang, Weijia; Suh, K. Stephen; Goy, Andre; Yang, David T.; Chan, Wing-Chung; Kahl, Brad S.; Verma, Amit K.; Gascoyne, Randy D.; Kimby, Eva; Sander, Birgitta; Ye, B. Hilda; Melnick, Ari M.; Parekh, Samir
2015-01-01
SOX11 (Sex determining region Y-box 11) expression is specific for MCL as compared to other Non-Hodgkin's lymphomas. However, the function and direct binding targets of SOX11 in MCL are largely unknown. We used high-resolution ChIP-Seq to identify the direct target genes of SOX11 in a genome-wide, unbiased manner and elucidate its functional significance. Pathway analysis identified WNT, PKA and TGF-beta signaling pathways as significantly enriched by SOX11 target genes. qCHIP and promoter reporter assays confirmed that SOX11 directly binds to individual genes and modulates their transcription activities in these pathways in MCL. Functional studies using RNA interference demonstrate that SOX11 directly regulates WNT in MCL. We analyzed SOX11 expression in three independent well-annotated tissue microarrays from the University of Wisconsin (UW), Karolinska Institute and British Columbia Cancer Agency (BCCA). Our findings suggest that high SOX11 expression is associated with improved survival in a subset of MCL patients, particularly those treated with intensive chemotherapy. Transcriptional regulation of WNT and other biological pathways affected by SOX11 target genes may help explain the impact of SOX11 expression on patient outcomes. PMID:24681958
Schroth, Werner; Hamann, Ute; Fasching, Peter A; Dauser, Silke; Winter, Stefan; Eichelbaum, Michel; Schwab, Matthias; Brauch, Hiltrud
2010-09-01
This study aimed to validate matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry (MALDI-TOF MS)/Taqman copy number assay (CNA) CYP2D6 genotyping by AmpliChip CYP450 Test for the prediction of tamoxifen metabolizer phenotypes in breast cancer, and to investigate the influence of CYP2D6 variant coverage on genotype-phenotype relationships and tamoxifen outcome. Hormone receptor-positive postmenopausal breast cancer patients (n = 492) treated with adjuvant tamoxifen, previously analyzed by MALDI-TOF MS/CNA, were reanalyzed by AmpliChip CYP450 Test and validated by independent methods. Cox proportional hazard ratios (HR) were calculated for recurrence of poor (PM) relative to extensive metabolizer (EM) phenotypes with increasing numbers of CYP2D6 variants. Kaplan-Meier distributions were calculated for different phenotype classifications. Concordance was 99.2% to 99.5% for CNA and 99.8% to 100% per CYP2D6 allele (*3, *4, *5, *9, *10, and *41). The prevalence of predicted phenotypes was 1.2% for ultrarapid metabolizer (UM), 37.2% for EM without variant, 43.5% for heterozygous EM, 9.7% for intermediate metabolizer (IM), and 8.3% for PM. Approximately, one third of patients were misclassified based on a *4 analysis only, but inclusion of all reduced-function alleles increased the PM-associated HR from 1.33 (P = 0.58) to 2.87 (P = 0.006). Kaplan-Meier analyses showed highest and lowest clinical benefit for UM and PM with respect to both the AmpliChip-based and a redefined phenotype assignment. The latter revealed significant allele-dose-dependent associations (P = 0.011) and largest effect size (HR(PM_EM) = 2.77; 95% confidence interval, 1.31-5.89). MALDI-TOF MS/CNA is suitable for accurate CYP2D6 genotyping. For tamoxifen pharmacogenetics, broad CYP2D6 allele coverage is recommended to reduce phenotype misclassification. Classification based on refined EM and reduced-function metabolizers is advisable. AACR.
Pinpointing the weed sources of potato psyllid in Washington
USDA-ARS?s Scientific Manuscript database
A major challenge in controlling zebra chip disease of potatoes is our inability to predict what potato fields are likely to be colonized by potato psyllid, the vector of the pathogen that causes zebra chip. Researchers at the USDA-ARS laboratory in Wapato, WA developed a novel method to identify w...
Ayvaz, Huseyin; Rodriguez-Saona, Luis E
2015-05-01
The most common methods for acrylamide analysis in foods require the use of LC-MS/MS and GC-MS. Although these methods have great analytical performance, they need intensive sample preparation, highly specialised instrumentation, and are time consuming. In this study, portable and handheld infrared spectrometers were evaluated as rapid methods for screening acrylamide in potato chips and their performances were compared to those of benchtop infrared systems. The acrylamide content of 64 commercial potato chips (169-2453 μg/kg) was determined by LC-MS/MS. Spectral data were collected using mid-infrared (MIR) and near-infrared (NIR) spectrometers. Partial least squares regression (PLSR) calibration models were developed to predict acrylamide levels. Overall, good linear correlation was found between the predicted acrylamide levels and actual measured acrylamide concentrations by LC-MS/MS (rPred > 0.90 and SEP < 100 μg/kg). Our results indicate that portable and handheld spectrometers can be used as simple and rapid alternatives for acrylamide analysis in potato chips. Copyright © 2014 Elsevier Ltd. All rights reserved.
Microfluidic Organ/Body-on-a-Chip Devices at the Convergence of Biology and Microengineering.
Perestrelo, Ana Rubina; Águas, Ana C P; Rainer, Alberto; Forte, Giancarlo
2015-12-10
Recent advances in biomedical technologies are mostly related to the convergence of biology with microengineering. For instance, microfluidic devices are now commonly found in most research centers, clinics and hospitals, contributing to more accurate studies and therapies as powerful tools for drug delivery, monitoring of specific analytes, and medical diagnostics. Most remarkably, integration of cellularized constructs within microengineered platforms has enabled the recapitulation of the physiological and pathological conditions of complex tissues and organs. The so-called "organ-on-a-chip" technology, which represents a new avenue in the field of advanced in vitro models, with the potential to revolutionize current approaches to drug screening and toxicology studies. This review aims to highlight recent advances of microfluidic-based devices towards a body-on-a-chip concept, exploring their technology and broad applications in the biomedical field.
Process-Hardened, Multi-Analyte Sensor for Characterizing Rocket Plume Constituents
NASA Technical Reports Server (NTRS)
Goswami, Kisholoy
2011-01-01
A multi-analyte sensor was developed that enables simultaneous detection of rocket engine combustion-product molecules in a launch-vehicle ground test stand. The sensor was developed using a pin-printing method by incorporating multiple sensor elements on a single chip. It demonstrated accurate and sensitive detection of analytes such as carbon dioxide, carbon monoxide, kerosene, isopropanol, and ethylene from a single measurement. The use of pin-printing technology enables high-volume fabrication of the sensor chip, which will ultimately eliminate the need for individual sensor calibration since many identical sensors are made in one batch. Tests were performed using a single-sensor chip attached to a fiber-optic bundle. The use of a fiber bundle allows placement of the opto-electronic readout device at a place remote from the test stand. The sensors are rugged for operation in harsh environments.
Wu, Wenjie; Cheng, Peng; Lyu, Jingtong; Zhang, Zehua; Xu, Jianzhong
2018-05-01
We developed a Tag Array chip for detecting first- and second-line anti tuberculosis drug resistance in pulmonary tuberculosis and compared the analytical performance of the gene chip to that of phenotypic drug susceptibility testing (DST). From November 2011 to April 2016.234 consecutive culture-confirmed, clinically and imaging diagnosed patients with pulmonary tuberculosis from Southwest Hospital, Chongqing were enrolled into the study. Specimens collected during sputum or bronchoalveolar lavage fluid from the pulmonary tuberculosis patients were subjected to M. tuberculosis species identification and drug-resistance detection by the Tag Array gene chip, and evaluate the sensitivity and specificity of chip. A total of 186 patients was diagnosed drug-resistant tuberculosis. The detection of rifampicin (RFP), isoniazid (INH), fluoroquinolones (FQS), streptomycin (SM) resistance genes was highly sensitive and specific: however, for detection of amikacin (AMK), capreomycin (CPM), Kanamycin (KM), specificity was higher, but sensitivity was lower. Sensitivity for the detection of a mutation in the eis promoter region could be improved. The detection sensitivity of the EMB resistance gene was low, therefore it is easy to miss a diagnosis of EMB drug resistance, but its specificity was high. Tag Array chip can achieve rapid, accurate and high-throughput detection of tuberculosis resistance in pulmonary tuberculosis, which has important clinical significance and feasibility. Copyright © 2018. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Kong, Jae-Sung; Hyun, Hyo-Young; Seo, Sang-Ho; Shin, Jang-Kyoo
2008-11-01
Complementary metal-oxide-semiconductor (CMOS) vision chips for edge detection based on a resistive circuit have recently been developed. These chips help in the creation of neuromorphic systems of a compact size, high speed of operation, and low power dissipation. The output of the vision chip depends predominantly upon the electrical characteristics of the resistive network which consists of a resistive circuit. In this paper, the body effect of the metal-oxide-semiconductor field-effect transistor for current distribution in a resistive circuit is discussed with a simple model. In order to evaluate the model, two 160 × 120 CMOS vision chips have been fabricated using a standard CMOS technology. The experimental results nicely match our prediction.
Iquebal, M A; Jaiswal, Sarika; Mahato, Ajay Kumar; Jayaswal, Pawan K; Angadi, U B; Kumar, Neeraj; Sharma, Nimisha; Singh, Anand K; Srivastav, Manish; Prakash, Jai; Singh, S K; Khan, Kasim; Mishra, Rupesh K; Rajan, Shailendra; Bajpai, Anju; Sandhya, B S; Nischita, Puttaraju; Ravishankar, K V; Dinesh, M R; Rai, Anil; Kumar, Dinesh; Sharma, Tilak R; Singh, Nagendra K
2017-11-02
Mango is one of the most important fruits of tropical ecological region of the world, well known for its nutritive value, aroma and taste. Its world production is >45MT worth >200 billion US dollars. Genomic resources are required for improvement in productivity and management of mango germplasm. There is no web-based genomic resources available for mango. Hence rapid and cost-effective high throughput putative marker discovery is required to develop such resources. RAD-based marker discovery can cater this urgent need till whole genome sequence of mango becomes available. Using a panel of 84 mango varieties, a total of 28.6 Gb data was generated by ddRAD-Seq approach on Illumina HiSeq 2000 platform. A total of 1.25 million SNPs were discovered. Phylogenetic tree using 749 common SNPs across these varieties revealed three major lineages which was compared with geographical locations. A web genomic resources MiSNPDb, available at http://webtom.cabgrid.res.in/mangosnps/ is based on 3-tier architecture, developed using PHP, MySQL and Javascript. This web genomic resources can be of immense use in the development of high density linkage map, QTL discovery, varietal differentiation, traceability, genome finishing and SNP chip development for future GWAS in genomic selection program. We report here world's first web-based genomic resources for genetic improvement and germplasm management of mango.
Systems biology of cancer biomarker detection.
Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas
2013-01-01
Cancer systems-biology is an ever-growing area of research due to explosion of data; how to mine these data and extract useful information is the problem. To have an insight on carcinogenesis one need to systematically mine several resources, such as databases, microarray and next-generation sequences. This review encompasses management and analysis of cancer data, databases construction and data deposition, whole transcriptome and genome comparison, analysing results from high throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from ENCODE and GENCODE projects available through UCSC genome browser can be considered as benchmark for comparison and meta-analysis. A pipeline for integrating next generation sequencing data, microarray data, and putting them together with the existing database is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of utilizing available resources' we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's eye view of computational and bioinformatics' tools currently available to perform integrated genomic and system biology analyses of several carcinoma.
Tang, Haibin; Chen, Zhangxing; Zhou, Guowei; ...
2018-02-06
To develop further understanding towards the role of a heterogeneous microstructure on tensile crack initiation and failure behavior in chopped carbon fiber chip-reinforced composites, uni-axial tensile tests are performed on coupons cut from compression molded plaque with varying directions. Our experimental results indicate that failure initiation is relevant to the strain localization, and a new criterion with the nominal modulus to predict the failure location is proposed based on the strain analysis. Furthermore, optical microscopic images show that the nominal modulus is determined by the chip orientation distribution. At the area with low nominal modulus, it is found that chipsmore » are mostly aligning along directions transverse to loading direction and/or less concentrated, while at the area with high nominal modulus, more chips are aligning to tensile direction. On the basis of failure mechanism analysis, it is concluded that transversely-oriented chips or resin-rich regions are easier for damage initiation, while longitudinally-oriented chips postpone the fracture. Good agreement is found among failure mechanism, strain localization and chip orientation distribution.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Haibin; Chen, Zhangxing; Zhou, Guowei
To develop further understanding towards the role of a heterogeneous microstructure on tensile crack initiation and failure behavior in chopped carbon fiber chip-reinforced composites, uni-axial tensile tests are performed on coupons cut from compression molded plaque with varying directions. Our experimental results indicate that failure initiation is relevant to the strain localization, and a new criterion with the nominal modulus to predict the failure location is proposed based on the strain analysis. Furthermore, optical microscopic images show that the nominal modulus is determined by the chip orientation distribution. At the area with low nominal modulus, it is found that chipsmore » are mostly aligning along directions transverse to loading direction and/or less concentrated, while at the area with high nominal modulus, more chips are aligning to tensile direction. On the basis of failure mechanism analysis, it is concluded that transversely-oriented chips or resin-rich regions are easier for damage initiation, while longitudinally-oriented chips postpone the fracture. Good agreement is found among failure mechanism, strain localization and chip orientation distribution.« less
High-throughput detection of RNA processing in bacteria.
Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L
2018-03-27
Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely - 10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions. This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .
Adjustments of the Pesticide Risk Index Used in Environmental Policy in Flanders
Fevery, Davina; Peeters, Bob; Lenders, Sonia; Spanoghe, Pieter
2015-01-01
Indicators are used to quantify the pressure of pesticides on the environment. Pesticide risk indicators typically require weighting environmental exposure by a no effect concentration. An indicator based on spread equivalents (ΣSeq) is used in environmental policy in Flanders (Belgium). The pesticide risk for aquatic life is estimated by weighting active ingredient usage by the ratio of their maximum allowable concentration and their soil halflife. Accurate estimates of total pesticide usage in the region are essential in such calculations. Up to 2012, the environmental impact of pesticides was estimated on sales figures provided by the Federal Government. Since 2013, pesticide use is calculated based on results from the Farm Accountancy Data Network (FADN). The estimation of pesticide use was supplemented with data for non-agricultural use based on sales figures of amateur use provided by industry and data obtained from public services. The Seq-indicator was modified to better reflect reality. This method was applied for the period 2009-2012 and showed differences between estimated use and sales figures of pesticides. The estimated use of pesticides based on accountancy data is more accurate compared to sales figures. This approach resulted in a better view on pesticide use and its respective environmental impact in Flanders. PMID:26046655
Adjustments of the Pesticide Risk Index Used in Environmental Policy in Flanders.
Fevery, Davina; Peeters, Bob; Lenders, Sonia; Spanoghe, Pieter
2015-01-01
Indicators are used to quantify the pressure of pesticides on the environment. Pesticide risk indicators typically require weighting environmental exposure by a no effect concentration. An indicator based on spread equivalents (ΣSeq) is used in environmental policy in Flanders (Belgium). The pesticide risk for aquatic life is estimated by weighting active ingredient usage by the ratio of their maximum allowable concentration and their soil halflife. Accurate estimates of total pesticide usage in the region are essential in such calculations. Up to 2012, the environmental impact of pesticides was estimated on sales figures provided by the Federal Government. Since 2013, pesticide use is calculated based on results from the Farm Accountancy Data Network (FADN). The estimation of pesticide use was supplemented with data for non-agricultural use based on sales figures of amateur use provided by industry and data obtained from public services. The Seq-indicator was modified to better reflect reality. This method was applied for the period 2009-2012 and showed differences between estimated use and sales figures of pesticides. The estimated use of pesticides based on accountancy data is more accurate compared to sales figures. This approach resulted in a better view on pesticide use and its respective environmental impact in Flanders.
Cost analysis of whole genome sequencing in German clinical practice.
Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf
2017-06-01
Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists, and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms-namely, HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario-which features 80 % utilization and 30-times coverage-the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs (€607.39) were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63 % cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome', by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.
Preliminary investigation of self-as-context in people with fibromyalgia
Yu, Lin; Norton, Sam; Almarzooqi, Sarah; McCracken, Lance M
2017-01-01
Acceptance and commitment therapy (ACT), based on the Psychological Flexibility (PF) model, has been recently applied to fibromyalgia (FM), and appeared effective in improving functioning. However, evidence for some of the processes within the PF model, self-as-context (SAC) in particular, is lacking within this population. The current study validates a measure of SAC, the Self Experiences Questionnaire (SEQ), and preliminarily investigates the role of SAC in relation to functioning in FM. Participants (N = 298, 93.3% women) self-reporting a diagnosis of FM were recruited via the Internet and completed an online survey. Measures included pain, pain acceptance and SAC, as processes, and pain interference, work and social adjustment, depression and depression-related interference, as outcomes. Confirmatory factor analysis of the SEQ suggested a bi-factor structure, with a general factor underlying all items and two sub-factors, self-as-distinction and self-as-observer (χ2 = 46.55, p = .06, comparative fit index (CFI) = .99, Tucker–Lewis Index (TLI) = .99, root mean square error of approximation (RMSEA) = .04). Component factors showed good reliability, Cronbach’s α = .90, and construct validity, supported by significant Pearson’s correlations between SEQ scores, acceptance and outcomes (r = −.14 to −.33). In multiple regression analyses, SEQ scores significantly predicted pain-related interference (β = −.17, p < .05), work and social adjustment (β = −.14, p < .05) and depression (β = −.21, p < .01), but not depression-related interference, after controlling for pain, but only significantly predicted depression after controlling pain acceptance. These preliminary results show potentially important associations between SAC and functioning in people with FM. PMID:28785409
Roush, George C; Abdelfattah, Ramy; Song, Steven; Kostis, John B; Ernst, Michael E; Sica, Domenic A
2018-06-01
Found in 36-41% of hypertension, elevated left ventricular mass (LVM) independently predicts cardiovascular events and total mortality. Conversely, drug-induced regression of LVM predicts improved outcomes. Previous studies have favored renin-angiotensin system inhibitors (RASIs) over other antihypertensives for reducing LVM but ignored differences among thiazide-type diuretics. From evidence regarding potency, cardiovascular events, and electrolytes, we hypothesized a priori that 'CHIP' diuretics [CHlorthalidone, Indapamide and Potassium-sparing Diuretic/hydrochlorothiazide (PSD/HCTZ)] would rival RASIs for reducing LVM. Systematic review yielded 12 relevant double-blind randomized trials. CHIPs were more closely associated with reduced LVM than HCTZ (P = 0.004), indicating that RASIs must be compared with each diuretic separately. Publication bias favoring RASIs was corrected by cumulative analysis. For reducing LVM, HCTZ tended to be less effective than RASIs. However, the following surpassed RASIs: chlorthalidone Hedge's G: -0.37 (95% CI -0.72 to -0.02), P = 0.036; indapamide -0.20 (-0.39 to -0.01), P = 0.035; all CHIPs combined (with 61% of patients in one trial) -0.25 (-0.41to -0.09), P = 0.002. Statistical significance (P < 0.05) did not depend on any one trial. CHIPs reduction in LVM was 37% greater than that from RASIs. CHIPs superiority tended to increase with trial duration, from a negligible effect at 0.5 year to a maximal effect at 0.9-1.0 years: -0.26 (-0.43 to -0.09), P = 0.003. Fifty-eight percent of patients had information on echocardiographic components of LVM: relative to RASIs, CHIPs significantly reduced end-diastolic LV internal dimension (EDLVID): -0.18 (-0.36 to -0.00), P = 0.046. Strength of evidence favoring CHIPs over RASIs was at least moderate. In these novel results in patients with hypertension, CHIPs surpassed RASIs for reducing LVM and EDLVID.
Design-based modeling of magnetically actuated soft diaphragm materials
NASA Astrophysics Data System (ADS)
Jayaneththi, V. R.; Aw, K. C.; McDaid, A. J.
2018-04-01
Magnetic polymer composites (MPC) have shown promise for emerging biomedical applications such as lab-on-a-chip and implantable drug delivery. These soft material actuators are capable of fast response, large deformation and wireless actuation. Existing MPC modeling approaches are computationally expensive and unsuitable for rapid design prototyping and real-time control applications. This paper proposes a macro-scale 1-DOF model capable of predicting force and displacement of an MPC diaphragm actuator. Model validation confirmed both blocked force and displacement can be accurately predicted in a variety of working conditions i.e. different magnetic field strengths, static/dynamic fields, and gap distances. The contribution of this work includes a comprehensive experimental investigation of a macro-scale diaphragm actuator; the derivation and validation of a new phenomenological model to describe MPC actuation; and insights into the proposed model’s design-based functionality i.e. scalability and generalizability in terms of magnetic filler concentration and diaphragm diameter. Due to the lumped element modeling approach, the proposed model can also be adapted to alternative actuator configurations, and thus presents a useful tool for design, control and simulation of novel MPC applications.
Spatial reconstruction of single-cell gene expression data.
Satija, Rahul; Farrell, Jeffrey A; Gennert, David; Schier, Alexander F; Regev, Aviv
2015-05-01
Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat's accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems.
Spatial reconstruction of single-cell gene expression
Satija, Rahul; Farrell, Jeffrey A.; Gennert, David; Schier, Alexander F.; Regev, Aviv
2015-01-01
Spatial localization is a key determinant of cellular fate and behavior, but spatial RNA assays traditionally rely on staining for a limited number of RNA species. In contrast, single-cell RNA-seq allows for deep profiling of cellular gene expression, but established methods separate cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos, inferring a transcriptome-wide map of spatial patterning. We confirmed Seurat’s accuracy using several experimental approaches, and used it to identify a set of archetypal expression patterns and spatial markers. Additionally, Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems. PMID:25867923
Kondrashova, Olga; Love, Clare J.; Lunke, Sebastian; Hsu, Arthur L.; Waring, Paul M.; Taylor, Graham R.
2015-01-01
Whilst next generation sequencing can report point mutations in fixed tissue tumour samples reliably, the accurate determination of copy number is more challenging. The conventional Multiplex Ligation-dependent Probe Amplification (MLPA) assay is an effective tool for measurement of gene dosage, but is restricted to around 50 targets due to size resolution of the MLPA probes. By switching from a size-resolved format, to a sequence-resolved format we developed a scalable, high-throughput, quantitative assay. MLPA-seq is capable of detecting deletions, duplications, and amplifications in as little as 5ng of genomic DNA, including from formalin-fixed paraffin-embedded (FFPE) tumour samples. We show that this method can detect BRCA1, BRCA2, ERBB2 and CCNE1 copy number changes in DNA extracted from snap-frozen and FFPE tumour tissue, with 100% sensitivity and >99.5% specificity. PMID:26569395
Barrett, Christian L.; Cho, Byung-Kwan
2011-01-01
Immuno-precipitation of protein–DNA complexes followed by microarray hybridization is a powerful and cost-effective technology for discovering protein–DNA binding events at the genome scale. It is still an unresolved challenge to comprehensively, accurately and sensitively extract binding event information from the produced data. We have developed a novel strategy composed of an information-preserving signal-smoothing procedure, higher order derivative analysis and application of the principle of maximum entropy to address this challenge. Importantly, our method does not require any input parameters to be specified by the user. Using genome-scale binding data of two Escherichia coli global transcription regulators for which a relatively large number of experimentally supported sites are known, we show that ∼90% of known sites were resolved to within four probes, or ∼88 bp. Over half of the sites were resolved to within two probes, or ∼38 bp. Furthermore, we demonstrate that our strategy delivers significant quantitative and qualitative performance gains over available methods. Such accurate and sensitive binding site resolution has important consequences for accurately reconstructing transcriptional regulatory networks, for motif discovery, for furthering our understanding of local and non-local factors in protein–DNA interactions and for extending the usefulness horizon of the ChIP-chip platform. PMID:21051353
A deep learning framework for modeling structural features of RNA-binding protein targets
Zhang, Sai; Zhou, Jingtian; Hu, Hailin; Gong, Haipeng; Chen, Ligong; Cheng, Chao; Zeng, Jianyang
2016-01-01
RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of the post-transcriptional gene regulation. Though numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of the RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on the real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and predict accurate RBP binding sites. In addition, we have conducted the first study to show that integrating the additional RNA tertiary structural features can improve the model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides a new evidence to support the view that RBPs may own specific tertiary structural binding preferences. In particular, the tests on the internal ribosome entry site (IRES) segments yield satisfiable results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model. The source code of our approach can be found in https://github.com/thucombio/deepnet-rbp. PMID:26467480
MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.
Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar
2013-10-15
High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
Single cell HaloChip assay on paper for point-of-care diagnosis.
Ma, Liyuan; Qiao, Yong; Jones, Ross; Singh, Narendra; Su, Ming
2016-11-01
This article describes a paper-based low cost single cell HaloChip assay that can be used to assess drug- and radiation-induced DNA damage at point-of-care. Printing ink on paper effectively blocks fluorescence of paper materials, provides high affinity to charged polyelectrolytes, and prevents penetration of water in paper. After exposure to drug or ionizing radiation, cells are patterned on paper to create discrete and ordered single cell arrays, embedded inside an agarose gel, lysed with alkaline solution to allow damaged DNA fragments to diffuse out of nucleus cores, and form diffusing halos in the gel matrix. After staining DNA with a fluorescent dye, characteristic halos formed around cells, and the level of DNA damage can be quantified by determining sizes of halos and nucleus with an image processing program based on MATLAB. With its low fabrication cost and easy operation, this HaloChip on paper platform will be attractive to rapidly and accurately determine DNA damage for point-of-care evaluation of drug efficacy and radiation condition. Graphical Abstract Single cell HaloChip on paper.
Neural dynamics in reconfigurable silicon.
Basu, A; Ramakrishnan, S; Petre, C; Koziol, S; Brink, S; Hasler, P E
2010-10-01
A neuromorphic analog chip is presented that is capable of implementing massively parallel neural computations while retaining the programmability of digital systems. We show measurements from neurons with Hopf bifurcations and integrate and fire neurons, excitatory and inhibitory synapses, passive dendrite cables, coupled spiking neurons, and central pattern generators implemented on the chip. This chip provides a platform for not only simulating detailed neuron dynamics but also uses the same to interface with actual cells in applications such as a dynamic clamp. There are 28 computational analog blocks (CAB), each consisting of ion channels with tunable parameters, synapses, winner-take-all elements, current sources, transconductance amplifiers, and capacitors. There are four other CABs which have programmable bias generators. The programmability is achieved using floating gate transistors with on-chip programming control. The switch matrix for interconnecting the components in CABs also consists of floating-gate transistors. Emphasis is placed on replicating the detailed dynamics of computational neural models. Massive computational area efficiency is obtained by using the reconfigurable interconnect as synaptic weights, resulting in more than 50 000 possible 9-b accurate synapses in 9 mm(2).
TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities
Carey, Allison F.; Rock, Jeremy M.; Krieger, Inna V.; Gagneux, Sebastien; Sacchettini, James C.; Fortune, Sarah M.
2018-01-01
Once considered a phenotypically monomorphic bacterium, there is a growing body of work demonstrating heterogeneity among Mycobacterium tuberculosis (Mtb) strains in clinically relevant characteristics, including virulence and response to antibiotics. However, the genetic and molecular basis for most phenotypic differences among Mtb strains remains unknown. To investigate the basis of strain variation in Mtb, we performed genome-wide transposon mutagenesis coupled with next-generation sequencing (TnSeq) for a panel of Mtb clinical isolates and the reference strain H37Rv to compare genetic requirements for in vitro growth across these strains. We developed an analytic approach to identify quantitative differences in genetic requirements between these genetically diverse strains, which vary in genomic structure and gene content. Using this methodology, we found differences between strains in their requirements for genes involved in fundamental cellular processes, including redox homeostasis and central carbon metabolism. Among the genes with differential requirements were katG, which encodes the activator of the first-line antitubercular agent isoniazid, and glcB, which encodes malate synthase, the target of a novel small-molecule inhibitor. Differences among strains in their requirement for katG and glcB predicted differences in their response to these antimicrobial agents. Importantly, these strain-specific differences in antibiotic response could not be predicted by genetic variants identified through whole genome sequencing or by gene expression analysis. Our results provide novel insight into the basis of variation among Mtb strains and demonstrate that TnSeq is a scalable method to predict clinically important phenotypic differences among Mtb strains. PMID:29505613
Zheng, Qi; Grice, Elizabeth A
2016-10-01
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost's algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
NASA Astrophysics Data System (ADS)
Gupta, Shashank; Nam, Donguk; Vuckovic, Jelena; Saraswat, Krishna
2018-04-01
A complementary metal-oxide semiconductor compatible on-chip light source is the holy grail of silicon photonics and has the potential to alleviate the key scaling issues arising due to electrical interconnects. Despite several theoretical predictions, a sustainable, room temperature laser from a group-IV material is yet to be demonstrated. In this work, we show that a particular loss mechanism, inter-valence-band absorption (IVBA), has been inadequately modeled until now and capturing its effect accurately as a function of strain is crucial to understanding light emission processes from uniaxially strained germanium (Ge). We present a detailed model of light emission in Ge that accurately models IVBA in the presence of strain and other factors such as polarization, doping, and carrier injection, thereby revising the road map toward a room temperature Ge laser. Strikingly, a special resonance between gain and loss mechanisms at 4%-5% 〈100 〉 uniaxial strain is found resulting in a high net gain of more than 400 cm-1 at room temperature. It is shown that achieving this resonance should be the goal of experimental work rather than pursuing a direct band gap Ge.
Wang, WeiBo; Sun, Wei; Wang, Wei; Szatkiewicz, Jin
2018-03-01
The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome from which genomic signals are detected (e.g. copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution by leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to speed up substantially the GLM+NB method by using a randomized algorithm and we demonstrate here the utility of our approach in the application of detecting copy number variants (CNVs) using a real example. We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method as "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Our results suggest that RGE strategy developed here could be applied to other GLM+NB based read-count analyses, i.e. ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power.
Dai, Zhiyan; Shu, Xin; Li, Gang; Xi, Weidong
2018-04-01
We evaluated the performance of a protein chip assay for the detection of antibodies to hepatitis C virus (HCV) peptides among injection drug abusers (IDAs) by comparing the assay with existing methods, including enzyme-linked immunosorbent assay (ELISA), reverse transcription-polymerase chain reaction (RT-PCR), and recombinant immunoblot assay (RIBA). Seventy serum samples collected from IDAs were analyzed by protein chip assay. ELISA, RT-PCR, and RIBA assay were used to validate the results. The protein chips could detect different peptides' antibodies against HCV-C, HCV-NS3, HCV-NS4, HCV-NS5, and HCV-mix antigen; no cross reactivity between antigens was observed. Results of the protein-chip assay were compared with those of ELISA. Any inconsistency in results was validated by both RT-PCR and RIBA. Concordance between the results of the protein-chip assay and ELISA was 96.3% for positive samples and 100% for negative samples. The protein chip had a higher specificity than ELISA, a higher sensitivity than RTPCR, and similar specificity and sensitivity as compared to RIBA. The limit of detection of HCV antibodies in the protein chip assay was examined and calculated by incubation of model array with different dilutions. The protein chip assay required smaller amounts of both samples and reagents; it detected serum antibody in sample quantity as low as 3 ng/mL and at antibody dilution as low as 1:1000; its cost was low as well. The positive rates in the antiC, anti-NS3, anti-NS4, and anti-NS5 groups were significantly associated with levels of HCV RNA and the viral load. The HCV RNA and protein chip positive rate in the injection equipment-sharing group was higher than that in the non-injection equipment-sharing group. The protein chip assay is a faster and simpler approach to simultaneously screen for all HCV peptide antibodies accurately and to provide a rapid diagnosis of HCV infection in IDAs. The dominant positive HCV peptide antibodies were significantly associated with HCV RNA load, especially in the injection equipment-sharing group.
Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.; ...
2016-12-20
The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. In this paper, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting inmore » the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Finally, coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology.« less
Lee, Seungeun; Yamamoto, Naomichi
2015-12-01
This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.; Du, Niu; Mih, Nathan; Diamond, Spencer; Lee, Jenny J.; Golden, Susan S.; Palsson, Bernhard O.
2016-01-01
The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. Here, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting in the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology. PMID:27911809
DOE Office of Scientific and Technical Information (OSTI.GOV)
Broddrick, Jared T.; Rubin, Benjamin E.; Welkie, David G.
The model cyanobacterium, Synechococcus elongatus PCC 7942, is a genetically tractable obligate phototroph that is being developed for the bioproduction of high-value chemicals. Genome-scale models (GEMs) have been successfully used to assess and engineer cellular metabolism; however, GEMs of phototrophic metabolism have been limited by the lack of experimental datasets for model validation and the challenges of incorporating photon uptake. In this paper, we develop a GEM of metabolism in S. elongatus using random barcode transposon site sequencing (RB-TnSeq) essential gene and physiological data specific to photoautotrophic metabolism. The model explicitly describes photon absorption and accounts for shading, resulting inmore » the characteristic linear growth curve of photoautotrophs. GEM predictions of gene essentiality were compared with data obtained from recent dense-transposon mutagenesis experiments. This dataset allowed major improvements to the accuracy of the model. Furthermore, discrepancies between GEM predictions and the in vivo dataset revealed biological characteristics, such as the importance of a truncated, linear TCA pathway, low flux toward amino acid synthesis from photorespiration, and knowledge gaps within nucleotide metabolism. Finally, coupling of strong experimental support and photoautotrophic modeling methods thus resulted in a highly accurate model of S. elongatus metabolism that highlights previously unknown areas of S. elongatus biology.« less
NASA Astrophysics Data System (ADS)
Michalicek, M. Adrian; Bright, Victor M.
2001-10-01
This paper presents the design, fabrication, modeling, and testing of various arrays of cantilever micromirror devices integrated atop CMOS control electronics. The upper layers of the arrays are prefabricated in the MUMPs process and then flip-chip transferred to CMOS receiving modules using a novel latching off-chip hinge mechanism. This mechanism allows the micromirror arrays to be released, rotated off the edge of the host module and then bonded to the receiving module using a standard probe station. The hinge mechanism supports the arrays by tethers that are severed to free the arrays once bonded. The resulting devices are inherently planarized since the bottom of the first releasable MUMPs layer becomes the surface of the integrated mirror. The working devices are formed by mirror surfaces bonded to address electrodes fabricated above static memory cells on the CMOS module. These arrays demonstrate highly desirable features such as compatible address potentials, less than 2 nm of RMS roughness, approximately 1 micrometers of lateral position accuracy and the unique ability to metallize reflective surfaces without masking. Ultimately, the off-chip hinge mechanism enables very low-cost, simple, reliable, repeatable and accurate assembly of advanced MEMS and integrated microsystems without specialized equipment or complex procedures.
Sensing systems using chip-based spectrometers
NASA Astrophysics Data System (ADS)
Nitkowski, Arthur; Preston, Kyle J.; Sherwood-Droz, Nicolás.; Behr, Bradford B.; Bismilla, Yusuf; Cenko, Andrew T.; DesRoches, Brandon; Meade, Jeffrey T.; Munro, Elizabeth A.; Slaa, Jared; Schmidt, Bradley S.; Hajian, Arsen R.
2014-06-01
Tornado Spectral Systems has developed a new chip-based spectrometer called OCTANE, the Optical Coherence Tomography Advanced Nanophotonic Engine, built using a planar lightwave circuit with integrated waveguides fabricated on a silicon wafer. While designed for spectral domain optical coherence tomography (SD-OCT) systems, the same miniaturized technology can be applied to many other spectroscopic applications. The field of integrated optics enables the design of complex optical systems which are monolithically integrated on silicon chips. The form factors of these systems can be significantly smaller, more robust and less expensive than their equivalent free-space counterparts. Fabrication techniques and material systems developed for microelectronics have previously been adapted for integrated optics in the telecom industry, where millions of chip-based components are used to power the optical backbone of the internet. We have further adapted the photonic technology platform for spectroscopy applications, allowing unheard-of economies of scale for these types of optical devices. Instead of changing lenses and aligning systems, these devices are accurately designed programmatically and are easily customized for specific applications. Spectrometers using integrated optics have large advantages in systems where size, robustness and cost matter: field-deployable devices, UAVs, UUVs, satellites, handheld scanning and more. We will discuss the performance characteristics of our chip-based spectrometers and the type of spectral sensing applications enabled by this technology.
Plug-and-play, infrared, laser-mediated PCR in a microfluidic chip.
Pak, Nikita; Saunders, D Curtis; Phaneuf, Christopher R; Forest, Craig R
2012-04-01
Microfluidic polymerase chain reaction (PCR) systems have set milestones for small volume (100 nL-5 μL), amplification speed (100-400 s), and on-chip integration of upstream and downstream sample handling including purification and electrophoretic separation functionality. In practice, the microfluidic chips in these systems require either insertion of thermocouples or calibration prior to every amplification. These factors can offset the speed advantages of microfluidic PCR and have likely hindered commercialization. We present an infrared, laser-mediated, PCR system that features a single calibration, accurate and repeatable precision alignment, and systematic thermal modeling and management for reproducible, open-loop control of PCR in 1 μL chambers of a polymer microfluidic chip. Total cycle time is less than 12 min: 1 min to fill and seal, 10 min to amplify, and 1 min to recover the sample. We describe the design, basis for its operation, and the precision engineering in the system and microfluidic chip. From a single calibration, we demonstrate PCR amplification of a 500 bp amplicon from λ-phage DNA in multiple consecutive trials on the same instrument as well as multiple identical instruments. This simple, relatively low-cost plug-and-play design is thus accessible to persons who may not be skilled in assembly and engineering.
Chromas from chromatin: sonification of the epigenome
Cittaro, Davide; Lazarevic, Dejan; Provero, Paolo
2016-01-01
The epigenetic modifications are organized in patterns determining the functional properties of the underlying genome. Such patterns, typically measured by ChIP-seq assays of histone modifications, can be combined and translated into musical scores, summarizing multiple signals into a single waveform. As music is recognized as a universal way to convey meaningful information, we wanted to investigate properties of music obtained by sonification of ChIP-seq data. We show that the music produced by such quantitative signals is perceived by human listeners as more pleasant than that produced from randomized signals. Moreover, the waveform can be analyzed to predict phenotypic properties, such as differential gene expression. PMID:27019695
Caballero, D; Kaushik, S; Correlo, V M; Oliveira, J M; Reis, R L; Kundu, S C
2017-12-01
Most cancer patients do not die from the primary tumor but from its metastasis. Current in vitro and in vivo cancer models are incapable of satisfactorily predicting the outcome of various clinical treatments on patients. This is seen as a serious limitation and efforts are underway to develop a new generation of highly predictive cancer models with advanced capabilities. In this regard, organ-on-chip models of cancer metastasis emerge as powerful predictors of disease progression. They offer physiological-like conditions where the (hypothesized) mechanistic determinants of the disease can be assessed with ease. Combined with high-throughput characteristics, the employment of organ-on-chip technology would allow pharmaceutical companies and clinicians to test new therapeutic compounds and therapies. This will permit the screening of a large battery of new drugs in a fast and economic manner, to accelerate the diagnosis of the disease in the near future, and to test personalized treatments using cells from patients. In this review, we describe the latest advances in the field of organ-on-chip models of cancer metastasis and their integration with advanced imaging, screening and biosensing technologies for future precision medicine applications. We focus on their clinical applicability and market opportunities to drive us forward to the next generation of tumor models for improved cancer patient theranostics. Copyright © 2017 Elsevier Ltd. All rights reserved.
Study of CMOS-SOI Integrated Temperature Sensing Circuits for On-Chip Temperature Monitoring.
Malits, Maria; Brouk, Igor; Nemirovsky, Yael
2018-05-19
This paper investigates the concepts, performance and limitations of temperature sensing circuits realized in complementary metal-oxide-semiconductor (CMOS) silicon on insulator (SOI) technology. It is shown that the MOSFET threshold voltage ( V t ) can be used to accurately measure the chip local temperature by using a V t extractor circuit. Furthermore, the circuit's performance is compared to standard circuits used to generate an accurate output current or voltage proportional to the absolute temperature, i.e., proportional-to-absolute temperature (PTAT), in terms of linearity, sensitivity, power consumption, speed, accuracy and calibration needs. It is shown that the V t extractor circuit is a better solution to determine the temperature of low power, analog and mixed-signal designs due to its accuracy, low power consumption and no need for calibration. The circuit has been designed using 1 µm partially depleted (PD) CMOS-SOI technology, and demonstrates a measurement inaccuracy of ±1.5 K across 300 K⁻500 K temperature range while consuming only 30 µW during operation.
Medipix2 as a tool for proton beam characterization
NASA Astrophysics Data System (ADS)
Bisogni, M. G.; Cirrone, G. A. P.; Cuttone, G.; Del Guerra, A.; Lojacono, P.; Piliero, M. A.; Romano, F.; Rosso, V.; Sipala, V.; Stefanini, A.
2009-08-01
Proton therapy is a technique used to deliver a highly accurate and effective dose for the treatment of a variety of tumor diseases. The possibility to have an instrument able to give online information could reduce the time necessary to characterize the proton beam. To this aim we propose a detection system for online proton beam characterization based on the Medipix2 chip. Medipix2 is a detection system based on a single event counter read-out chip, bump-bonded to silicon pixel detector. The read-out chip is a matrix of 256×256 cells, 55×55 μm 2 each. To demonstrate the capabilities of Medipix2 as a proton detector, we have used a 62 MeV flux proton beam at the CATANA beam line of the LNS-INFN laboratory. The measurements performed confirmed the good imaging performances of the Medipix2 system also for the characterization of proton beams.
Nilyanimit, Pornjarim; Chansaenroj, Jira; Poomipak, Witthaya; Praianantathavorn, Kesmanee; Payungporn, Sunchai; Poovorawan, Yong
2018-03-01
Human papillomavirus (HPV) infection causes cervical cancer, thus necessitating early detection by screening. Rapid and accurate HPV genotyping is crucial both for the assessment of patients with HPV infection and for surveillance studies. Fifty-eight cervicovaginal samples were tested for HPV genotypes using four methods in parallel: nested-PCR followed by conventional sequencing, INNO-LiPA, electrochemical DNA chip, and next-generation sequencing (NGS). Seven HPV genotypes (16, 18, 31, 33, 45, 56, and 58) were identified by all four methods. Nineteen HPV genotypes were detected by NGS, but not by nested-PCR, INNO-LiPA, or electrochemical DNA chip. Although NGS is relatively expensive and complex, it may serve as a sensitive HPV genotyping method. Because of its highly sensitive detection of multiple HPV genotypes, NGS may serve as an alternative for diagnostic HPV genotyping in certain situations. © The Korean Society for Laboratory Medicine
Color sensor and neural processor on one chip
NASA Astrophysics Data System (ADS)
Fiesler, Emile; Campbell, Shannon R.; Kempem, Lother; Duong, Tuan A.
1998-10-01
Low-cost, compact, and robust color sensor that can operate in real-time under various environmental conditions can benefit many applications, including quality control, chemical sensing, food production, medical diagnostics, energy conservation, monitoring of hazardous waste, and recycling. Unfortunately, existing color sensor are either bulky and expensive or do not provide the required speed and accuracy. In this publication we describe the design of an accurate real-time color classification sensor, together with preprocessing and a subsequent neural network processor integrated on a single complementary metal oxide semiconductor (CMOS) integrated circuit. This one-chip sensor and information processor will be low in cost, robust, and mass-producible using standard commercial CMOS processes. The performance of the chip and the feasibility of its manufacturing is proven through computer simulations based on CMOS hardware parameters. Comparisons with competing methodologies show a significantly higher performance for our device.
Charbonneau, David M; Breault-Turcot, Julien; Sinnett, Daniel; Krajinovic, Maja; Leclerc, Jean-Marie; Masson, Jean-François; Pelletier, Joelle N
2017-12-22
Microbial asparaginase is an essential component of chemotherapy for the treatment of childhood acute lymphoblastic leukemia (cALL). Silent hypersensitivity reactions to this microbial enzyme need to be monitored accurately during treatment to avoid adverse effects of the drug and its silent inactivation. Here, we present a dual-response anti-asparaginase sensor that combines indirect SPR and fluorescence on a single chip to perform ELISA-type immunosensing, and correlate measurements with classical ELISA. Analysis of serum samples from children undergoing cALL therapy revealed a clear correlation between single-chip indirect SPR/fluorescence immunosensing and ELISA used in clinical settings (R 2 > 0.9). We also report that the portable SPR/fluorescence system had a better sensitivity than classical ELISA to detect antibodies in clinical samples with low antigenicity. This work demonstrates the reliability of dual sensing for monitoring clinically relevant antibody titers in clinical serum samples.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomas, Luc, E-mail: luc.thomas@headway.com; Jan, Guenole; Le, Son
The thermal stability of perpendicular Spin-Transfer-Torque Magnetic Random Access Memory (STT-MRAM) devices is investigated at chip level. Experimental data are analyzed in the framework of the Néel-Brown model including distributions of the thermal stability factor Δ. We show that in the low error rate regime important for applications, the effect of distributions of Δ can be described by a single quantity, the effective thermal stability factor Δ{sub eff}, which encompasses both the median and the standard deviation of the distributions. Data retention of memory chips can be assessed accurately by measuring Δ{sub eff} as a function of device diameter andmore » temperature. We apply this method to show that 54 nm devices based on our perpendicular STT-MRAM design meet our 10 year data retention target up to 120 °C.« less
Magnetic domain wall conduits for single cell applications.
Donolato, M; Torti, A; Kostesha, N; Deryabina, M; Sogne, E; Vavassori, P; Hansen, M F; Bertacco, R
2011-09-07
The ability to trap, manipulate and release single cells on a surface is important both for fundamental studies of cellular processes and for the development of novel lab-on-chip miniaturized tools for biological and medical applications. In this paper we demonstrate how magnetic domain walls generated in micro- and nano-structures fabricated on a chip surface can be used to handle single yeast cells labeled with magnetic beads. In detail, first we show that the proposed approach maintains the microorganism viable, as proven by monitoring the division of labeled yeast cells trapped by domain walls over 16 hours. Moreover, we demonstrate the controlled transport and release of individual yeast cells via displacement and annihilation of individual domain walls in micro- and nano-sized magnetic structures. These results pave the way to the implementation of magnetic devices based on domain walls technology in lab-on-chip systems devoted to accurate individual cell trapping and manipulation.
Al-Tobasei, Rafet; Ali, Ali; Leeds, Timothy D; Liu, Sixin; Palti, Yniv; Kenney, Brett; Salem, Mohamed
2017-08-07
Coding/functional SNPs change the biological function of a gene and, therefore, could serve as "large-effect" genetic markers. In this study, we used two bioinformatics pipelines, GATK and SAMtools, for discovering coding/functional SNPs with allelic-imbalances associated with total body weight, muscle yield, muscle fat content, shear force, and whiteness. Phenotypic data were collected for approximately 500 fish, representing 98 families (5 fish/family), from a growth-selected line, and the muscle transcriptome was sequenced from 22 families with divergent phenotypes (4 low- versus 4 high-ranked families per trait). GATK detected 59,112 putative SNPs; of these SNPs, 4798 showed allelic imbalances (>2.0 as an amplification and <0.5 as loss of heterozygosity). SAMtools detected 87,066 putative SNPs; and of them, 4962 had allelic imbalances between the low- and high-ranked families. Only 1829 SNPs with allelic imbalances were common between the two datasets, indicating significant differences in algorithms. The two datasets contained 7930 non-redundant SNPs of which 4439 mapped to 1498 protein-coding genes (with 6.4% non-synonymous SNPs) and 684 mapped to 295 lncRNAs. Validation of a subset of 92 SNPs revealed 1) 86.7-93.8% success rate in calling polymorphic SNPs and 2) 95.4% consistent matching between DNA and cDNA genotypes indicating a high rate of identifying SNPs with allelic imbalances. In addition, 4.64% SNPs revealed random monoallelic expression. Genome distribution of the SNPs with allelic imbalances exhibited high density for all five traits in several chromosomes, especially chromosome 9, 20 and 28. Most of the SNP-harboring genes were assigned to important growth-related metabolic pathways. These results demonstrate utility of RNA-Seq in assessing phenotype-associated allelic imbalances in pooled RNA-Seq samples. The SNPs identified in this study were included in a new SNP-Chip design (available from Affymetrix) for genomic and genetic analyses in rainbow trout.
Liu, Liqin; Luo, Qiaoling; Teng, Wan; Li, Bin; Li, Hongwei; Li, Yiwen; Li, Zhensheng; Zheng, Qi
2018-05-01
Based on SLAF-seq, 67 Thinopyrum ponticum-specific markers and eight Th. ponticum-specific FISH probes were developed, and these markers and probes could be used for detection of alien chromatin in a wheat background. Decaploid Thinopyrum ponticum (2n = 10x = 70) is a valuable gene reservoir for wheat improvement. Identification of Th. ponticum introgression would facilitate its transfer into diverse wheat genetic backgrounds and its practical utilization in wheat improvement. Based on specific-locus-amplified fragment sequencing (SLAF-seq) technology, 67 new Th. ponticum-specific molecular markers and eight Th. ponticum-specific fluorescence in situ hybridization (FISH) probes have been developed from a tiny wheat-Th. ponticum translocation line. These newly developed molecular markers allowed the detection of Th. ponticum DNA in a variety of materials specifically and steadily at high throughput. According to the hybridization signal pattern, the eight Th. ponticum-specific probes could be divided into two groups. The first group including five dispersed repetitive sequence probes could identify Th. ponticum chromatin more sensitively and accurately than genomic in situ hybridization (GISH). Whereas the second group having three tandem repetitive sequence probes enabled the discrimination of Th. ponticum chromosomes together with another clone pAs1 in wheat-Th. ponticum partial amphiploid Xiaoyan 68.
SEASTAR: systematic evaluation of alternative transcription start sites in RNA.
Qin, Zhiyi; Stoilov, Peter; Zhang, Xuegong; Xing, Yi
2018-05-04
Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.
The LAM-PCR Method to Sequence LV Integration Sites.
Wang, Wei; Bartholomae, Cynthia C; Gabriel, Richard; Deichmann, Annette; Schmidt, Manfred
2016-01-01
Integrating viral gene transfer vectors are commonly used gene delivery tools in clinical gene therapy trials providing stable integration and continuous gene expression of the transgene in the treated host cell. However, integration of the reverse-transcribed vector DNA into the host genome is a potentially mutagenic event that may directly contribute to unwanted side effects. A comprehensive and accurate analysis of the integration site (IS) repertoire is indispensable to study clonality in transduced cells obtained from patients undergoing gene therapy and to identify potential in vivo selection of affected cell clones. To date, next-generation sequencing (NGS) of vector-genome junctions allows sophisticated studies on the integration repertoire in vitro and in vivo. We have explored the use of the Illumina MiSeq Personal Sequencer platform to sequence vector ISs amplified by non-restrictive linear amplification-mediated PCR (nrLAM-PCR) and LAM-PCR. MiSeq-based high-quality IS sequence retrieval is accomplished by the introduction of a double-barcode strategy that substantially minimizes the frequency of IS sequence collisions compared to the conventionally used single-barcode protocol. Here, we present an updated protocol of (nr)LAM-PCR for the analysis of lentiviral IS using a double-barcode system and followed by deep sequencing using the MiSeq device.
contamDE: differential expression analysis of RNA-seq data for contaminated tumor samples.
Shen, Qi; Hu, Jiyuan; Jiang, Ning; Hu, Xiaohua; Luo, Zewei; Zhang, Hong
2016-03-01
Accurate detection of differentially expressed genes between tumor and normal samples is a primary approach of cancer-related biomarker identification. Due to the infiltration of tumor surrounding normal cells, the expression data derived from tumor samples would always be contaminated with normal cells. Ignoring such cellular contamination would deflate the power of detecting DE genes and further confound the biological interpretation of the analysis results. For the time being, there does not exists any differential expression analysis approach for RNA-seq data in literature that can properly account for the contamination of tumor samples. Without appealing to any extra information, we develop a new method 'contamDE' based on a novel statistical model that associates RNA-seq expression levels with cell types. It is demonstrated through simulation studies that contamDE could be much more powerful than the existing methods that ignore the contamination. In the application to two cancer studies, contamDE uniquely found several potential therapy and prognostic biomarkers of prostate cancer and non-small cell lung cancer. An R package contamDE is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/ zhanghfd@fudan.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ahadi, Alireza; Sablok, Gaurav; Hutvagner, Gyorgy
2017-04-07
MicroRNAs (miRNAs) are ∼19-22 nucleotides (nt) long regulatory RNAs that regulate gene expression by recognizing and binding to complementary sequences on mRNAs. The key step in revealing the function of a miRNA, is the identification of miRNA target genes. Recent biochemical advances including PAR-CLIP and HITS-CLIP allow for improved miRNA target predictions and are widely used to validate miRNA targets. Here, we present miRTar2GO, which is a model, trained on the common rules of miRNA-target interactions, Argonaute (Ago) CLIP-Seq data and experimentally validated miRNA target interactions. miRTar2GO is designed to predict miRNA target sites using more relaxed miRNA-target binding characteristics. More importantly, miRTar2GO allows for the prediction of cell-type specific miRNA targets. We have evaluated miRTar2GO against other widely used miRNA target prediction algorithms and demonstrated that miRTar2GO produced significantly higher F1 and G scores. Target predictions, binding specifications, results of the pathway analysis and gene ontology enrichment of miRNA targets are freely available at http://www.mirtar2go.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Physics of self-aligned assembly at room temperature
NASA Astrophysics Data System (ADS)
Dubey, V.; Beyne, E.; Derakhshandeh, J.; De Wolf, I.
2018-01-01
Self-aligned assembly, making use of capillary forces, is considered as an alternative to active alignment during thermo-compression bonding of Si chips in the 3D heterogeneous integration process. Various process parameters affect the alignment accuracy of the chip over the patterned binding site on a substrate/carrier wafer. This paper discusses the chip motion due to wetting and capillary force using a transient coupled physics model for the two regimes (that is, wetting regime and damped oscillatory regime) in the temporal domain. Using the transient model, the effect of the volume of the liquid and the placement accuracy of the chip on the alignment force is studied. The capillary time (that is, the time it takes for the chip to reach its mean position) for the chip is directly proportional to the placement offset and inversely proportional to the viscosity. The time constant of the harmonic oscillations is directly proportional to the gap between the chips due to the volume of the fluid. The predicted behavior from transient simulations is next experimentally validated and it is confirmed that the liquid volume and the initial placement affect the final alignment accuracy of the top chip on the bottom substrate. With statistical experimental data, we demonstrate an alignment accuracy reaching <1 μm.
NASA Astrophysics Data System (ADS)
Reece, Amy E.
The microfabrication of microfluidic control systems and advances in molecular amplification tools has enabled the miniaturization of single cell analytical platforms for the efficient, highly selective enumeration and molecular characterization of rare and diseased cells from clinical samples. In many cases, the high-throughput nature of microfluidic inertial focusing has enabled the popularization of this new class of Lab-on-a-Chip devices that exhibit numerous advantages over conventional methods as prognostic and diagnostic tools. Inertial focusing is the passive, sheathless alignment of particles and cells to precise spatiotemporal equilibrium positions that arise from a force balance between opposing inertial lift forces and hydrodynamic repulsions. The applicability of inertial focusing to a spectrum of filtration, separation and encapsulation challenges places heavy emphasis upon the accurate description of the hydrodynamic forces responsible for predictable inertial focusing behavior. These inertial focusing fundamentals, limitations and their applications are studied extensively throughout this work.
Dense and Sparse Matrix Operations on the Cell Processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel W.; Shalf, John; Oliker, Leonid
2005-05-01
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, usingmore » a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.« less
Comparative Analysis of Single-Cell RNA Sequencing Methods.
Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang
2017-02-16
Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.
Next Generation Programmable Bio-Nano-Chip System for On-Site Detection in Oral Fluids.
Christodoulides, Nicolaos; De La Garza, Richard; Simmons, Glennon W; McRae, Michael P; Wong, Jorge; Newton, Thomas F; Kosten, Thomas R; Haque, Ahmed; McDevitt, John T
2015-11-23
Current on-site drug of abuse detection methods involve invasive sampling of blood and urine specimens, or collection of oral fluid, followed by qualitative screening tests using immunochromatographic cartridges. Test confirmation and quantitative assessment of a presumptive positive are then provided by remote laboratories, an inefficient and costly process decoupled from the initial sampling. Recently, a new noninvasive oral fluid sampling approach that is integrated with the chip-based Programmable Bio-Nano-Chip (p-BNC) platform has been developed for the rapid (~ 10 minutes), sensitive detection (~ ng/ml) and quantitation of 12 drugs of abuse. Furthermore, the system can provide the time-course of select drug and metabolite profiles in oral fluids. For cocaine, we observed three slope components were correlated with cocaine-induced impairment using this chip-based p-BNC detection modality. Thus, this p-BNC has significant potential for roadside drug testing by law enforcement officers. Initial work reported on chip-based drug detection was completed using 'macro' or "chip in the lab" prototypes, that included metal encased "flow cells", external peristaltic pumps and a bench-top analyzer system instrumentation. We now describe the next generation miniaturized analyzer instrumentation along with customized disposables and sampling devices. These tools will offer real-time oral fluid drug monitoring capabilities, to be used for roadside drug testing as well as testing in clinical settings as a non-invasive, quantitative, accurate and sensitive tool to verify patient adherence to treatment.
High-accurate optical fiber liquid level sensor
NASA Astrophysics Data System (ADS)
Sun, Dexing; Chen, Shouliu; Pan, Chao; Jin, Henghuan
1991-08-01
A highly accurate optical fiber liquid level sensor is presented. The single-chip microcomputer is used to process and control the signal. This kind of sensor is characterized by self-security and is explosion-proof, so it can be applied in any liquid level detecting areas, especially in the oil and chemical industries. The theories and experiments about how to improve the measurement accuracy are described. The relative error for detecting the measurement range 10 m is up to 0.01%.
Numerical modeling of immiscible two-phase flow in micro-models using a commercial CFD code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crandall, Dustin; Ahmadia, Goodarz; Smith, Duane H.
2009-01-01
Off-the-shelf CFD software is being used to analyze everything from flow over airplanes to lab-on-a-chip designs. So, how accurately can two-phase immiscible flow be modeled flowing through some small-scale models of porous media? We evaluate the capability of the CFD code FLUENT{trademark} to model immiscible flow in micro-scale, bench-top stereolithography models. By comparing the flow results to experimental models we show that accurate 3D modeling is possible.
SeqDepot: streamlined database of biological sequences and precomputed features.
Ulrich, Luke E; Zhulin, Igor B
2014-01-15
Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot-a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the most simple and straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. RESTaccess via http://seqdepot.net/api/v1. Database files and scripts maybe downloaded from http://seqdepot.net/download.
Mandryk, J; Alwis, K U; Hocking, A D
1999-05-01
Four sawmills, a wood chipping mill, and five joineries in New South Wales, Australia, were studied for the effects of personal exposure to wood dust, endotoxins. (1-->3)-beta-D-glucans, Gram-negative bacteria, and fungi on lung function among woodworkers. Personal inhalable and respirable dust sampling was carried out. The lung function tests of workers were conducted before and after a workshift. The mean percentage cross-shift decrease in lung function was markedly high for woodworkers compared with the controls. Dose-response relationships among personal exposures and percentage cross-shift decrease in lung function and percentage predicted lung function were more pronounced among joinery workers compared with sawmill and chip mill workers. Woodworkers had markedly high prevalence of regular cough, phlegm, and chronic bronchitis compared with controls. Significant associations were found between percentage cross-shift decrease in FVC and regular phlegm and blocked nose among sawmill and chip mill workers. Both joinery workers and sawmill and chip mill workers showed significant relationships between percentage predicted lung function (FVC, FEV1, FEV1/FVC, FEF25-75%) and respiratory symptoms. Wood dust and biohazards associated with wood dust are potential health hazards and should be controlled.
A proposed holistic approach to on-chip, off-chip, test, and package interconnections
NASA Astrophysics Data System (ADS)
Bartelink, Dirk J.
1998-11-01
The term interconnection has traditionally implied a `robust' connection from a transistor or a group of transistors in an IC to the outside world, usually a PC board. Optimum system utilization is done from outside the IC. As an alternative, this paper addresses `unimpeded' transistor-to-transistor interconnection aimed at reaching the high circuit densities and computational capabilities of neighboring IC's. In this view, interconnections are not made to some human-centric place outside the IC world requiring robustness—except for system input and output connections. This unimpeded interconnect style is currently available only through intra-chip signal traces in `system-on-a-chip' implementations, as exemplified by embedded DRAMs. Because the traditional off-chip penalty in performance and wiring density is so large, a merging of complex process technologies is the only option today. It is suggested that, for system integration to move forward, the traditional robustness requirement inherited from conventional packaging interconnect and IC manufacturing test must be discarded. Traditional system assembly from vendor parts requires robustness under shipping, inspection and assembly. The trend toward systems on a chip signifies willingness by semiconductor companies to design and fabricate whole systems in house, so that `in-house' chip-to-chip assembly is not beyond reach. In this scenario, bare chips never leave the controlled environment of the IC fabricator while the two major contributors to off-chip signal penalty, ESD protection and the need to source a 50-ohm test head, are avoided. With in-house assembly, ESD protection can be eliminated with the precautions already familiar in plasma etching. Test interconnection impacts the fundamentals of IC manufacturing, particularly with clock speeds approaching 1GHz, and cannot be an afterthought. It should be an integral part of the chip-to-chip interconnection bandwidth optimization, because—as we must recognize—test is also performed using IC's. A system interconnection is proposed using multiple chips fabricated with conventional silicon processes, including MEMS technology. The system resembles an MCM that can be joined without committing to final assembly to perform at-speed testing. 50-Ohm test probes never load the circuit; only intended neighboring chips are ever connected. A `back-plane' chip provides the connection layers for both inter- and intra-chip signals and also serves as the probe card, in analogy with membrane probes now used for single-chip testing. Intra-chip connections, which require complicated connections during test that exactly match the product, are then properly made and all waveforms and loading conditions under test will be identical to those of the product. The major benefit is that all front-end chip technologies can be merged—logic, memory, RF, even passives. ESD protection is required only on external system connections. Manufacturing test information will accurately characterize process faults and thus avoid the Known-Good-Die problem that has slowed the arrival of conventional MCM's.
De novo RNA-seq and functional annotation of Ornithonyssus bacoti.
Niu, DongLing; Wang, RuiLing; Zhao, YaE; Yang, Rui; Hu, Li
2018-06-01
Ornithonyssus bacoti (Hirst) (Acari: Macronyssidae) is a vector and reservoir of pathogens causing serious infectious diseases, such as epidemic hemorrhagic fever, endemic typhus, tularemia, and leptospirosis. Its genome and transcriptome data are lacking in public databases. In this study, total RNA was extracted from live O. bacoti to conduct RNA-seq, functional annotation, coding domain sequence (CDS) prediction and simple sequence repeats (SSRs) detection. The results showed that 65.8 million clean reads were generated and assembled into 72,185 unigenes, of which 49.4% were annotated by seven functional databases. 23,121 unigenes were annotated and assigned to 457 species by non-redundant protein sequence database. The BLAST top-two hit species were Metaseiulus occidentalis and Ixodes scapularis. The procedure detected 12,426 SSRs, of which tri- and di-nucleotides were the most abundant types and the representative motifs were AAT/ATT and AC/GT. 26,936 CDS were predicted with a mean length of 711 bp. 87 unigenes of 30 functional genes, which are usually involved in stress responses, drug resistance, movement, metabolism and allergy, were further identified by bioinformatics methods. The unigenes putatively encoding cytochrome P450 proteins were further analyzed phylogenetically. In conclusion, this study completed the RNA-seq and functional annotation of O. bacoti successfully, which provides reliable molecular data for its future studies of gene function and molecular markers.
Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data
Kumar, Shailesh; Vo, Angie Duy; Qin, Fujun; Li, Hui
2016-01-01
RNA-Seq made possible the global identification of fusion transcripts, i.e. “chimeric RNAs”. Even though various software packages have been developed to serve this purpose, they behave differently in different datasets provided by different developers. It is important for both users, and developers to have an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools in four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive prediction value, time consumption and memory usage. We also observed small overlaps of the fusions detected by different tools in the real dataset (test dataset). This could be due to false discoveries by various tools, but could also be due to the reason that none of the tools are inclusive. We have found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose the proper tools for their purpose based on the properties of their RNA-Seq data. PMID:26862001
42 CFR 431.972 - Claims sampling procedures.
Code of Federal Regulations, 2013 CFR
2013-10-01
... Estimating Improper Payments in Medicaid and CHIP § 431.972 Claims sampling procedures. (a) Claims universe. (1) The PERM claims universe includes payments that were originally paid (paid claims) and for which... must establish controls to ensure FFS and managed care universes are accurate and complete, including...
42 CFR 431.972 - Claims sampling procedures.
Code of Federal Regulations, 2012 CFR
2012-10-01
... Estimating Improper Payments in Medicaid and CHIP § 431.972 Claims sampling procedures. (a) Claims universe. (1) The PERM claims universe includes payments that were originally paid (paid claims) and for which... must establish controls to ensure FFS and managed care universes are accurate and complete, including...
42 CFR 431.972 - Claims sampling procedures.
Code of Federal Regulations, 2014 CFR
2014-10-01
... Estimating Improper Payments in Medicaid and CHIP § 431.972 Claims sampling procedures. (a) Claims universe. (1) The PERM claims universe includes payments that were originally paid (paid claims) and for which... must establish controls to ensure FFS and managed care universes are accurate and complete, including...
PRISM offers a comprehensive genomic approach to transcription factor function prediction
Wenger, Aaron M.; Clarke, Shoa L.; Guturu, Harendra; Chen, Jenny; Schaar, Bruce T.; McLean, Cory Y.; Bejerano, Gill
2013-01-01
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells. PMID:23382538
Yuksel, Ferhat; Karaman, Safa; Kayacier, Ahmed
2014-02-15
In the present study, wheat chips enriched with flaxseed flour were produced and response surface methodology was used for the studying the simultaneous effects of flaxseed level (10-20%), frying temperature (160-180 °C) and frying time (40-60 s) on some physicochemical, textural and sensorial properties and fatty acid composition of wheat chips. Ridge analysis was conducted to determine the optimum levels of processing variables. Predictive regression equations with adequate coefficients of determination (R² ≥ 0.705) to explain the effect of processing variables were constructed. Addition of flaxseed flour increased the dry matter and protein content of samples and increase of frying temperature decreased the hardness values of wheat chips samples. Increment in flaxseed level provided an increase in unsaturated fatty acid content namely omega-3 fatty acids of wheat chips samples. Overall acceptability of chips increased with the increase of frying temperature. Ridge analysis showed that maximum taste score would be at flaxseed level = 10%, frying temperature = 180 °C and frying time = 50 s. Copyright © 2013 Elsevier Ltd. All rights reserved.
Olsen, Colin; Arantes, Valdeir; Saddler, Jack
2015-01-01
The influence of chip size and moisture content on the combined sugar recovery after steam pretreatment of lodgepole pine and subsequent enzymatic hydrolysis of the cellulosic component were investigated using response surface methodology. Chip size had little influence on sugar recovery after both steam pretreatment and enzymatic hydrolysis. In contrast, the moisture of the chips greatly influenced the relative severity of steam pretreatment and, as a result, the combined sugar recovery from the hemicellulosic and cellulosic fractions. Irrespective of chip size and the pretreatment temperature, time, and SO2 loading that were used, the relative severity of pretreatment was highest at a moisture of 30-40w/w%. However, the predictive model indicated that an elevated moisture content of roughly 50w/w% (about the moisture content of a standard softwood mill chip) would result in the highest, combined sugar recovery (80%) over the widest range of steam pretreatment conditions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Single event effects on the APV25 front-end chip
NASA Astrophysics Data System (ADS)
Friedl, M.; Bauer, T.; Pernicka, M.
2003-03-01
The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider at CERN will include a Silicon Strip Tracker covering a sensitive area of 206 m2. About ten million channels will be read out by APV25 front-end chips, fabricated in the 0.25 μm deep submicron process. Although permanent damage is not expected within CMS radiation levels, transient Single Event Upsets are inevitable. Moreover, localized ionization can also produce fake signals in the analog circuitry. Eight APV25 chips were exposed to a high-intensity pion beam at the Paul Scherrer Institute (Villigen/CH) to study the radiation induced effects in detail. The results, which are compatible to similar measurements performed with heavy ions, are used to predict the chip error rate at CMS.
Chen, Zhangxing; Huang, Tianyu; Shao, Yimin; ...
2018-03-15
Predicting the mechanical behavior of the chopped carbon fiber Sheet Molding Compound (SMC) due to spatial variations in local material properties is critical for the structural performance analysis but is computationally challenging. Such spatial variations are induced by the material flow in the compression molding process. In this work, a new multiscale SMC modeling framework and the associated computational techniques are developed to provide accurate and efficient predictions of SMC mechanical performance. The proposed multiscale modeling framework contains three modules. First, a stochastic algorithm for 3D chip-packing reconstruction is developed to efficiently generate the SMC mesoscale Representative Volume Element (RVE)more » model for Finite Element Analysis (FEA). A new fiber orientation tensor recovery function is embedded in the reconstruction algorithm to match reconstructions with the target characteristics of fiber orientation distribution. Second, a metamodeling module is established to improve the computational efficiency by creating the surrogates of mesoscale analyses. Third, the macroscale behaviors are predicted by an efficient multiscale model, in which the spatially varying material properties are obtained based on the local fiber orientation tensors. Our approach is further validated through experiments at both meso- and macro-scales, such as tensile tests assisted by Digital Image Correlation (DIC) and mesostructure imaging.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Zhangxing; Huang, Tianyu; Shao, Yimin
Predicting the mechanical behavior of the chopped carbon fiber Sheet Molding Compound (SMC) due to spatial variations in local material properties is critical for the structural performance analysis but is computationally challenging. Such spatial variations are induced by the material flow in the compression molding process. In this work, a new multiscale SMC modeling framework and the associated computational techniques are developed to provide accurate and efficient predictions of SMC mechanical performance. The proposed multiscale modeling framework contains three modules. First, a stochastic algorithm for 3D chip-packing reconstruction is developed to efficiently generate the SMC mesoscale Representative Volume Element (RVE)more » model for Finite Element Analysis (FEA). A new fiber orientation tensor recovery function is embedded in the reconstruction algorithm to match reconstructions with the target characteristics of fiber orientation distribution. Second, a metamodeling module is established to improve the computational efficiency by creating the surrogates of mesoscale analyses. Third, the macroscale behaviors are predicted by an efficient multiscale model, in which the spatially varying material properties are obtained based on the local fiber orientation tensors. Our approach is further validated through experiments at both meso- and macro-scales, such as tensile tests assisted by Digital Image Correlation (DIC) and mesostructure imaging.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Tse -Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.
RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, whichmore » were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.« less
Lu, Tse -Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.; ...
2016-05-31
RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, whichmore » were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.« less
Manga, Punita; Klingeman, Dawn M; Lu, Tse-Yuan S; Mehlhorn, Tonia L; Pelletier, Dale A; Hauser, Loren J; Wilson, Charlotte M; Brown, Steven D
2016-01-01
RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). This study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.
Fast imputation using medium- or low-coverage sequence data
USDA-ARS?s Scientific Manuscript database
Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...
Physics-based process modeling, reliability prediction, and design guidelines for flip-chip devices
NASA Astrophysics Data System (ADS)
Michaelides, Stylianos
Flip Chip on Board (FCOB) and Chip-Scale Packages (CSPs) are relatively new technologies that are being increasingly used in the electronic packaging industry. Compared to the more widely used face-up wirebonding and TAB technologies, flip-chips and most CSPs provide the shortest possible leads, lower inductance, higher frequency, better noise control, higher density, greater input/output (I/O), smaller device footprint and lower profile. However, due to the short history and due to the introduction of several new electronic materials, designs, and processing conditions, very limited work has been done to understand the role of material, geometry, and processing parameters on the reliability of flip-chip devices. Also, with the ever-increasing complexity of semiconductor packages and with the continued reduction in time to market, it is too costly to wait until the later stages of design and testing to discover that the reliability is not satisfactory. The objective of the research is to develop integrated process-reliability models that will take into consideration the mechanics of assembly processes to be able to determine the reliability of face-down devices under thermal cycling and long-term temperature dwelling. The models incorporate the time and temperature-dependent constitutive behavior of various materials in the assembly to be able to predict failure modes such as die cracking and solder cracking. In addition, the models account for process-induced defects and macro-micro features of the assembly. Creep-fatigue and continuum-damage mechanics models for the solder interconnects and fracture-mechanics models for the die have been used to determine the reliability of the devices. The results predicted by the models have been successfully validated against experimental data. The validated models have been used to develop qualification and test procedures for implantable medical devices. In addition, the research has helped develop innovative face-down devices without the underfill, based on the thorough understanding of the failure modes. Also, practical design guidelines for material, geometry and process parameters for reliable flip-chip devices have been developed.
Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia
2017-08-31
As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques, such as, MeRIP-Seq, transcriptome-wide RNA methylation profile is now available in the form of count-based data, with which it is often of interests to study the dynamics at epitranscriptomic layer. However, the sample size of RNA methylation experiment is usually very small due to its costs; and additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as DRME model based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in the way, the input control samples are also properly taken care of. In addition, different from DRME approach, which relies only the input control sample only for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. And the QNB model is also applicable to other datasets related RNA modifications, including but not limited to RNA bisulfite sequencing, m 1 A-Seq, Par-CLIP, RIP-Seq, etc.
Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.
Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D
2015-07-30
RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.
TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data.
Clark, Lindsay V; Sacks, Erik J
2016-01-01
In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously-mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult. We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output is in CSV format so that it can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files. TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.
Highly sensitive and unbiased approach for elucidating antibody repertoires
Lin, Sherry G.; Ba, Zhaoqing; Du, Zhou; Zhang, Yu; Hu, Jiazhi; Alt, Frederick W.
2016-01-01
Developing B lymphocytes undergo V(D)J recombination to assemble germ-line V, D, and J gene segments into exons that encode the antigen-binding variable region of Ig heavy (H) and light (L) chains. IgH and IgL chains associate to form the B-cell receptor (BCR), which, upon antigen binding, activates B cells to secrete BCR as an antibody. Each of the huge number of clonally independent B cells expresses a unique set of IgH and IgL variable regions. The ability of V(D)J recombination to generate vast primary B-cell repertoires results from a combinatorial assortment of large numbers of different V, D, and J segments, coupled with diversification of the junctions between them to generate the complementary determining region 3 (CDR3) for antigen contact. Approaches to evaluate in depth the content of primary antibody repertoires and, ultimately, to study how they are further molded by secondary mutation and affinity maturation processes are of great importance to the B-cell development, vaccine, and antibody fields. We now describe an unbiased, sensitive, and readily accessible assay, referred to as high-throughput genome-wide translocation sequencing-adapted repertoire sequencing (HTGTS-Rep-seq), to quantify antibody repertoires. HTGTS-Rep-seq quantitatively identifies the vast majority of IgH and IgL V(D)J exons, including their unique CDR3 sequences, from progenitor and mature mouse B lineage cells via the use of specific J primers. HTGTS-Rep-seq also accurately quantifies DJH intermediates and V(D)J exons in either productive or nonproductive configurations. HTGTS-Rep-seq should be useful for studies of human samples, including clonal B-cell expansions, and also for following antibody affinity maturation processes. PMID:27354528
A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.
Bansal, Vikas
2017-03-14
PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
do Nascimento, Naíla C; Guimaraes, Ana M S; Dos Santos, Andrea P; Chu, Yuefeng; Marques, Lucas M; Messick, Joanne B
2018-06-18
Pigs are popular animal models in biomedical research. RNA-Seq is becoming the predominant tool to investigate transcriptional changes of the pig's response to infection. The high sensitivity of this tool requires a strict control of the study design beginning with the selection of healthy animals to provide accurate interpretation of research data. Pigs chronically infected with Mycoplasma suis often show no obvious clinical signs, however the infection may affect the validity of animal research. The goal of this study was to investigate whether or not this silent infection is also silent at the host transcriptional level. Therefore, immunocompetent pigs were experimentally infected with M. suis and transcriptional profiles of whole blood, generated by RNA-Seq, were analyzed and compared to non-infected animals. RNA-Seq showed 55 differentially expressed (DE) genes in the M. suis infected pigs. Down-regulation of genes related to innate immunity (tlr8, chemokines, chemokines receptors) and genes containing IFN gamma-activated sequence (gbp1, gbp2, il15, cxcl10, casp1, cd274) suggests a general suppression of the immune response in the infected animals. Sixteen (29.09%) of the DE genes were involved in two protein interaction networks: one involving chemokines, chemokine receptors and interleukin-15 and another involving the complement cascade. Genes related to vascular permeability, blood coagulation, and endothelium integrity were also DE in infected pigs. These findings suggest that M. suis subclinical infection causes significant alterations in blood mRNA levels, which could impact data interpretation of research using pigs. Screening of pigs for M. suis infection before initiating animal studies is strongly recommended.
Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease.
Marigorta, Urko M; Denson, Lee A; Hyams, Jeffrey S; Mondal, Kajari; Prince, Jarod; Walters, Thomas D; Griffiths, Anne; Noe, Joshua D; Crandall, Wallace V; Rosh, Joel R; Mack, David R; Kellermayer, Richard; Heyman, Melvin B; Baker, Susan S; Stephens, Michael C; Baldassano, Robert N; Markowitz, James F; Kim, Mi-Ok; Dubinsky, Marla C; Cho, Judy; Aronow, Bruce J; Kugathasan, Subra; Gibson, Greg
2017-10-01
Gene expression profiling can be used to uncover the mechanisms by which loci identified through genome-wide association studies (GWAS) contribute to pathology. Given that most GWAS hits are in putative regulatory regions and transcript abundance is physiologically closer to the phenotype of interest, we hypothesized that summation of risk-allele-associated gene expression, namely a transcriptional risk score (TRS), should provide accurate estimates of disease risk. We integrate summary-level GWAS and expression quantitative trait locus (eQTL) data with RNA-seq data from the RISK study, an inception cohort of pediatric Crohn's disease. We show that TRSs based on genes regulated by variants linked to inflammatory bowel disease (IBD) not only outperform genetic risk scores (GRSs) in distinguishing Crohn's disease from healthy samples, but also serve to identify patients who in time will progress to complicated disease. Our dissection of eQTL effects may be used to distinguish genes whose association with disease is through promotion versus protection, thereby linking statistical association to biological mechanism. The TRS approach constitutes a potential strategy for personalized medicine that enhances inference from static genotypic risk assessment.
NASA Astrophysics Data System (ADS)
Lamb, A. K.; Calvin, W. M.
2010-12-01
We surveyed drill chips with a lab spectrometer in the visible-near infrared (VNIR) and short-wave infrared (SWIR) regions, 0.35-2.5 μm, to evaluate hydrothermal alteration mineralogy of samples from two known geothermal fields in western Nevada. Rock is fractured into small pieces or “chips” during drilling and stored in trays by depth interval. The drill chips are used to determine subsurface properties such as lithology, structure, and alteration. Accurately determining alteration mineralogy in the geothermal reservoir is important for indicating thermal fluids (usually associated with fluid pathways such as faults) and the highest temperature of alteration. Hydrothermal minerals, including carbonates, iron oxides, hydroxides, sheet silicates, and sulfates, are especially diagnostic in the VNIR-SWIR region.. The strength of reflectance spectroscopy is that it is rapid and accurate for differentiating temperature-sensitive minerals that are not visually unique. We examined drill chips from two western Nevada geothermal fields: Hawthorne (two wells) and Steamboat Springs (three wells) using an ASD lab spectrometer with very high resolution. The Steamboat Hills geothermal field has produced electricity since 1988 and is well studied, and is believed to be a combination of extensional tectonics and magmatic origin. Bedrocks are Cretaceous granodiorite intruding into older metasediments. Hot springs and other surface expressions occur over an area of about 2.6 km2. In contrast, the Hawthorne geothermal reservoir is a ‘blind’ system with no surface expressions such as hot springs or geysers. The geothermal field is situated in a range front fault zone in an extensional area, and is contained in Mesozoic mixed granite and meta-volcanics. We collected spectra at each interval in the chip trays. Interval length varied between 10’ and 30’. - Endmember analysis and mineral identification were performed -using standard analysis approaches used to map mineralogy in remote sensing data sets. Mapped by depth, we identified narrow zones of intense alteration that mark fluid circulation, and overall changes in metamorphic grade facies through clay type. Steamboat Hills is more highly altered than Hawthorne, thus the alteration assemblages reflect the pH and temperature differences.
Fang, Xin; Sastry, Anand; Mih, Nathan; Kim, Donghyuk; Tan, Justin; Lloyd, Colton J.; Gao, Ye; Yang, Laurence; Palsson, Bernhard O.
2017-01-01
Transcriptional regulatory networks (TRNs) have been studied intensely for >25 y. Yet, even for the Escherichia coli TRN—probably the best characterized TRN—several questions remain. Here, we address three questions: (i) How complete is our knowledge of the E. coli TRN; (ii) how well can we predict gene expression using this TRN; and (iii) how robust is our understanding of the TRN? First, we reconstructed a high-confidence TRN (hiTRN) consisting of 147 transcription factors (TFs) regulating 1,538 transcription units (TUs) encoding 1,764 genes. The 3,797 high-confidence regulatory interactions were collected from published, validated chromatin immunoprecipitation (ChIP) data and RegulonDB. For 21 different TF knockouts, up to 63% of the differentially expressed genes in the hiTRN were traced to the knocked-out TF through regulatory cascades. Second, we trained supervised machine learning algorithms to predict the expression of 1,364 TUs given TF activities using 441 samples. The algorithms accurately predicted condition-specific expression for 86% (1,174 of 1,364) of the TUs, while 193 TUs (14%) were predicted better than random TRNs. Third, we identified 10 regulatory modules whose definitions were robust against changes to the TRN or expression compendium. Using surrogate variable analysis, we also identified three unmodeled factors that systematically influenced gene expression. Our computational workflow comprehensively characterizes the predictive capabilities and systems-level functions of an organism’s TRN from disparate data types. PMID:28874552
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
Dale, Ryan K; Matzat, Leah H; Lei, Elissa P
2014-08-01
Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L
2013-07-01
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.
2013-01-01
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967
Moreno-Ramos, Oscar A; Olivares, Ana María; Haider, Neena B; de Autismo, Liga Colombiana; Lattig, María Claudia
2015-01-01
Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian-South American trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoid Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1 recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding.
Xiang, Yuqian; Zhang, Junyu; Li, Qiaoli; Zhou, Xinyao; Wang, Teng; Xu, Mingqing; Xia, Shihui; Xing, Qinghe; Wang, Lei; He, Lin; Zhao, Xinzhi
2014-09-01
Utilizing epigenetic (DNA methylation) differences to differentiate between maternal peripheral blood (PBL) and fetal (placental) DNA has been a promising strategy for non-invasive prenatal testing (NIPT). However, the differentially methylated regions (DMRs) have yet to be fully ascertained. In the present study, we performed genome-wide comparative methylome analysis between maternal PBL and placental DNA from pregnancies of first trimester by methylated DNA immunoprecipitation-sequencing (MeDIP-Seq) and Infinium HumanMethylation450 BeadChip assays. A total of 36 931 DMRs and 45 804 differentially methylated sites (DMSs) covering the whole genome, exclusive of the Y chromosome, were identified via MeDIP-Seq and Infinium 450k array, respectively, of which 3759 sites in 2188 regions were confirmed by both methods. Not only did we find the previously reported potential fetal DNA markers in our identified DMRs/DMSs but also we verified fully the identified DMRs/DMSs in the validation round by MassARRAY EpiTYPER. The screened potential fetal DNA markers may be used for NIPT on aneuploidies and other chromosomal diseases, such as cri du chat syndrome and velo-cardio-facial syndrome. In addition, these potential markers may have application in the early diagnosis of placental dysfunction, such as pre-eclampsia. © The Author 2014. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.