Sample records for sequence profile analysis

  1. Identification of (R)-selective ω-aminotransferases by exploring evolutionary sequence space.

    PubMed

    Kim, Eun-Mi; Park, Joon Ho; Kim, Byung-Gee; Seo, Joo-Hyun

    2018-03-01

    Several (R)-selective ω-aminotransferases (R-ωATs) have been reported, and additional R-ωATs with sequence characteristics different from the known ones are strongly expected to exist. In addition, it is generally accepted that R-ωATs are variants of aminotransferase group III. Against this background, sequences in the RefSeq database were scored using family profiles of branched-chain amino acid aminotransferase (BCAT) and d-alanine aminotransferase (DAT) to predict and identify putative R-ωATs. Each sequence was plotted in a two-dimensional score space defined by its two profile analysis scores. Candidates with relatively similar scores in both BCAT and DAT profiles (i.e., profile analysis score using BCAT profile was similar to profile analysis score using DAT profile) were selected. Experimental results for selected candidates showed that putative R-ωATs from Saccharopolyspora erythraea (R-ωAT_Sery), Bacillus cellulosilyticus (R-ωAT_Bcel), and Bacillus thuringiensis (R-ωAT_Bthu) had R-ωAT activity. Additional experiments revealed that R-ωAT_Sery also possessed DAT activity while R-ωAT_Bcel and R-ωAT_Bthu had BCAT activity. Selecting putative R-ωATs from regions with similar profile analysis scores identified potential R-ωATs. Therefore, R-ωATs could be efficiently identified by using simple family profile analysis and exploring evolutionary sequence space. Copyright © 2017 Elsevier Inc. All rights reserved.
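
    The selection step described in this abstract can be illustrated with a minimal sketch. It assumes each sequence already carries a BCAT and a DAT profile score; the function name, the relative-difference threshold, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def select_candidates(
    scores: Dict[str, Tuple[float, float]],
    max_rel_diff: float = 0.1,
) -> List[str]:
    """Return IDs whose BCAT and DAT profile scores are similar.

    `scores` maps a sequence ID to (bcat_score, dat_score).
    `max_rel_diff` is an assumed cutoff on |bcat - dat| / max(bcat, dat).
    """
    selected = []
    for seq_id, (bcat, dat) in scores.items():
        if bcat <= 0 or dat <= 0:
            continue  # skip sequences that do not match one of the profiles
        rel_diff = abs(bcat - dat) / max(bcat, dat)
        if rel_diff <= max_rel_diff:
            selected.append(seq_id)
    return sorted(selected)


# Hypothetical scores for illustration only.
example_scores = {
    "seq_A": (120.0, 115.0),  # similar scores: candidate
    "seq_B": (200.0, 40.0),   # BCAT-like
    "seq_C": (35.0, 180.0),   # DAT-like
    "seq_D": (90.0, 98.0),    # similar scores: candidate
}
candidates = select_candidates(example_scores)
print(candidates)
```

    Sequences near the diagonal of the two-dimensional score space pass the filter, while sequences that clearly favor one profile are excluded.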

  2. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    USDA-ARS's Scientific Manuscript database

    Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylat...

  3. ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108

  4. ampliMethProfiler: a pipeline for the analysis of CpG methylation profiles of targeted deep bisulfite sequenced amplicons.

    PubMed

    Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio

    2016-11-25

    CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification-based approaches to studying DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype-based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single-molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single-molecule level, and they lack suitable statistical tools for interpreting such profiles. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. The ampliMethProfiler tool provides an easy and user-friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in Python and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
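
    The difference between average methylation and epihaplotype composition can be shown with a small sketch. This is not ampliMethProfiler code; the function names and the encoding of each read as a string of '1' (methylated) and '0' (unmethylated) CpG states are assumptions made for illustration.

```python
from collections import Counter
from typing import List


def epihaplotype_counts(reads: List[str]) -> Counter:
    """Count how often each distinct per-molecule CpG pattern occurs."""
    return Counter(reads)


def per_site_methylation(reads: List[str]) -> List[float]:
    """Classic quantification: fraction of methylated reads at each CpG."""
    n_sites = len(reads[0])
    return [sum(r[i] == "1" for r in reads) / len(reads) for i in range(n_sites)]


# Two hypothetical cell populations, four reads each, four CpG sites.
population_1 = ["1100", "1100", "0011", "0011"]
population_2 = ["1010", "0101", "1001", "0110"]

avg_1 = per_site_methylation(population_1)
avg_2 = per_site_methylation(population_2)
haps_1 = epihaplotype_counts(population_1)
haps_2 = epihaplotype_counts(population_2)

print(avg_1, avg_2)    # identical per-site averages
print(haps_1, haps_2)  # different epihaplotype compositions
```

    Both populations show 50% methylation at every site, yet the first contains two epihaplotypes and the second contains four, which is the kind of information a per-site average cannot represent.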

  5. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

    PubMed

    Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

    2013-08-15

    Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

  6. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
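
    The matrix underlying a ProfileGrid, the residue frequency at every aligned position, can be computed with a few lines. The sketch below is not JProfileGrid code; the function name and the toy alignment are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def residue_frequencies(alignment: List[str]) -> List[Dict[str, float]]:
    """Return, for each alignment column, the frequency of each symbol."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    grid = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        grid.append({res: n / len(alignment) for res, n in counts.items()})
    return grid


# Toy alignment of four sequences; '-' marks a gap.
alignment = ["MKV-", "MRVA", "MKIA", "MKVA"]
grid = residue_frequencies(alignment)
for position, freqs in enumerate(grid, start=1):
    print(position, freqs)
```

    A visualization layer would then map each frequency to a color in a residue-by-position grid; that step is omitted here.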

  7. A Comprehensive Approach to Sequence-oriented IsomiR annotation (CASMIR): demonstration with IsomiR profiling in colorectal neoplasia.

    PubMed

    Wu, Chung Wah; Evans, Jared M; Huang, Shengbing; Mahoney, Douglas W; Dukek, Brian A; Taylor, William R; Yab, Tracy C; Smyrk, Thomas C; Jen, Jin; Kisiel, John B; Ahlquist, David A

    2018-05-25

    MicroRNA (miRNA) profiling is an important step in studying biological associations and identifying marker candidates. miRNA exists in isoforms, called isomiRs, which may exhibit distinct properties. With conventional profiling methods, limitations in assay and analysis platforms may compromise isomiR interrogation. We introduce a comprehensive approach to sequence-oriented isomiR annotation (CASMIR) to allow unbiased identification of global isomiRs from small RNA sequencing data. In this approach, small RNA reads are maintained as independent sequences instead of being summarized under miRNA names. IsomiR features are identified through step-wise local alignment against canonical forms and precursor sequences. Through customizing the reference database, CASMIR is applicable to isomiR annotation across species. To demonstrate its application, we investigated isomiR profiles in normal and neoplastic human colorectal epithelia. We also ran miRDeep2, a popular miRNA analysis algorithm to validate isomiRs annotated by CASMIR. With CASMIR, specific and biologically relevant isomiR patterns could be identified. We note that specific isomiRs are often more abundant than their canonical forms. We identify isomiRs that are commonly up-regulated in both colorectal cancer and advanced adenoma, and illustrate advantages in targeting isomiRs as potential biomarkers over canonical forms. Studying miRNAs at the isomiR level could reveal new insight into miRNA biology and inform assay design for specific isomiRs. CASMIR facilitates comprehensive annotation of isomiR features in small RNA sequencing data for isomiR profiling and differential expression analysis.
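
    The idea of describing a read relative to its canonical form and precursor can be sketched as follows. This simplified version uses exact substring matching in place of the step-wise local alignment the abstract describes, so it only detects 5' and 3' end shifts. The function name and all sequences are hypothetical.

```python
from typing import Dict, Optional


def annotate_isomir(read: str, canonical: str, precursor: str) -> Optional[Dict[str, int]]:
    """Report 5' and 3' end shifts of a read relative to the canonical miRNA.

    Positive 5' shift: the read starts downstream of the canonical start.
    Positive 3' shift: the read ends downstream of the canonical end.
    Returns None when the read does not match the precursor exactly.
    """
    can_start = precursor.find(canonical)
    if can_start < 0:
        raise ValueError("canonical sequence not found in precursor")
    read_start = precursor.find(read)
    if read_start < 0:
        return None
    can_end = can_start + len(canonical)
    read_end = read_start + len(read)
    return {
        "five_prime_shift": read_start - can_start,
        "three_prime_shift": read_end - can_end,
    }


# Hypothetical canonical miRNA embedded in a short hypothetical precursor.
canonical = "TGAGGTAGTAGGTTGTATAGTT"
precursor = "GGC" + canonical + "TTAGG"
read = precursor[4:26]  # starts and ends one base downstream of the canonical form

annotation = annotate_isomir(read, canonical, precursor)
print(annotation)
```

    Keeping each read as its own sequence, as in this sketch, preserves the end-shift information that is lost when reads are summarized under a single miRNA name.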

  8. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

    PubMed Central

    2014-01-01

    Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393

  9. Evolutionary profiles from the QR factorization of multiple sequence alignments

    PubMed Central

    Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

    2005-01-01

    We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270

  10. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
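
    The phylogenetic profile method can be illustrated with a minimal sketch. It assumes homolog presence per genome is already known; the function names, the mismatch threshold, and the protein and genome labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_profiles(
    presence: Dict[str, Set[str]], genomes: List[str]
) -> Dict[str, Tuple[int, ...]]:
    """Encode each protein as a presence (1) / absence (0) vector over genomes."""
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def linked_pairs(
    profiles: Dict[str, Tuple[int, ...]], max_mismatches: int = 0
) -> List[Tuple[str, str]]:
    """Pair proteins whose profiles differ in at most `max_mismatches` genomes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs


genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}
profiles = build_profiles(presence, genomes)
links = linked_pairs(profiles)
print(profiles)
print(links)
```

    Proteins P1 and P2 share the same presence and absence pattern and are therefore proposed as functionally linked; a real analysis would also need a statistical treatment of uninformative profiles such as P4.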

  11. Gene Scanning of an Internalin B Gene Fragment Using High-Resolution Melting Curve Analysis as a Tool for Rapid Typing of Listeria monocytogenes

    PubMed Central

    Pietzka, Ariane T.; Stöger, Anna; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner

    2011-01-01

    The ability to accurately track Listeria monocytogenes strains involved in outbreaks is essential for control and prevention of listeriosis. Because current typing techniques are time-consuming, cost-intensive, technically demanding, and difficult to standardize, we developed a rapid and cost-effective method for typing of L. monocytogenes. In all, 172 clinical L. monocytogenes isolates and 20 isolates from culture collections were typed by high-resolution melting (HRM) curve analysis of a specific locus of the internalin B gene (inlB). All obtained HRM curve profiles were verified by sequence analysis. The 192 tested L. monocytogenes isolates yielded 15 specific HRM curve profiles. Sequence analysis revealed that these 15 HRM curve profiles correspond to 18 distinct inlB sequence types. The HRM curve profiles obtained correlated with the five phylogenetic groups I.1, I.2, II.1, II.2, and III. Thus, HRM curve analysis constitutes an inexpensive assay and represents an improvement in typing relative to classical serotyping or multiplex PCR typing protocols. This method provides a rapid and powerful screening tool for simultaneous preliminary typing of up to 384 samples in approximately 2 hours. PMID:21227395

  12. Cancer systems biology in the genome sequencing era: part 1, dissecting and modeling of tumor clones and their networks.

    PubMed

    Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis

    2013-08-01

    Recent tumor genome sequencing confirmed that one tumor often consists of multiple cell subpopulations (clones) which bear different, but related, genetic profiles such as mutation and copy number variation profiles. Thus far, one tumor has been viewed as a whole entity in cancer functional studies. With the advances of genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors, and then conduct clone-based analysis. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We will outline new directions for conducting cancer systems biology by considering that genome sequencing technology can be used for dissecting, quantifying and genetically characterizing clones from tumors. Topics discussed in Part 1 of this review include computationally quantifying of tumor subpopulations; clone-based network modeling, cancer hallmark-based networks and their high-order rewiring principles and the principles of cell survival networks of fast-growing clones. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  13. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

    PubMed

    Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

    2017-03-14

    Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
    Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and requires no arguments in most cases.
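
    Overlap-based base correction for paired reads can be sketched as below. This is an illustration of the general idea and not AfterQC's actual algorithm: the overlap search is a plain offset scan, and the rule of preferring the higher-quality base, the thresholds, and all sequences are assumptions.

```python
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap_offset(
    r1: str, r2_rc: str, min_overlap: int = 8, max_mismatches: int = 1
) -> Optional[int]:
    """Return the offset in r1 where r2_rc starts to overlap, or None."""
    if len(r2_rc) < min_overlap:
        return None
    for offset in range(len(r1) - min_overlap + 1):
        # zip stops at the shorter sequence, i.e. at the end of the overlap
        mismatches = sum(a != b for a, b in zip(r1[offset:], r2_rc))
        if mismatches <= max_mismatches:
            return offset
    return None


def correct_read1(r1: str, q1: List[int], r2: str, q2: List[int]) -> str:
    """Replace mismatched read-1 bases in the overlap when read 2 has higher quality."""
    r2_rc = reverse_complement(r2)
    q2_rev = q2[::-1]
    offset = find_overlap_offset(r1, r2_rc)
    if offset is None:
        return r1
    bases = list(r1)
    for i, (a, b) in enumerate(zip(r1[offset:], r2_rc)):
        if a != b and q2_rev[i] > q1[offset + i]:
            bases[offset + i] = b
    return "".join(bases)


# Hypothetical 20 bp fragment sequenced from both ends with 14 bp reads.
fragment = "ACGTTGCAAGGCTTACGATC"
read1_true = fragment[:14]
read2 = reverse_complement(fragment[6:])
read1_err = read1_true[:9] + "T" + read1_true[10:]  # simulated error at index 9
q1 = [35] * 14
q1[9] = 10  # the erroneous base has low quality
q2 = [35] * 14

corrected = correct_read1(read1_err, q1, read2, q2)
print(corrected == read1_true)
```

    The two reads overlap by eight bases, and the single mismatch inside that overlap is resolved in favor of the read with the higher base quality.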

  14. Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence.

    PubMed

    Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

    2015-01-01

    There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinformatics software, and we analyzed the relationship between gene expression profiles of adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expressions of adenoma-carcinoma sequence were significantly different between high-grade intraepithelial neoplasia group and adenocarcinoma group. The biological process of gene ontology function enrichment analysis on differentially expressed genes between high-grade intraepithelial neoplasia group and adenocarcinoma group showed that genes enriched in the extracellular structure organization, skeletal system development, biological adhesion and itself regulated growth regulation, with the P value after FDR correction of less than 0.05. In addition, IPR-related protein mainly focused on the insulin-like growth factor binding proteins. The variable trends of gene expression profiles for adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between high-grade intraepithelial neoplasia group and adenocarcinoma group. Bioinformatics analysis is an effective way to study the gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool to involve colorectal cancer research strategy into colorectal adenoma or advanced adenoma.

  15. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  16. Uniform, optimal signal processing of mapped deep-sequencing data.

    PubMed

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  17. Distinct profiles of expressed sequence tags during intestinal regeneration in the sea cucumber Holothuria glaberrima

    PubMed Central

    Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.

    2010-01-01

    Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underlie the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-day, and 7-day cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank, but the libraries showed only 10% similarity among themselves. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at a significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180

  18. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
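
    The interpretation of an evolutionary rate profile (ω = dN/dS per codon site) can be shown with a short sketch. It assumes the ω values have already been estimated, for example by CODEML, and applies the conventional reading of ω above, near, or below 1. The function name, the tolerance around 1, and the example values are hypothetical; actual inference of positive selection relies on the statistical framework of the fitted model rather than a fixed cutoff.

```python
from typing import List


def classify_sites(omega_profile: List[float], tol: float = 0.05) -> List[str]:
    """Label each codon site by its omega value (omega = dN/dS)."""
    labels = []
    for omega in omega_profile:
        if omega > 1 + tol:
            labels.append("positive")   # excess of nonsynonymous change
        elif omega < 1 - tol:
            labels.append("purifying")  # nonsynonymous change is suppressed
        else:
            labels.append("neutral")
    return labels


# Hypothetical per-codon omega estimates for one domain.
profile = [0.1, 0.98, 2.4, 0.3]
labels = classify_sites(profile)
positive_sites = [i + 1 for i, label in enumerate(labels) if label == "positive"]
print(labels, positive_sites)
```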

  19. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

    PubMed

    Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press.

  20. Molecular and comparative analysis of Salmonella enterica Senftenberg from humans and animals using PFGE, MLST and NARMS.

    PubMed

    Stepan, Ryan M; Sherwood, Julie S; Petermann, Shana R; Logue, Catherine M

    2011-06-27

    Salmonella species are recognized worldwide as a significant cause of human and animal disease. In this study the molecular profiles and characteristics of Salmonella enterica Senftenberg isolated from human cases of illness and those recovered from healthy or diagnostic cases in animals were assessed. Included in the study was a comparison with our own sequenced strain of S. Senftenberg recovered from production turkeys in North Dakota. Isolates examined in this study were subjected to antimicrobial susceptibility profiling using the National Antimicrobial Resistance Monitoring System (NARMS) panel which tested susceptibility to 15 different antimicrobial agents. The molecular profiles of all isolates were determined using Pulsed Field Gel Electrophoresis (PFGE) and the sequence types of the strains were obtained using Multi-Locus Sequence Type (MLST) analysis based on amplification and sequence interrogation of seven housekeeping genes (aroC, dnaN, hemD, hisD, purE, sucA, and thrA). PFGE data was input into BioNumerics analysis software to generate a dendrogram of relatedness among the strains. The study found 93 profiles among 98 S. Senftenberg isolates tested and there were primarily two sequence types associated with humans and animals (ST185 and ST14) with overlap observed in all host types suggesting that the distribution of S. Senftenberg sequence types is not host dependent. Antimicrobial resistance was observed among the animal strains, however no resistance was detected in human isolates suggesting that animal husbandry has a significant influence on the selection and promotion of antimicrobial resistance. The data demonstrates the circulation of at least two strain types in both animal and human health suggesting that S. Senftenberg is relatively homogeneous in its distribution. The data generated in this study could be used towards defining a pathotype for this serovar.

  1. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.

    PubMed

    Favero, F; Joshi, T; Marquard, A M; Birkbak, N J; Krzystanek, M; Li, Q; Szallasi, Z; Eklund, A C

    2015-01-01

    Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described. We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm. Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity (Pearson's r = 0.90) and ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions). This performance was noticeably superior to previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%. The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small scale mutations but also for estimating cellularity and inferring DNA copy number aberrations. © The Author 2014. Published by Oxford University Press on behalf of the European Society for Medical Oncology.

  2. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
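    The HP mode described above reduces each multiple sequence alignment to a per-column hydrophobicity profile before the two profiles are aligned. The following is a minimal sketch of that averaging step only, assuming the Kyte-Doolittle scale and a simple rule of skipping gaps; the function name, the toy alignment, and the gap handling are illustrative assumptions, and AlignMe's own scales, smoothing, and parameters may differ.

```python
# Kyte-Doolittle hydropathy values (one common scale; an assumption here,
# not necessarily the scale AlignMe applies by default).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_hydropathy_profile(alignment):
    """Return one mean hydropathy value per alignment column.

    `alignment` is a list of equal-length, upper-case aligned sequences.
    Gaps and unknown residues are skipped; a column with no scorable
    residue yields None.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [KD[row[col]] for row in alignment if row[col] in KD]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Hypothetical toy alignment of three homologs (not real data).
toy_msa = ["LIV-K", "LVVGK", "IIA-R"]
profile = averaged_hydropathy_profile(toy_msa)
```

    Two such profiles, one per input alignment, would then be the objects that are aligned to each other; that alignment step is not shown here.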

  3. PanFunPro: Bacterial Pan-Genome Analysis Based on the Functional Profiles (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lukjancenko, Oksana

    2012-06-01

    Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  4. PanFunPro: Bacterial Pan-Genome Analysis Based on the Functional Profiles (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Lukjancenko, Oksana

    2018-01-10

    Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  5. Next Generation Sequencing Technology and Genomewide Data Analysis: Perspectives for Retinal Research

    PubMed Central

    Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand

    2016-01-01

    The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499

  6. Refined identification of Vibrio bacterial flora from Acanthasther planci based on biochemical profiling and analysis of housekeeping genes.

    PubMed

    Rivera-Posada, J A; Pratchett, M; Cano-Gomez, A; Arango-Gomez, J D; Owens, L

    2011-09-09

    We used a polyphasic approach for precise identification of bacterial flora (Vibrionaceae) isolated from crown-of-thorns starfish (COTS) from Lizard Island (Great Barrier Reef, Australia) and Guam (U.S.A., Western Pacific Ocean). Previous 16S rRNA gene phylogenetic analysis was useful to allocate and identify isolates within the Photobacterium, Splendidus and Harveyi clades but failed in the identification of Vibrio harveyi-like isolates. Species of the V. harveyi group have almost indistinguishable phenotypes and genotypes, and thus, identification by standard biochemical tests and 16S rRNA gene analysis is commonly inaccurate. Biochemical profiling and sequence analysis of additional topA and mreB housekeeping genes were carried out for definitive identification of 19 bacterial isolates recovered from sick and wild COTS. For 8 isolates, biochemical profiles and topA and mreB gene sequence alignments with the closest relatives (GenBank) confirmed previous 16S rRNA-based identification: V. fortis and Photobacterium eurosenbergii species (from wild COTS), and V. natriegens (from diseased COTS). Further phylogenetic analysis based on topA and mreB concatenated sequences served to identify the remaining 11 V. harveyi-like isolates: V. owensii and V. rotiferianus (from wild COTS), and V. owensii, V. rotiferianus, and V. harveyi (from diseased COTS). This study further confirms the reliability of topA-mreB gene sequence analysis for identification of these close species, and it reveals a wider distribution range of the potentially pathogenic V. harveyi group.

  7. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    PubMed

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  8. Early Detection of NSCLC Using Stromal Markers in Peripheral Blood

    DTIC Science & Technology

    2016-09-01

    circulating myeloid cells, flow cytometry, RNA-sequencing, expression profiling. 3. ACCOMPLISHMENTS: What were the major goals of the project...Subtask 2: Flow cytometry sorting of circulating myeloid cells. Subtask 3: RNA-Sequencing Subtask 4: RNA-seq data analysis Subtask 5: Feasible RT-PCR...accomplished the patient recruitment, flow cytometry sorting of circulating myeloid cells, RNA-sequencing of the samples. During the RNA-seq data analysis, we

  9. Colorstratigraphy; A New Stratigraphic Correlation Technique

    NASA Astrophysics Data System (ADS)

    Nanayakkara, N. U.; Ranasinghage, P. N.; Priyantha, C.; Abillapitiya, T.

    2016-12-01

    Here we introduce a novel stratigraphic technique, namely colorstratigraphy, for correlating sedimentary sequences. Minihagalkanda is an amphitheater-like sedimentary terrain about 1 km long, situated on the southeastern coast of Sri Lanka. It has Miocene sedimentary sequences, separated into 10-12 m high small hillocks by erosion, and bounded by an escarpment about 30 m high. Sandstone, yellowish sandy clay, and greenish silty clay sequences are capped by a 4-5 m limestone bed in these hillocks but not at the boundary escarpment. Stratigraphic profiles at two hillocks and the boundary escarpment, separated from each other by 200-300 m, were selected to test the new colorstratigraphic correlation technique. Color reflectance (DSR) was measured at four samples in each sequence at every profile, and hence altogether 36 reflectance measurements were taken using a Minolta 2500D hand-held color spectrophotometer. The first derivative of the reflectance spectra (dR/dλ) defines the "spectral shape" of the sample. Therefore, DSR data (360-740 nm) measured at 10 nm resolution were used to calculate a center-weighted first-derivative spectrum, consisting of 39 channels, for each reflectance sample. Particle size of each sequence was measured at all three profiles using a laser particle size analyzer to verify the stratigraphic correlation. The mean reflectance spectrum for each sequence at all three profiles was plotted on the same graph for comparison. The same was done for the grain size spectra. Discriminant function analysis was performed separately for DSR data and grain size data using a number assigned to each sedimentary sequence as the grouping variable. Color spectra of the sandstone, yellowish sandy clay, and greenish silty clay sequences at all three profiles match perfectly, showing clear stratigraphic correlation among these three stratigraphic profiles. Matching grain size distribution curves of the three sequences at the three profiles verify the stratigraphic correlation. Perfect (100%) discrimination of the three sequences with color reflectance data proves the accuracy of the correlation. A similar 100% discrimination obtained with grain size data further verifies the results. Therefore, colorstratigraphy based on DSR can be introduced as a quick and easy technique for stratigraphic correlation of sedimentary sequences.
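    The first-derivative step in the abstract above can be illustrated with a short sketch. The abstract does not spell out the exact weighting, so this example assumes plain central differences in the interior with one-sided differences at the two ends, which keeps all 39 channels for a 360-740 nm spectrum sampled every 10 nm. The function name and the synthetic spectrum are hypothetical.

```python
def first_derivative_spectrum(reflectance, step_nm=10.0):
    """Approximate dR/d(lambda) at every channel of an evenly sampled spectrum.

    Interior channels use a central difference; the first and last channels
    use one-sided differences (an assumption, chosen so the output has the
    same number of channels as the input).
    """
    n = len(reflectance)
    if n < 2:
        raise ValueError("need at least two reflectance channels")
    derivative = []
    for i in range(n):
        if i == 0:
            d = (reflectance[1] - reflectance[0]) / step_nm
        elif i == n - 1:
            d = (reflectance[-1] - reflectance[-2]) / step_nm
        else:
            d = (reflectance[i + 1] - reflectance[i - 1]) / (2 * step_nm)
        derivative.append(d)
    return derivative


# 360-740 nm at 10 nm resolution gives 39 channels.
wavelengths = list(range(360, 741, 10))

# Synthetic reflectance rising linearly with wavelength (not measured data).
reflectance = [0.05 * w for w in wavelengths]
derivative = first_derivative_spectrum(reflectance)
```

    For a real sample, the resulting 39-value vector would be the "spectral shape" that is compared between sequences and profiles.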

  10. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
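    The phylogenetic-profile idea in the abstract above can be sketched in a few lines: each protein gets a presence/absence vector across genomes, and proteins with matching vectors become candidate functional links. This is only an illustration under simple assumptions (exact or near-exact profile matches scored by Hamming distance); the function names and the toy genomes are hypothetical and are not taken from the patent.

```python
def phylogenetic_profile(protein, genomes, presence):
    """Return a tuple of 1/0 flags: is a homolog of `protein` present in each genome?

    `presence` maps a genome name to the set of protein families found in it.
    """
    return tuple(1 if protein in presence[g] else 0 for g in genomes)


def hamming(a, b):
    """Count the positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_links(proteins, genomes, presence, max_distance=0):
    """Pair up proteins whose profiles differ in at most `max_distance` genomes."""
    profiles = {p: phylogenetic_profile(p, genomes, presence) for p in proteins}
    names = sorted(profiles)
    links = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if hamming(profiles[p], profiles[q]) <= max_distance:
                links.append((p, q))
    return links


# Hypothetical toy data: four genomes and three protein families.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "g1": {"A", "B", "C"},
    "g2": {"C"},
    "g3": {"A", "B"},
    "g4": {"A", "B", "C"},
}
links = candidate_links(["A", "B", "C"], genomes, presence)
```

    Here A and B share the profile (1, 0, 1, 1) and are reported as a candidate link, while C, with profile (1, 1, 0, 1), is not paired with either.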

  11. Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

    PubMed Central

    Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

    2015-01-01

    Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410

  12. Identification of a 'Candidatus Phytoplasma hispanicum'-related strain, associated with yellows-type diseases, in smoke-tree sharpshooter (Homalodisca liturata Ball).

    PubMed

    Servín-Villegas, Rosalía; Caamal-Chan, Maria Goretty; Chavez-Medina, Alicia; Loera-Muro, Abraham; Barraza, Aarón; Medina-Hernández, Diana; Holguín-Peña, Ramón Jaime

    2018-04-11

    Phytoplasmas of the 16SrXIII group were identified in salivary glands from Homalodisca liturata, which were collected in El Comitán on the Baja California peninsula in Mexico. We were able to positively identify 15 16S rRNA gene sequences with the corresponding signature sequence of 'Candidatus Phytoplasma' (CAAGAYBATKATGTKTAGCYGGDCT), and used in silico restriction fragment length polymorphism (RFLP) profiles (F value estimations), coupled with a phylogenetic analysis, to confirm their relatedness to 'Candidatus Phytoplasma hispanicum', which in turn belongs to the 16SrXIII group. A restriction analysis was carried out with AluI and EcoRI to confirm that five of the sequences belong to subgroup D. The rest of the sequences did not exhibit any known RFLP profile related to a subgroup reported in the 16SrXIII group.
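    The signature sequence quoted above contains IUPAC degenerate bases (Y, B, K, D), so it cannot be matched as a literal string. A minimal sketch of one way to search for it, by expanding each ambiguity code into a regular-expression character class, is shown below. The function names and the test sequences are hypothetical; the abstract does not describe the authors' own tooling.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Signature sequence as quoted in the abstract.
SIGNATURE = "CAAGAYBATKATGTKTAGCYGGDCT"


def iupac_to_regex(pattern):
    """Compile an IUPAC-coded DNA pattern into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


def find_signature(sequence, pattern=SIGNATURE):
    """Return the 0-based start of the first match in `sequence`, or -1 if absent."""
    match = iupac_to_regex(pattern).search(sequence.upper())
    return match.start() if match else -1


# Made-up sequences: one concrete instance of the signature with flanks,
# and one where the D position holds a C, which D (A, G or T) does not allow.
matching = "GGTT" + "CAAGACGATTATGTGTAGCTGGACT" + "AACC"
non_matching = "GGTT" + "CAAGACGATTATGTGTAGCTGGCCT" + "AACC"

hit = find_signature(matching)
miss = find_signature(non_matching)
```

    The same expansion could be reused to locate restriction sites for an in silico digest, although that step is not sketched here.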

  13. cyclostratigraphy, sequence stratigraphy and organic matter accumulation mechanism

    NASA Astrophysics Data System (ADS)

    Cong, F.; Li, J.

    2016-12-01

    The first member of the Maokou Formation of the Sichuan basin is composed of well-preserved carbonate ramp couplets of limestone and marlstone/shale. It acts as one of the potential shale gas source rocks, and is suitable for time-series analysis. We conducted time-series analysis to identify high-frequency sequences, reconstruct high-resolution sedimentation rate, estimate detailed primary productivity for the first time in the study intervals, and discuss the organic matter accumulation mechanism of source rock under a sequence stratigraphic framework. Using the theory of cyclostratigraphy and sequence stratigraphy, the high-frequency sequences of one outcrop profile and one drilling well are identified. Two third-order sequences and eight fourth-order sequences are distinguished on the outcrop profile based on the cycle stacking patterns. For the drilling well, the sequence boundary and four system tracts are distinguished by "integrated prediction error filter analysis" (INPEFA) of gamma-ray logging data, and eight fourth-order sequences are identified by the 405 ka long eccentricity curve in the depth domain, which is quantified and filtered by integrated analysis of MTM spectral analysis, evolutive harmonic analysis (EHA), evolutive average spectral misfit (eASM) and band-pass filtering. This suggests that high-frequency sequences correlate well with Milankovitch orbital signals recorded in sediments, and that cyclostratigraphy theory is applicable to dividing high-frequency (4-6 orders) sequence stratigraphy. High-resolution sedimentation rate is reconstructed through the study interval by tracking the highly statistically significant short eccentricity component (123 ka) revealed by EHA. Based on sedimentation rate, measured TOC and density data, the burial flux, delivery flux and primary productivity of organic carbon were estimated. By integrating redox proxies, we can discuss the controls on organic matter accumulation by primary production and preservation under the high-resolution sequence stratigraphic framework. Results show that high average organic carbon contents in the study interval are mainly attributed to high primary production. The results also show a good correlation between high organic carbon accumulation and intervals of transgression.

  14. A High Resolution Seismic Sequence Analysis of the Malta Plateau

    DTIC Science & Technology

    1999-05-01

    the SACLANTCEN Programme of Work. The document has been approved for release by The Director, SACLANTCEN. Jan L. Spoelstra Director NATO...the Plio-Quaternary. To the southwest of Sicily, Di Stefano et al. (1993) identified six sequence boundaries and estimated the ages by the...the location of the seismic reflection profiles in Di Stefano et al. (1993) do not overlap any of the profiles in this study and use a lower frequency

  15. MEGGASENSE - The Metagenome/Genome Annotated Sequence Natural Language Search Engine: A Platform for the Construction of Sequence Data Warehouses.

    PubMed

    Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio

    2017-06-01

    The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14 106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.

  16. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis

    PubMed Central

    Sun, Zhifu; Cunningham, Julie; Slager, Susan; Kocher, Jean-Pierre

    2015-01-01

    Bisulfite treatment-based methylation microarray (mainly Illumina 450K Infinium array) and next-generation sequencing (reduced representation bisulfite sequencing, Agilent SureSelect Human Methyl-Seq, NimbleGen SeqCap Epi CpGiant or whole-genome bisulfite sequencing) are commonly used for base resolution DNA methylome research. Although multiple tools and methods have been developed and used for data preprocessing and analysis, confusion remains for these platforms, including how and whether the 450K array should be normalized; which platform should be used to better fit researchers’ needs; and which statistical models would be more appropriate for differential methylation analysis. This review presents the commonly used platforms and compares the pros and cons of each in methylome profiling. We then discuss approaches to study design, data normalization, bias correction and model selection for differentially methylated individual CpGs and regions. PMID:26366945

  17. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
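    The counting step at the heart of a barcode-sequencing fitness assay, as described above, can be sketched as follows. This is a simplified illustration, assuming that each read carries the barcode at a fixed offset, that only exact matches to a known barcode list are counted, and that fitness is summarized as a pseudocount-stabilized log2 ratio of control to treatment counts. The function names, barcodes, strain labels, and reads are hypothetical, and library-size normalization is omitted.

```python
import math
from collections import Counter


def count_barcodes(reads, barcode_to_strain, start, length):
    """Count reads per strain using exact barcode matches at a fixed position."""
    counts = Counter()
    for read in reads:
        tag = read[start:start + length]
        strain = barcode_to_strain.get(tag)
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_ratios(control, treatment, pseudocount=1.0):
    """Return log2(control / treatment) per strain.

    Positive values mean the strain is depleted under treatment.
    The pseudocount avoids division by zero for strains that drop out.
    """
    strains = set(control) | set(treatment)
    return {
        s: math.log2((control.get(s, 0) + pseudocount)
                     / (treatment.get(s, 0) + pseudocount))
        for s in strains
    }


# Hypothetical barcodes and reads (barcode occupies positions 2-5 of each read).
barcodes = {"ACGT": "strain1", "TTGA": "strain2"}
control_reads = ["GGACGTCC"] * 3 + ["GGTTGACC"] * 3 + ["GGNNNNCC"]
treatment_reads = ["GGACGTCC"] * 3

control = count_barcodes(control_reads, barcodes, start=2, length=4)
treatment = count_barcodes(treatment_reads, barcodes, start=2, length=4)
ratios = log2_ratios(control, treatment)
```

    Because matching is exact, the result depends on the accuracy of the barcode list, which is consistent with the abstract's point that a re-sequenced, revised list of barcodes improved data quality.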

  18. Molecular and comparative analysis of Salmonella enterica Senftenberg from humans and animals using PFGE, MLST and NARMS

    PubMed Central

    2011-01-01

    Background: Salmonella species are recognized worldwide as a significant cause of human and animal disease. In this study the molecular profiles and characteristics of Salmonella enterica Senftenberg isolated from human cases of illness and those recovered from healthy or diagnostic cases in animals were assessed. Included in the study was a comparison with our own sequenced strain of S. Senftenberg recovered from production turkeys in North Dakota. Isolates examined in this study were subjected to antimicrobial susceptibility profiling using the National Antimicrobial Resistance Monitoring System (NARMS) panel which tested susceptibility to 15 different antimicrobial agents. The molecular profiles of all isolates were determined using Pulsed Field Gel Electrophoresis (PFGE) and the sequence types of the strains were obtained using Multi-Locus Sequence Type (MLST) analysis based on amplification and sequence interrogation of seven housekeeping genes (aroC, dnaN, hemD, hisD, purE, sucA, and thrA). PFGE data was input into BioNumerics analysis software to generate a dendrogram of relatedness among the strains. Results: The study found 93 profiles among 98 S. Senftenberg isolates tested and there were primarily two sequence types associated with humans and animals (ST185 and ST14) with overlap observed in all host types suggesting that the distribution of S. Senftenberg sequence types is not host dependent. Antimicrobial resistance was observed among the animal strains, however no resistance was detected in human isolates suggesting that animal husbandry has a significant influence on the selection and promotion of antimicrobial resistance. Conclusion: The data demonstrates the circulation of at least two strain types in both animal and human health suggesting that S. Senftenberg is relatively homogeneous in its distribution. The data generated in this study could be used towards defining a pathotype for this serovar. PMID:21708021

  19. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  20. Evaluating the protein coding potential of exonized transposable element sequences

    PubMed Central

    Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

    2007-01-01

    Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258

  1. Sequencing Bands of Ribosomal Intergenic Spacer Analysis Fingerprints for Characterization and Microscale Distribution of Soil Bacterium Populations Responding to Mercury Spiking

    PubMed Central

    Ranjard, Lionel; Brothier, Elisabeth; Nazaret, Sylvie

    2000-01-01

    Two major emerging bands (a 350-bp band and a 650-bp band) within the RISA (ribosomal intergenic spacer analysis) profile of a soil bacterial community spiked with Hg(II) were selected for further identification of the populations involved in the response of the community to the added metal. The bands were cut out from polyacrylamide gels, cloned, characterized by restriction analysis, and sequenced for phylogenetic affiliation of dominant clones. The sequences were the intergenic spacer between the rrs and rrl genes and the first 130 nucleotides of the rrl gene. Comparison of sequences derived from the 350-bp band to the GenBank database permitted us to identify the bacteria as being mostly close relatives of low G+C firmicutes (Clostridium-like genera), while the 650-bp band permitted us to identify the bacteria as being mostly close relatives of β-proteobacteria (Ralstonia-like genera). Oligonucleotide probes specific for the identified dominant bacteria were designed and hybridized with the RISA profiles derived from the control and spiked communities. These studies confirmed the contribution of these populations to the community response to the metal. Hybridization of the RISA profiles from subcommunities (bacterial pools associated with different soil microenvironments) also permitted characterization of the distribution and the dynamics of these populations at a microscale level following mercury spiking. PMID:11097911

  2. Genetic diversity reflects geographical origin of Ralstonia solanacearum strains isolated from plant and water sources in Spain.

    PubMed

    Caruso, Paola; Biosca, Elena G; Bertolini, Edson; Marco-Noales, Ester; Gorris, María Teresa; Licciardello, Concetta; López, María M

    2017-12-01

    The characterization and intraspecific diversity of a collection of 45 Ralstonia solanacearum strains isolated in Spain from different sources and geographical origins is reported. To test the influence of the site and the host on strain diversity, phenotypic and genotypic analyses were performed by a polyphasic approach. Biochemical and metabolic profiles were compared. The serological relationship was evaluated by indirect ELISA using polyclonal and monoclonal antibodies. For genotypic analysis, hrpB and egl DNA sequence analysis, repetitive sequences (rep-PCR), amplified fragment length polymorphism (AFLP) profiles and macrorestriction with XbaI followed by pulsed field gel electrophoresis (PFGE) were performed. The biochemical and metabolic characterization, serological tests, rep-PCR typing and phylogenetic analysis showed that all analysed strains belonged to phylotype II sequevar 1 and shared homogeneous profiles. However, interesting differences among strains were found by AFLP and by macrorestriction with XbaI followed by PFGE, some profiles being related to the geographical origin of the strains. The diversity results obtained offer new insights into the biogeography of this quarantine organism and its possible sources and reservoirs in Spain and Mediterranean countries. Copyright © by the Spanish Society for Microbiology and Institute for Catalan Studies.

  3. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    PubMed

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism.

  4. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing

    PubMed Central

    2011-01-01

    Background The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Results Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Conclusions Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219

  5. A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2008-12-01

    Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results compared to related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block for protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, prediction of domain boundaries, design of knowledge-based potentials and prediction of protein binding sites.
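    The conversion from a frequency profile to a fixed-dimension feature vector can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each profile column is a dict mapping amino acids to frequencies, that a Top-n-gram is the n most frequent residues of a column joined in rank order, and that ties are broken alphabetically. The function names are hypothetical, and the PSI-BLAST, LSA and SVM steps are not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_n_grams(profile, n=2):
    """Return one Top-n-gram "word" per profile column.

    profile: list of dicts, one per sequence position, mapping an amino
    acid letter to its frequency in that column (illustrative format).
    """
    words = []
    for column in profile:
        # Rank residues by descending frequency; break ties alphabetically
        # (an assumption made here only to keep the output deterministic).
        ranked = sorted(column, key=lambda aa: (-column[aa], aa))
        words.append("".join(ranked[:n]))
    return words


def feature_vector(profile, n=2):
    """Count each possible Top-n-gram, giving a vector of length 20**n."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = Counter(top_n_grams(profile, n))
    return [counts.get(word, 0) for word in vocabulary]


if __name__ == "__main__":
    toy_profile = [
        {"A": 0.6, "C": 0.3, "D": 0.1},
        {"C": 0.5, "A": 0.4, "D": 0.1},
    ]
    print(top_n_grams(toy_profile, n=2))
    print(sum(feature_vector(toy_profile, n=2)))
```

    Because the vector length depends only on n, sequences of different lengths map to vectors of the same dimension, which is what a classifier such as an SVM requires.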

  6. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
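    The differential-expression criterion quoted above (fold change ≥1.5 and p value <0.05) can be expressed as a simple filter. The sketch below is illustrative only: it assumes the fold change is applied symmetrically (up or down), that expression values are positive means per group, and that p values have already been computed elsewhere. The function name and arguments are hypothetical and do not come from the BWA or TopHat workflows.

```python
def is_differentially_expressed(mean_wt, mean_ko, p_value,
                                min_fold=1.5, alpha=0.05):
    """Apply a fold-change and p-value threshold to one transcript.

    mean_wt, mean_ko: mean expression in the two groups (assumed > 0).
    p_value: significance from a test performed upstream (not shown).
    """
    if mean_wt <= 0 or mean_ko <= 0:
        # A ratio is undefined for non-positive expression; treat the
        # transcript as not testable in this simplified sketch.
        return False
    # Symmetric fold change: the larger mean divided by the smaller one.
    fold = max(mean_wt / mean_ko, mean_ko / mean_wt)
    return fold >= min_fold and p_value < alpha


if __name__ == "__main__":
    print(is_differentially_expressed(10.0, 15.0, 0.01))
    print(is_differentially_expressed(10.0, 14.0, 0.01))
```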

  7. Single-Cell Sequencing for Precise Cancer Research: Progress and Prospects.

    PubMed

    Zhang, Xiaoyan; Marjani, Sadie L; Hu, Zhaoyang; Weissman, Sherman M; Pan, Xinghua; Wu, Shixiu

    2016-03-15

    Advances in genomic technology have enabled the faithful detection and measurement of mutations and the gene expression profile of cancer cells at the single-cell level. Recently, several single-cell sequencing methods have been developed that permit the comprehensive and precise analysis of the cancer-cell genome, transcriptome, and epigenome. The use of these methods to analyze cancer cells has led to a series of unanticipated discoveries, such as the high heterogeneity and stochastic changes in cancer-cell populations, the new driver mutations and the complicated clonal evolution mechanisms, and the novel identification of biomarkers of variant tumors. These methods and the knowledge gained from their utilization could potentially improve the early detection and monitoring of rare cancer cells, such as circulating tumor cells and disseminated tumor cells, and promote the development of personalized and highly precise cancer therapy. Here, we discuss the current methods for single cancer-cell sequencing, with a strong focus on those practically used or potentially valuable in cancer research, including single-cell isolation, whole genome and transcriptome amplification, epigenome profiling, multi-dimensional sequencing, and next-generation sequencing and analysis. We also examine the current applications, challenges, and prospects of single cancer-cell sequencing. ©2016 American Association for Cancer Research.

  8. Echinococcus granulosus Sensu Stricto in Dogs and Jackals from Caspian Sea Region, Northern Iran

    PubMed Central

    GHOLAMI, Shirzad; JAHANDAR, Hefzallah; ABASTABAR, Mahdi; PAGHEH, Abdolsatar; MOBEDI, Iraj; SHARBATKHORI, Mitra

    2016-01-01

    Background: The aim of the present study was genotyping of Echinococcus granulosus isolates from dogs and jackals in Mazandaran Province, northern Iran, using a partial sequence of the mitochondrial cytochrome c oxidase subunit 1 gene (cox1). Methods: E. granulosus isolates (n = 15) were collected from 42 stray dogs and 16 jackals found south of the Caspian Sea in northern Iran. After morphological study, the isolates were genetically characterized using consensus sequences (366 bp) of the cox1 gene. Phylogenetic analysis of cox1 nucleotide sequence data was performed using a Bayesian Inference approach. Results: Four different sequences were observed among the isolates. Two genotypes [G1 (66.7%) and G3 (33.3%)] were identified among the isolates. The G1 sequences indicated three sequence profiles. One profile (Maz1) had 100% homology with the reference sequence (AN: KP339045). Two other profiles, designated Maz2 and Maz3, had 99% homology with the G1 genotype (ANs: KP339046 and KP339047). A G3 sequence designated Maz4 showed 100% homology with a G3 reference sequence (AN: KP339048). Conclusion: The occurrence of the G1 genotype of E. granulosus sensu stricto as a frequent genotype in dogs is emphasized. This study established the first molecular characterization of E. granulosus in the province. PMID:28096852
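    Percentages such as the 100% and 99% figures above are typically computed as the share of matching positions between an isolate sequence and a reference. The following is a minimal sketch under the assumption that both sequences are already aligned, gap-free and of equal length; the function name is hypothetical and the abstract does not state which tool the authors used.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences.

    Assumes the sequences are pre-aligned and of equal length; gaps and
    ambiguity codes are not handled in this simplified sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGT"))
    print(percent_identity("ACGT", "ACGA"))
```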

  9. Evaluation of Verbal, Spatial and Numerical Sequencing Scores in the WISC-R, with Special Reference to Children with Reading Difficulties.

    ERIC Educational Resources Information Center

    Moseley, David

    The paper reviews factor analytic studies concerning the Wechsler Intelligence Scale for Children-Revised (WISC-R) profiles of children with learning disabilities (LD). Considered are the following topics: subtest profiles of backward readers, a sex difference in coding, and derivation and use of grouped subtest scores in profile analysis. The…

  10. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
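    The general idea of barcode-count-based fitness profiling can be sketched as below. This is not the published Bar-seq analysis routine: it assumes barcodes have already been extracted from reads as exact strings, normalizes counts by the total per sample, and scores each strain with a log2 ratio using a pseudocount. All names and the scoring formula are illustrative assumptions.

```python
import math
from collections import Counter


def count_barcodes(reads, known_barcodes):
    """Count reads that exactly match a barcode in the reference list."""
    known = set(known_barcodes)
    return Counter(read for read in reads if read in known)


def log2_ratios(control, treatment, pseudocount=1.0):
    """Score each barcode as log2(treatment / control) abundance.

    Counts are normalized by each sample's total; the pseudocount avoids
    division by zero for barcodes missing from one sample (an assumption
    of this sketch, not a documented Bar-seq parameter).
    """
    total_control = sum(control.values())
    total_treatment = sum(treatment.values())
    if total_control == 0 or total_treatment == 0:
        raise ValueError("each sample needs at least one counted barcode")
    scores = {}
    for barcode in set(control) | set(treatment):
        c = (control.get(barcode, 0) + pseudocount) / total_control
        t = (treatment.get(barcode, 0) + pseudocount) / total_treatment
        scores[barcode] = math.log2(t / c)
    return scores


if __name__ == "__main__":
    barcodes = ["AAA", "CCC"]
    control = count_barcodes(["AAA"] * 4 + ["CCC"] * 4 + ["GGG"], barcodes)
    treated = count_barcodes(["AAA"] * 6 + ["CCC"] * 2, barcodes)
    print(log2_ratios(control, treated))
```

    A negative score in this sketch means the strain is depleted in the treated pool relative to the control, which is the kind of signal used to flag candidate drug targets in a chemogenomic assay.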

  11. Molecular Characterization and Phylogenetic Analysis of Pseudomonas aeruginosa Isolates Recovered from Greek Aquatic Habitats Implementing the Double-Locus Sequence Typing Scheme.

    PubMed

    Pappa, Olga; Beloukas, Apostolos; Vantarakis, Apostolos; Mavridou, Athena; Kefala, Anastasia-Maria; Galanis, Alex

    2017-07-01

    The recently described double-locus sequence typing (DLST) scheme was implemented to deeply characterize the genetic profiles of 52 resistant environmental Pseudomonas aeruginosa isolates deriving from aquatic habitats of Greece. The DLST scheme was able not only to assign an already known allelic profile to the majority of the isolates but also to recognize two new ones (ms217-190, ms217-191) with high discriminatory power. A third locus (oprD) was also used for molecular typing and was found to be fundamental for the phylogenetic analysis of environmental isolates, given the resulting increase in discrimination between the isolates. Additionally, the circulation of acquired resistance mechanisms in the aquatic habitats, according to their genetic profiles, proved to be more extensive. Hereby, we suggest that the combination of DLST with oprD typing can discriminate phenotypically and genetically related environmental P. aeruginosa isolates, providing reliable phylogenetic analysis at a local level.

  12. Total Extracellular Small RNA Profiles from Plasma, Saliva, and Urine of Healthy Subjects

    PubMed Central

    Yeri, Ashish; Courtright, Amanda; Reiman, Rebecca; Carlson, Elizabeth; Beecroft, Taylor; Janss, Alex; Siniard, Ashley; Richholt, Ryan; Balak, Chris; Rozowsky, Joel; Kitchen, Robert; Hutchins, Elizabeth; Winarta, Joseph; McCoy, Roger; Anastasi, Matthew; Kim, Seungchan; Huentelman, Matthew; Van Keuren-Jensen, Kendall

    2017-01-01

    Interest in circulating RNAs for monitoring and diagnosing human health has grown significantly. There are few datasets describing baseline expression levels for total cell-free circulating RNA from healthy control subjects. In this study, total extracellular RNA (exRNA) was isolated and sequenced from 183 plasma samples, 204 urine samples and 46 saliva samples from 55 male college athletes ages 18–25 years. Many participants provided more than one sample, allowing us to investigate variability in an individual’s exRNA expression levels over time. Here we provide a systematic analysis of small exRNAs present in each biofluid, as well as an analysis of exogenous RNAs. The small RNA profile of each biofluid is distinct. We find that a large number of RNA fragments in plasma (63%) and urine (54%) have sequences that are assigned to YRNA and tRNA fragments respectively. Surprisingly, while many miRNAs can be detected, there are few miRNAs that are consistently detected in all samples from a single biofluid, and profiles of miRNA are different for each biofluid. Not unexpectedly, saliva samples have high levels of exogenous sequence that can be traced to bacteria. These data significantly contribute to the current number of sequenced exRNA samples from normal healthy individuals. PMID:28303895

  13. VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

    PubMed

    Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

    2015-07-07

    Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
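    The filtering logic described above (discard reads that do not map to a unique location, then derive unique integration sites from the remaining alignments) might be sketched as follows. This is a simplified illustration and not the VISA server's code: the Alignment record, the minimum-score cutoff and the rule of collapsing reads by chromosome, position and strand are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One genome alignment of a provirus-containing read (hypothetical)."""
    read_id: str
    chrom: str
    position: int
    strand: str
    score: float


def unique_integration_sites(alignments, min_score=0.0):
    """Return {(chrom, position, strand): (best_score, read_count)}.

    Reads with more than one alignment are treated as ambiguous and
    dropped; remaining reads are collapsed by genomic location.
    """
    by_read = defaultdict(list)
    for aln in alignments:
        by_read[aln.read_id].append(aln)

    sites = {}
    for hits in by_read.values():
        if len(hits) != 1:
            continue  # read cannot be localized to a unique location
        aln = hits[0]
        if aln.score < min_score:
            continue  # illustrative quality cutoff
        key = (aln.chrom, aln.position, aln.strand)
        best, count = sites.get(key, (aln.score, 0))
        sites[key] = (max(best, aln.score), count + 1)
    return sites


if __name__ == "__main__":
    demo = [
        Alignment("r1", "chr1", 1000, "+", 50.0),
        Alignment("r2", "chr1", 1000, "+", 60.0),
        Alignment("r3", "chr2", 500, "-", 40.0),
        Alignment("r3", "chr5", 900, "+", 40.0),
    ]
    print(unique_integration_sites(demo))
```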

  14. Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling.

    PubMed

    Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T

    2014-06-01

    RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.

  15. Genome-wide sequencing and quantification of circulating microRNAs for dogs with congestive heart failure secondary to myxomatous mitral valve degeneration.

    PubMed

    Jung, SeungWoo; Bohan, Amy

    2018-02-01

    OBJECTIVE To characterize expression profiles of circulating microRNAs via genome-wide sequencing for dogs with congestive heart failure (CHF) secondary to myxomatous mitral valve degeneration (MMVD). ANIMALS 9 healthy client-owned dogs and 8 age-matched client-owned dogs with CHF secondary to MMVD. PROCEDURES Blood samples were collected before administering cardiac medications for the management of CHF. Isolated microRNAs from plasma were classified into microRNA libraries and subjected to next-generation sequencing (NGS) for genome-wide sequencing analysis and quantification of circulating microRNAs. Quantitative reverse transcription PCR (qRT-PCR) assays were used to validate expression profiles of differentially expressed circulating microRNAs identified from NGS analysis of dogs with CHF. RESULTS 326 microRNAs were identified with NGS analysis. Hierarchical analysis revealed distinct expression patterns of circulating microRNAs between healthy dogs and dogs with CHF. Results of qRT-PCR assays confirmed upregulation of 4 microRNAs (miR-133, miR-1, miR-let-7e, and miR-125) and downregulation of 4 selected microRNAs (miR-30c, miR-128, miR-142, and miR-423). Results of qRT-PCR assays were highly correlated with NGS data and supported the specificity of circulating microRNA expression profiles in dogs with CHF secondary to MMVD. CONCLUSIONS AND CLINICAL RELEVANCE These results suggested that circulating microRNA expression patterns were unique and could serve as molecular biomarkers of CHF in dogs with MMVD.

  16. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    PubMed

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    microRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina Hiseq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionarily conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis based on the analysis of An. gambiae. Our results provide the first lead to the elucidation of the miRNA profile in An. sinensis. Unveiling the roles of mosquito miRNAs will undoubtedly lead to a better understanding of mosquito biology and mosquito-pathogen interactions. This work lays the foundation for the further functional study of An. sinensis miRNAs and will facilitate their application in vector control.

  17. Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

    PubMed

    Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

    2008-01-01

    The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.

  18. Rapid Identification and Subtyping of Helicobacter cinaedi Strains by Intact-Cell Mass Spectrometry Profiling with the Use of Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry

    PubMed Central

    Taniguchi, Takako; Sekiya, Ayumi; Higa, Mariko; Saeki, Yuji; Umeki, Kazumi; Okayama, Akihiko; Hayashi, Tetsuya

    2014-01-01

    Helicobacter cinaedi infection is recognized as an increasingly important emerging disease in humans. Although H. cinaedi-like strains have been isolated from a variety of animals, it is difficult to identify particular isolates due to their unusual phenotypic profiles and the limited number of biochemical tests for detecting helicobacters. Moreover, analyses of the 16S rRNA gene sequences are also limited due to the high levels of similarity among closely related helicobacters. This study was conducted to evaluate intact-cell mass spectrometry (ICMS) profiling using matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) as a tool for the identification of H. cinaedi. A total of 68 strains of H. cinaedi isolated from humans, dogs, a cat, and hamsters were examined in addition to other Helicobacter species. The major ICMS profiles of H. cinaedi were identical and differed from those of Helicobacter bilis, which show >98% sequence similarity at the 16S rRNA sequence level. A phyloproteomic analysis of the H. cinaedi strains examined in this work revealed that human isolates formed a single cluster that was distinct from that of the animal isolates, with the exception of two strains from dogs. These phyloproteomic results agreed with those of the phylogenetic analysis based on the nucleotide sequences of the hsp60 gene. Because they formed a distinct cluster in both analyses, our data suggest that animal strains may not be a major source of infection in humans. In conclusion, the ICMS profiles obtained using a MALDI-TOF MS approach may be useful for the identification and subtyping of H. cinaedi. PMID:24153128

  19. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    PubMed

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue, compared to control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. The candidate genes involved in cardiac development were identified as potential targets for these differentially expressed microRNAs, and a collaborative network of microRNAs and cardiac development-related mRNAs was constructed. These data provide the basis for future investigation of the mechanism of the occurrence and development of fetal SV malformations.

  20. Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples.

    EPA Science Inventory

    Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses us...

  1. Diversity of Bradyrhizobium strains nodulating Lupinus micranthus on both sides of the Western Mediterranean: Algeria and Spain.

    PubMed

    Bourebaba, Yasmina; Durán, David; Boulila, Farida; Ahnia, Hadjira; Boulila, Abdelghani; Temprano, Francisco; Palacios, José M; Imperial, Juan; Ruiz-Argüeso, Tomás; Rey, Luis

    2016-06-01

    Lupinus micranthus is a lupine distributed in the Mediterranean basin whose nitrogen fixing symbiosis has not been described in detail. In this study, 101 slow-growing nodule isolates were obtained from L. micranthus thriving in soils on both sides of the Western Mediterranean. The diversity of the isolates, 60 from Algeria and 41 from Spain, was addressed by multilocus sequence analysis of housekeeping genes (16S rRNA, atpD, glnII and recA) and one symbiotic gene (nodC). Using genomic fingerprints from BOX elements, 37 different profiles were obtained (22 from Algeria and 15 from Spain). Phylogenetic analysis based on 16S rRNA and concatenated atpD, glnII and recA sequences of a representative isolate of each BOX profile displayed a homogeneous distribution of profiles in six different phylogenetic clusters. All isolates were taxonomically ascribed to the genus Bradyrhizobium. Three clusters comprising 24, 6, and 4 isolates, respectively, accounted for most of the profiles. The largest cluster was close to the Bradyrhizobium canariense lineage, while the other two were related to B. cytisi/B. rifense. The three remaining clusters included only one isolate each, and were close to B. canariense, B. japonicum and B. elkanii species, respectively. In contrast, phylogenetic clustering of BOX profiles based on nodC sequences yielded only two phylogenetic groups. One of them included all the profiles except one, and belonged to symbiovar genistearum. The remaining profile, constituted by a strain related to B. elkanii, was not related to any well-defined symbiotic lineage, and may constitute both a new symbiovar and a new genospecies. Copyright © 2016 Elsevier GmbH. All rights reserved.

  2. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    PubMed

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm that automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method named 'Subgrouping Automata' (SA) was constructed. First, a tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. A sequence similarity analysis module then calculates the average sequence similarity for all subgroups at each node. A representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. Average sequence similarities are calculated for all nodes, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is determined to be the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
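
    The node-selection step described in this abstract can be illustrated with a minimal sketch. It assumes that each tree node is represented by a list of subgroup average similarities and that a pooled-variance two-sample Student's t is used between adjacent nodes; the abstract does not give the exact statistic, and the names `student_t`, `optimum_node`, and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance two-sample t-value for the increase mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))


def optimum_node(similarities_by_node):
    """Return (index, t_values): the node with the largest t-value
    relative to the node immediately before it."""
    t_values = [
        student_t(similarities_by_node[i - 1], similarities_by_node[i])
        for i in range(1, len(similarities_by_node))
    ]
    best = max(range(len(t_values)), key=t_values.__getitem__)
    return best + 1, t_values


# Hypothetical average similarities of the subgroups at four successive nodes.
nodes = [
    [0.30, 0.32, 0.31],
    [0.33, 0.35, 0.34],
    [0.70, 0.72, 0.71],
    [0.73, 0.75, 0.74],
]
best_node, t_values = optimum_node(nodes)
```

    In this toy input the largest jump in similarity occurs between the second and third nodes, so the third node (index 2) is selected.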

  3. Company profile: Complete Genomics Inc.

    PubMed

    Reid, Clifford

    2011-02-01

    Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.

  4. A chemogenomic analysis of the human proteome: application to enzyme families.

    PubMed

    Bernasconi, Paul; Chen, Min; Galasinski, Scott; Popa-Burke, Ioana; Bobasheva, Anna; Coudurier, Louis; Birkos, Steve; Hallam, Rhonda; Janzen, William P

    2007-10-01

    Sequence-based phylogenies (SBP) are well-established tools for describing relationships between proteins. They have been used extensively to predict the behavior and sensitivity toward inhibitors of enzymes within a family. The utility of this approach diminishes when comparing proteins with little sequence homology. Even within an enzyme family, SBPs must be complemented by an orthogonal method that is independent of sequence to better predict enzymatic behavior. A chemogenomic approach is demonstrated here that uses the inhibition profile of a 130,000 diverse molecule library to uncover relationships within a set of enzymes. The profile is used to construct a semimetric additive distance matrix. This matrix, in turn, defines a sequence-independent phylogeny (SIP). The method was applied to 97 enzymes (kinases, proteases, and phosphatases). SIP does not use structural information from the molecules used for establishing the profile, thus providing a more heuristic method than the current approaches, which require knowledge of the specific inhibitor's structure. Within enzyme families, SIP shows a good overall correlation with SBP. More interestingly, SIP uncovers distances within families that are not recognizable by sequence-based methods. In addition, SIP allows the determination of distance between enzymes with no sequence homology, thus uncovering novel relationships not predicted by SBP. This chemogenomic approach, used in conjunction with SBP, should prove to be a powerful tool for choosing target combinations for drug discovery programs as well as for guiding the selection of profiling and liability targets.
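
    The idea of deriving enzyme-to-enzyme distances from inhibition profiles rather than from sequence can be sketched as follows. The abstract does not specify how its semimetric distance is computed, so the mean absolute difference used here is only an assumed stand-in, and the enzyme names and profile values are hypothetical.

```python
from itertools import combinations


def profile_distance(p, q):
    """Mean absolute difference between two inhibition profiles
    (one fractional inhibition value per library compound)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def distance_matrix(profiles):
    """Build a symmetric distance lookup keyed by (enzyme, enzyme)."""
    names = sorted(profiles)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = profile_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical inhibition profiles over four compounds.
profiles = {
    "kinase_A": [0.9, 0.1, 0.8, 0.0],
    "kinase_B": [0.8, 0.2, 0.9, 0.1],
    "protease_C": [0.0, 0.9, 0.1, 0.7],
}
dist = distance_matrix(profiles)
```

    A matrix of this kind could then be passed to any distance-based tree-building method; no sequence information enters the calculation.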

  5. High throughput 16SrRNA gene sequencing reveals the correlation between Propionibacterium acnes and sarcoidosis.

    PubMed

    Zhao, Meng-Meng; Du, Shan-Shan; Li, Qiu-Hong; Chen, Tao; Qiu, Hui; Wu, Qin; Chen, Shan-Shan; Zhou, Ying; Zhang, Yuan; Hu, Yang; Su, Yi-Liang; Shen, Li; Zhang, Fen; Weng, Dong; Li, Hui-Ping

    2017-02-01

    This study aims to use high throughput 16SrRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 tuberculosis (TB group), and 11 non-infectious lung diseases (control group). The V4 region of the bacterial 16SrRNA gene in the specimens was amplified and sequenced using the high throughput sequencing platform MiSeq, and a bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16SrRNA was found exclusively in the sarcoidosis group, in all 17 samples, whereas it was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0.11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0.0044% in the sarcoidosis group. P. granulosum 16SrRNA was not detected in the other two groups. High throughput 16SrRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development.

  6. DNA-Sequence Based Typing of the Cronobacter Genus Using MLST, CRISPR-cas Array and Capsular Profiling

    PubMed Central

    Ogrodzki, Pauline; Forsythe, Stephen J.

    2017-01-01

    The Cronobacter genus is composed of seven species, within which a number of pathovars have been described. The most notable infections by Cronobacter spp. are of infants through the consumption of contaminated infant formula. The description of the genus has greatly improved in recent years through DNA sequencing techniques, and this has led to a robust means of identification. However, some species are highly clonal and this limits the ability to discriminate between unrelated strains by some methods of genotyping. This article updates the application of three genotyping methods across the Cronobacter genus. The three genotyping methods were multilocus sequence typing (MLST), capsular profiling of the K-antigen and colanic acid (CA) biosynthesis regions, and CRISPR-cas array profiling. A total of 1654 MLST profiled and 286 whole genome sequenced strains, available by open access at the PubMLST Cronobacter database, were used in this analysis. The predominance of C. sakazakii and C. malonaticus in clinical infections was confirmed. The majority of clinical strains were in the C. sakazakii clonal complexes (CC) 1 and 4, sequence types (ST) 8 and 12 and C. malonaticus ST7. The capsular profile K2:CA2, previously proposed as being strongly associated with C. sakazakii and C. malonaticus isolates from severe neonatal infections, was also found in C. turicensis, C. dublinensis and C. universalis. The majority of CRISPR-cas types across the genus was the I-E (Ecoli) type. Some strains of C. dublinensis and C. muytjensii encoded the I-F (Ypseudo) type, and others lacked the cas gene loci. The expanding profiling will be of benefit to researchers as well as governmental and industrial risk assessors. PMID:29033918

  7. IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies.

    PubMed

    Lagkouvardos, Ilias; Joseph, Divya; Kapfhammer, Martin; Giritli, Sabahattin; Horn, Matthias; Haller, Dirk; Clavel, Thomas

    2016-09-23

    The SRA (Sequence Read Archive) serves as the primary depository for massive amounts of Next Generation Sequencing data, and currently hosts over 100,000 16S rRNA gene amplicon-based microbial profiles from various host habitats and environments. This number is increasing rapidly and there is a dire need for approaches to utilize this pool of knowledge. Here we created IMNGS (Integrated Microbial Next Generation Sequencing), an innovative platform that uniformly and systematically screens for and processes all prokaryotic 16S rRNA gene amplicon datasets available in SRA and uses them to build sample-specific sequence databases and OTU-based profiles. Via a web interface, this integrative sequence resource can easily be queried by users. We show examples of how the approach allows testing the ecological importance of specific microorganisms in different hosts or ecosystems, and performing targeted diversity studies for selected taxonomic groups. The platform also offers a complete workflow for de novo analysis of users' own raw 16S rRNA gene amplicon datasets for the sake of comparison with existing data. IMNGS can be accessed at www.imngs.org.

  8. Study of cnidarian-algal symbiosis in the "omics" age.

    PubMed

    Meyer, Eli; Weis, Virginia M

    2012-08-01

    The symbiotic associations between cnidarians and dinoflagellate algae (Symbiodinium) support productive and diverse ecosystems in coral reefs. Many aspects of this association, including the mechanistic basis of host-symbiont recognition and metabolic interaction, remain poorly understood. The first completed genome sequence for a symbiotic anthozoan is now available (the coral Acropora digitifera), and extensive expressed sequence tag resources are available for a variety of other symbiotic corals and anemones. These resources make it possible to profile gene expression, protein abundance, and protein localization associated with the symbiotic state. Here we review the history of "omics" studies of cnidarian-algal symbiosis and the current availability of sequence resources for corals and anemones, identifying genes putatively involved in symbiosis across 10 anthozoan species. The public availability of candidate symbiosis-associated genes leaves the field of cnidarian-algal symbiosis poised for in-depth comparative studies of sequence diversity and gene expression and for targeted functional studies of genes associated with symbiosis. Reviewing the progress to date suggests directions for future investigations of cnidarian-algal symbiosis that include (i) sequencing of Symbiodinium, (ii) proteomic analysis of the symbiosome membrane complex, (iii) glycomic analysis of Symbiodinium cell surfaces, and (iv) expression profiling of the gastrodermal cells hosting Symbiodinium.

  9. Analysis of petunia hybrida in response to salt stress using high throughput RNA sequencing

    USDA-ARS?s Scientific Manuscript database

    Salt and drought are among the greatest challenges to crop and native plants in meeting their yield and reproductive potentials. DNA sequencing-enabled transcriptome profiling provides a means of assessing what genes are responding to salt or drought stress so as to better understand the molecular ...

  10. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling.

    PubMed

    Piro, Vitor C; Matschkowski, Marcel; Renard, Bernhard Y

    2017-08-14

    Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools available in these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations on the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results in different datasets and configurations. All this variation makes it complicated for researchers to decide which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available through the BioConda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics .
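
    The co-occurrence idea mentioned in this abstract can be illustrated with a small sketch. This is not MetaMeta's actual integration algorithm, which the abstract does not detail; it only assumes that a taxon is kept when at least `min_tools` tools report it, with abundances averaged and renormalized. The function name, threshold, and profiles are hypothetical.

```python
from collections import Counter


def integrate_profiles(tool_profiles, min_tools=2):
    """Keep taxa reported by at least `min_tools` tools, average their
    abundances over the reporting tools, and renormalize to sum to 1."""
    support = Counter(t for profile in tool_profiles.values() for t in profile)
    kept = {}
    for taxon, n_tools in support.items():
        if n_tools >= min_tools:
            values = [p[taxon] for p in tool_profiles.values() if taxon in p]
            kept[taxon] = sum(values) / len(values)
    total = sum(kept.values())
    return {t: v / total for t, v in kept.items()} if total else {}


# Hypothetical relative-abundance profiles from three tools for one sample.
tool_profiles = {
    "tool_a": {"E. coli": 0.6, "B. subtilis": 0.4},
    "tool_b": {"E. coli": 0.5, "S. aureus": 0.5},
    "tool_c": {"E. coli": 0.7, "B. subtilis": 0.3},
}
merged = integrate_profiles(tool_profiles)
```

    Here the taxon reported by only one tool is dropped, while taxa supported by two or more tools are retained in the merged profile.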

  11. Identification of the sequence motif of glycoside hydrolase 13 family members

    PubMed Central

    Kumar, Vikash

    2011-01-01

    A bioinformatics analysis of sequences of glycoside hydrolase (GH) 13 family enzymes, such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase, has been carried out using a hidden Markov model (HMM) profile in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs, with residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, it has been observed that the α-amylase sequence motifs have lower sequence conservation than the rest of the motifs of the GH13 family members. PMID:21544166

  12. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  13. Transcriptome analysis and gene expression profiling of abortive and developing ovules during fruit development in hazelnut.

    PubMed

    Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting

    2015-01-01

    A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, data related to transcriptome and gene expression profiling of developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information that was associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly results revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These results were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences in developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules, compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used in order to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide exhaustive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.

  14. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    PubMed Central

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  15. Inaugural Genomics Automation Congress and the coming deluge of sequencing data.

    PubMed

    Creighton, Chad J

    2010-10-01

    Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.

  16. Genetic and Metabolic Intraspecific Biodiversity of Ganoderma lucidum

    PubMed Central

    Pawlik, Anna; Janusz, Grzegorz; Dębska, Iwona; Siwulski, Marek; Frąc, Magdalena; Rogalski, Jerzy

    2015-01-01

    Fourteen Ganoderma lucidum strains from different geographic regions were identified using ITS region sequencing. Based on the sequences obtained, the genomic relationship between the analyzed strains was determined. All G. lucidum strains were also genetically characterized using the AFLP technique. G. lucidum strains included in the analysis displayed an AFLP profile similarity level in the range from 9.6 to 33.9%. Biolog FF MicroPlates were applied to obtain data on utilization of 95 carbon sources and mitochondrial activity. The analysis allowed comparison of functional diversity of the fungal strains. The substrate utilization profiles for the isolates tested revealed a broad variability within the analyzed G. lucidum species and proved to be a good profiling technology for studying the diversity in fungi. Significant differences have been demonstrated in substrate richness values. Interestingly, the analysis of growth and biomass production also differentiated the strains based on the growth rate on the agar and sawdust substrate. In general, the mycelial growth on the sawdust substrate was more balanced and the fastest fungal growth was observed for GRE3 and FCL192. PMID:25815332

  17. Performance of amplicon-based next generation DNA sequencing for diagnostic gene mutation profiling in oncopathology.

    PubMed

    Sie, Daoud; Snijders, Peter J F; Meijer, Gerrit A; Doeleman, Marije W; van Moorsel, Marinda I H; van Essen, Hendrik F; Eijk, Paul P; Grünberg, Katrien; van Grieken, Nicole C T; Thunnissen, Erik; Verheul, Henk M; Smit, Egbert F; Ylstra, Bauke; Heideman, Daniëlle A M

    2014-10-01

    Next generation DNA sequencing (NGS) holds promise for diagnostic applications, yet implementation in routine molecular pathology practice requires performance evaluation on DNA derived from routine formalin-fixed paraffin-embedded (FFPE) tissue specimens. The current study presents a comprehensive analysis of TruSeq Amplicon Cancer Panel-based NGS using a MiSeq Personal sequencer (TSACP-MiSeq-NGS) for somatic mutation profiling. TSACP-MiSeq-NGS (testing 212 hotspot mutation amplicons of 48 genes) and a data analysis pipeline were evaluated in a retrospective learning/test set approach (n = 58/n = 45 FFPE-tumor DNA samples) against 'gold standard' high-resolution-melting (HRM)-sequencing for the genes KRAS, EGFR, BRAF and PIK3CA. Next, the performance of the validated test algorithm was assessed in an independent, prospective cohort of FFPE-tumor DNA samples (n = 75). In the learning set, a number of minimum parameter settings was defined to decide whether a FFPE-DNA sample is qualified for TSACP-MiSeq-NGS and for calling mutations. The resulting test algorithm revealed 82% (37/45) compliance to the quality criteria and 95% (35/37) concordant assay findings for KRAS, EGFR, BRAF and PIK3CA with HRM-sequencing (kappa = 0.92; 95% CI = 0.81-1.03) in the test set. Subsequent application of the validated test algorithm to the prospective cohort yielded a success rate of 84% (63/75), and a high concordance with HRM-sequencing (95% (60/63); kappa = 0.92; 95% CI = 0.84-1.01). TSACP-MiSeq-NGS detected 77 mutations in 29 additional genes. TSACP-MiSeq-NGS is suitable for diagnostic gene mutation profiling in oncopathology.
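
    The percentages quoted in this abstract follow directly from the reported counts, and the agreement statistic it cites (kappa) can be sketched alongside them. The counts below are taken from the abstract; the 2x2 agreement table is hypothetical, since the abstract does not report the per-category breakdown, so the resulting kappa is not expected to match the published 0.92.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator


# Counts reported in the abstract.
reported = {
    "test set, met quality criteria": (37, 45),
    "test set, concordant with HRM-sequencing": (35, 37),
    "prospective cohort, success rate": (63, 75),
    "prospective cohort, concordant with HRM-sequencing": (60, 63),
}
rates = {label: percent(n, d) for label, (n, d) in reported.items()}


def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: method 1,
    columns: method 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical table: 35 of 37 samples agree (mutated vs. wild type).
hypothetical_table = [[20, 1], [1, 15]]
kappa = cohen_kappa(hypothetical_table)
```

    Kappa discounts the agreement expected by chance, which is why it is lower than the raw concordance for the same table.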

  18. Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons

    PubMed Central

    Fischer, Sandra; Kumar, Neeraj

    2017-01-01

    The importance of 16S rRNA gene amplicon profiles for understanding the influence of microbes in a variety of environments coupled with the steep reduction in sequencing costs led to a surge of microbial sequencing projects. The expanding crowd of scientists and clinicians wanting to make use of sequencing datasets can choose among a range of multipurpose software platforms, the use of which can be intimidating for non-expert users. Among available pipeline options for high-throughput 16S rRNA gene analysis, the R programming language and software environment for statistical computing stands out for its power and increased flexibility, and the possibility to adhere to most recent best practices and to adjust to individual project needs. Here we present the Rhea pipeline, a set of R scripts that encode a series of well-documented choices for the downstream analysis of Operational Taxonomic Units (OTUs) tables, including normalization steps, alpha- and beta-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations. Rhea is primarily a straightforward starting point for beginners, but can also be a framework for advanced users who can modify and expand the tool. As the community standards evolve, Rhea will adapt to always represent the current state-of-the-art in microbial profiles analysis in the clear and comprehensive way allowed by the R language. Rhea scripts and documentation are freely available at https://lagkouvardos.github.io/Rhea. PMID:28097056
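
    Two of the downstream steps named in this abstract, normalization and alpha-diversity, can be illustrated with a short sketch. Rhea itself is a set of R scripts; the Python below is not taken from Rhea and only assumes total-sum normalization and the Shannon index as representative choices. The function names and counts are hypothetical.

```python
from math import exp, log


def relative_abundance(counts):
    """Normalize raw OTU counts of one sample to proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with non-zero counts."""
    return -sum(p * log(p) for p in relative_abundance(counts) if p > 0)


def effective_richness(counts):
    """Effective number of OTUs, exp(H)."""
    return exp(shannon(counts))


# Hypothetical OTU counts for two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
```

    An evenly distributed sample reaches the maximum index for its number of OTUs, while a sample dominated by one OTU scores lower.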

  19. Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons.

    PubMed

    Lagkouvardos, Ilias; Fischer, Sandra; Kumar, Neeraj; Clavel, Thomas

    2017-01-01

    The importance of 16S rRNA gene amplicon profiles for understanding the influence of microbes in a variety of environments coupled with the steep reduction in sequencing costs led to a surge of microbial sequencing projects. The expanding crowd of scientists and clinicians wanting to make use of sequencing datasets can choose among a range of multipurpose software platforms, the use of which can be intimidating for non-expert users. Among available pipeline options for high-throughput 16S rRNA gene analysis, the R programming language and software environment for statistical computing stands out for its power and increased flexibility, and the possibility to adhere to most recent best practices and to adjust to individual project needs. Here we present the Rhea pipeline, a set of R scripts that encode a series of well-documented choices for the downstream analysis of Operational Taxonomic Units (OTUs) tables, including normalization steps, alpha- and beta-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations. Rhea is primarily a straightforward starting point for beginners, but can also be a framework for advanced users who can modify and expand the tool. As the community standards evolve, Rhea will adapt to always represent the current state-of-the-art in microbial profiles analysis in the clear and comprehensive way allowed by the R language. Rhea scripts and documentation are freely available at https://lagkouvardos.github.io/Rhea.

  20. The molecular genetic makeup of acute lymphoblastic leukemia | Office of Cancer Genomics

    Cancer.gov

    Abstract: Genomic profiling has transformed our understanding of the genetic basis of acute lymphoblastic leukemia (ALL). Recent years have seen a shift from microarray analysis and candidate gene sequencing to next-generation sequencing. Together, these approaches have shown that many ALL subtypes are characterized by constellations of structural rearrangements, submicroscopic DNA copy number alterations, and sequence mutations, several of which have clear implications for risk stratification and targeted therapeutic intervention.

  1. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

    PubMed Central

    Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

    2014-01-01

    RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209

  2. Mechanisms controlling the complete accretionary beach state sequence

    NASA Astrophysics Data System (ADS)

    Dubarbier, Benjamin; Castelle, Bruno; Ruessink, Gerben; Marieu, Vincent

    2017-06-01

    Accretionary downstate beach sequence is a key element of observed nearshore morphological variability along sandy coasts. We present and analyze the first numerical simulation of such a sequence using a process-based morphodynamic model that solves the coupling between waves, depth-integrated currents, and sediment transport. The simulation evolves from an alongshore uniform barred beach (storm profile) to an almost featureless shore-welded terrace (summer profile) through the highly alongshore variable detached crescentic bar and transverse bar/rip system states. A global analysis of the full sequence allows determining the varying contributions of the different hydro-sedimentary processes. Sediment transport driven by orbital velocity skewness is critical to the overall onshore sandbar migration, while gravitational downslope sediment transport acts as a damping term inhibiting further channel growth enforced by rip flow circulation. Accurate morphological diffusivity and inclusion of orbital velocity skewness opens new perspectives in terms of morphodynamic modeling of real beaches.

  3. Molecular Identification of Unusual Pathogenic Yeast Isolates by Large Ribosomal Subunit Gene Sequencing: 2 Years of Experience at the United Kingdom Mycology Reference Laboratory

    PubMed Central

    Linton, Christopher J.; Borman, Andrew M.; Cheung, Grace; Holmes, Ann D.; Szekely, Adrien; Palmer, Michael D.; Bridge, Paul D.; Campbell, Colin K.; Johnson, Elizabeth M.

    2007-01-01

    Rapid identification of yeast isolates from clinical samples is particularly important given their innately variable antifungal susceptibility profiles. We present here an analysis of the utility of PCR amplification and sequence analysis of the hypervariable D1/D2 region of the 26S rRNA gene for the identification of yeast species submitted to the United Kingdom Mycology Reference Laboratory over a 2-year period. A total of 3,033 clinical isolates were received from 2004 to 2006 encompassing 50 different yeast species. While more than 90% of the isolates, corresponding to the most common Candida species, could be identified by using the AUXACOLOR2 yeast identification kit, 153 isolates (5%), comprised of 47 species, could not be identified by using this system and were subjected to molecular identification via 26S rRNA gene sequencing. These isolates included some common species that exhibited atypical biochemical and phenotypic profiles and also many rarer yeast species that are infrequently encountered in the clinical setting. All 47 species requiring molecular identification were unambiguously identified on the basis of D1/D2 sequences, and the molecular identities correlated well with the observed biochemical profiles of the various organisms. Together, our data underscore the utility of molecular techniques as a reference adjunct to conventional methods of yeast identification. Further, we show that PCR amplification and sequencing of the D1/D2 region reliably identifies more than 45 species of clinically significant yeasts and can also potentially identify new pathogenic yeast species. PMID:17251397

  4. Methylation-sensitive amplified polymorphism-based genome-wide analysis of cytosine methylation profiles in Nicotiana tabacum cultivars.

    PubMed

    Jiao, J; Wu, J; Lv, Z; Sun, C; Gao, L; Yan, X; Cui, L; Tang, Z; Yan, B; Jia, Y

    2015-11-26

    This study aimed to investigate cytosine methylation profiles in different tobacco (Nicotiana tabacum) cultivars grown in China. Methylation-sensitive amplified polymorphism was used to analyze genome-wide global methylation profiles in four tobacco cultivars (Yunyan 85, NC89, K326, and Yunyan 87). Amplicons with methylated C motifs were cloned by reamplified polymerase chain reaction, sequenced, and analyzed. The results show that geographical location had a greater effect on methylation patterns in the tobacco genome than did sampling time. Analysis of the CG dinucleotide distribution in methylation-sensitive polymorphic restriction fragments suggested that a CpG dinucleotide cluster-enriched area is a possible site of cytosine methylation in the tobacco genome. The sequence alignments of the Nia1 gene (that encodes nitrate reductase) in Yunyan 87 in different regions indicate that a C-T transition might be responsible for the tobacco phenotype. T-C nucleotide replacement might also be responsible for the tobacco phenotype and may be influenced by geographical location.

  5. COACH: profile-profile alignment of protein families using hidden Markov models.

    PubMed

    Edgar, Robert C; Sjölander, Kimmen

    2004-05-22

    Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. COACH is freely available from www.drive5.com/lobster
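    To make the notion of a "profile" concrete, the sketch below builds a position-specific frequency table from a toy alignment and scores sequences against it. This is a deliberate simplification: it is not COACH and not a profile HMM (there are no insert or delete states), and the alphabet handling, pseudocount, uniform background, and example sequences are assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return, for each alignment column, a dict of residue probabilities."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        # Gap characters and unknown symbols are ignored when counting.
        counts = Counter(row[col] for row in alignment if row[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return profile

def log_odds_score(profile, sequence, background=1.0 / len(ALPHABET)):
    """Sum of log2(column probability / background) over an ungapped sequence."""
    if len(sequence) != len(profile):
        raise ValueError("sequence must match the profile length in this sketch")
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, sequence))

# Hypothetical three-sequence family alignment.
family = ["MKV", "MKI", "MRV"]
profile = build_profile(family)
print(log_odds_score(profile, "MKV"))  # family-like sequence scores above zero
print(log_odds_score(profile, "WWW"))  # unrelated sequence scores below zero
```

    A profile built from several sequences carries per-column variability that a single sequence cannot, which is the extra information the abstract credits for more sensitive homolog detection.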

  6. SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data

    DOE PAGES

    Silva, Genivaldo Gueiros Z.; Green, Kevin T.; Dutilh, Bas E.; ...

    2015-10-09

    Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. We tested SUPER-FOCUS with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools.

  7. PROSPECT improves cis-acting regulatory element prediction by integrating expression profile data with consensus pattern searches

    PubMed Central

    Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David

    2001-01-01

    Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681

  8. Primer and platform effects on 16S rRNA tag sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tremblay, Julien; Singh, Kanwar; Fern, Alison

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  9. Primer and platform effects on 16S rRNA tag sequencing

    DOE PAGES

    Tremblay, Julien; Singh, Kanwar; Fern, Alison; ...

    2015-08-04

    Sequencing of 16S rRNA gene tags is a popular method for profiling and comparing microbial communities. The protocols and methods used, however, vary considerably with regard to amplification primers, sequencing primers, sequencing technologies; as well as quality filtering and clustering. How results are affected by these choices, and whether data produced with different protocols can be meaningfully compared, is often unknown. Here we compare results obtained using three different amplification primer sets (targeting V4, V6–V8, and V7–V8) and two sequencing technologies (454 pyrosequencing and Illumina MiSeq) using DNA from a mock community containing a known number of species as well as complex environmental samples whose PCR-independent profiles were estimated using shotgun sequencing. We find that paired-end MiSeq reads produce higher quality data and enabled the use of more aggressive quality control parameters over 454, resulting in a higher retention rate of high quality reads for downstream data analysis. While primer choice considerably influences quantitative abundance estimations, sequencing platform has relatively minor effects when matched primers are used. In conclusion, beta diversity metrics are surprisingly robust to both primer and sequencing platform biases.

  10. On the Regularities of the Polar Profiles of Proteins Related to Ebola Virus Infection and their Functional Domains.

    PubMed

    Polanco, Carlos; Samaniego Mendoza, José Lino; Buhse, Thomas; Uversky, Vladimir N; Bañuelos Chao, Ingrid Paola; Bañuelos Cedano, Marcela Angola; Tavera, Fernando Michel; Tavera, Daniel Michel; Falconi, Manuel; Ponce de León, Abelardo Vela

    2018-03-06

    The number of fatalities and economic losses caused by the Ebola virus infection across the planet culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work represents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins and their functional domains associated with Ebola virus disease: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that each of these proteins contains a fragment, here named the "functional domain", whose polar profile is similar to that of the protein that contains it. Each protein is formed by a group of short sub-sequences, where each fragment has a different and distinctive polar profile and where the polar profile between adjacent short sub-sequences changes orderly and gradually to coincide with the polar profile of the whole protein. When using the charge distribution as a metric, it was observed that it effectively discriminates the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for the Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied and its corresponding functional domain are similar. Thus, when each protein was built from its functional domain (adding one amino acid at a time and plotting its polar profile each time), it was observed that the resulting graphs can be divided into groups with similar polar profiles.

  11. Bi-PROF

    PubMed Central

    Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

    2013-01-01

    The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and allows up to one million sequence stretches to be obtained at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10^2 to 10^4 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588

  12. VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules.

    PubMed

    Wala, Jeremiah; Zhang, Cheng-Zhong; Meyerson, Matthew; Beroukhim, Rameen

    2016-07-01

    We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. VariantBam and full documentation are available at github.com/jwalabroad/VariantBam. Contact: rameen@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
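    The idea of region-specific rules can be sketched as follows. This is not VariantBam's rule syntax or its C++ implementation; the `Read` and `Rule` classes, their fields, and the example coordinates are hypothetical stand-ins that only show how interval, mapping-quality and motif conditions might be combined per region.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    mapq: int
    sequence: str

@dataclass(frozen=True)
class Rule:
    chrom: str
    start: int
    end: int
    min_mapq: int = 0
    motif: str = ""  # empty string means "no motif requirement"

    def matches(self, read):
        """True if the read overlaps the interval and passes all conditions."""
        overlaps = (read.chrom == self.chrom
                    and read.start < self.end
                    and read.end > self.start)
        return overlaps and read.mapq >= self.min_mapq and self.motif in read.sequence

def filter_reads(reads, rules):
    """Keep a read if it satisfies at least one region-specific rule."""
    return [r for r in reads if any(rule.matches(r) for rule in rules)]

# Hypothetical reads and rules.
reads = [
    Read("r1", "chr1", 100, 150, 60, "ACGTTAGGGTTAGGG"),
    Read("r2", "chr1", 120, 170, 5, "ACGTACGTACGTACG"),
    Read("r3", "chr2", 500, 550, 60, "TTAGGGTTAGGGTTA"),
    Read("r4", "chr2", 900, 950, 60, "ACGTACGTACGTACG"),
]
rules = [
    Rule("chr1", 90, 200, min_mapq=30),        # strict quality near a variant site
    Rule("chr2", 0, 1000, motif="TTAGGG"),     # motif-bearing reads anywhere in range
]
kept = filter_reads(reads, rules)
print([r.name for r in kept])
```

    Reads that fail every rule (low mapping quality in the first region, no motif in the second) are dropped, which is how a filter of this kind reduces file size while retaining the reads of interest.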

  13. Active bacterial community structure along vertical redox gradients in Baltic Sea sediment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jansson, Janet; Edlund, Anna; Hardeman, Fredrik

    Community structures of active bacterial populations were investigated along a vertical redox profile in coastal Baltic Sea sediments by terminal-restriction fragment length polymorphism (T-RFLP) and clone library analysis. According to correspondence analysis of T-RFLP results and sequencing of cloned 16S rRNA genes, the microbial community structures at three redox depths (179 mV, -64 mV and -337 mV) differed significantly. The bacterial communities in the community DNA differed from those in bromodeoxyuridine (BrdU)-labeled DNA, indicating that the growing members of the community that incorporated BrdU were not necessarily the most dominant members. The structures of the actively growing bacterial communities were most strongly correlated to organic carbon followed by total nitrogen and redox potentials. Bacterial identification by sequencing of 16S rRNA genes from clones of BrdU-labeled DNA and DNA from reverse transcription PCR (rt-PCR) showed that bacterial taxa involved in nitrogen and sulfur cycling were metabolically active along the redox profiles. Several sequences had low similarities to previously detected sequences indicating that novel lineages of bacteria are present in Baltic Sea sediments. Also, a high number of different 16S rRNA gene sequences representing different phyla were detected at all sampling depths.

  14. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    PubMed Central

    Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

    2017-01-01

    Kivircik sheep is an important local Turkish breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries were constructed from mammary gland tissue of the same Kivircik sheep at prenatal and postnatal stages. A total of 3072 colonies, randomly selected from the two libraries, were sequenced to develop a sheep EST collection. We used the Phred/Phrap computer programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined with Geneious software. A total of 422 ESTs had over 80% similarity to known sequences of other organisms in NCBI and were classified by the Panther database for the Gene Ontology (GO) category. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). The EST data in this study provide a new source of information for functional genome studies of sheep. PMID:28239610

  15. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background: Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results: RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion: This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  16. Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

    PubMed

    Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

    2014-07-01

    Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions (the population frequency of individual clones, their genetic composition, and their evolutionary relationships), which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.

  17. A comparative molecular analysis of water-filled limestone sinkholes in north-eastern Mexico.

    PubMed

    Sahl, Jason W; Gary, Marcus O; Harris, J Kirk; Spear, John R

    2011-01-01

    Sistema Zacatón in north-eastern Mexico is host to several deep, water-filled, anoxic, karstic sinkholes (cenotes). These cenotes were explored, mapped, and geochemically and microbiologically sampled by the autonomous underwater vehicle deep phreatic thermal explorer (DEPTHX). The community structure of the filterable fraction of the water column and extensive microbial mats that coat the cenote walls was investigated by comparative analysis of small-subunit (SSU) 16S rRNA gene sequences. Full-length Sanger gene sequence analysis revealed novel microbial diversity that included three putative bacterial candidate phyla and three additional groups that showed high intra-clade distance with poorly characterized bacterial candidate phyla. Limited functional gene sequence analysis in these anoxic environments identified genes associated with methanogenesis, sulfate reduction and anaerobic ammonium oxidation. A directed, barcoded amplicon, multiplex pyrosequencing approach was employed to compare ∼100,000 bacterial SSU gene sequences from water column and wall microbial mat samples from five cenotes in Sistema Zacatón. A new, high-resolution sequence distribution profile (SDP) method identified changes in specific phylogenetic types (phylotypes) in microbial mats at varied depths; Mantel tests showed a correlation of the genetic distances between mat communities in two cenotes and the geographic location of each cenote. Community structure profiles from the water column of three neighbouring cenotes showed distinct variation; statistically significant differences in the concentration of geochemical constituents suggest that the variation observed in microbial communities between neighbouring cenotes are due to geochemical variation. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.
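    The Mantel test mentioned in this abstract asks whether two distance matrices over the same set of sites are correlated, judging significance by permuting site labels. The sketch below is a generic one-sided permutation version using only the standard library; the site positions, the distance values, the permutation count and the seed are invented for illustration and have no connection to the study's data.

```python
import random

def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling sites by `order`."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = pearson(upper_triangle(d1), upper_triangle(d2))
    n = len(d1)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel the sites of the first matrix only
        permuted = pearson(upper_triangle(d1, order), upper_triangle(d2))
        if permuted >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical sites on a line; genetic distance grows with geographic distance.
positions = [0.0, 1.0, 2.0, 5.0, 9.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2.0 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(geo, gen)
print(round(r, 3), p)
```

    Permuting whole site labels, rather than individual matrix cells, keeps the dependence structure of a distance matrix intact, which is why the test is preferred over an ordinary correlation test on the flattened values.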

  18. Correlation of offshore seismic profiles with onshore New Jersey Miocene sediments

    USGS Publications Warehouse

    Monteverde, D.H.; Miller, K.G.; Mountain, Gregory S.

    2000-01-01

    The New Jersey passive continental margin records the interaction of sequences and sea-level, although previous studies linking seismically defined sequences, borehole control, and global δ18O records were hindered by a seismic data gap on the inner shelf. We describe new seismic data from the innermost New Jersey shelf that tie offshore seismic stratigraphy directly to onshore boreholes. These data link the onshore boreholes to existing seismic grids across the outer margin and to boreholes on the continental slope. Surfaces defined by age, facies, and log signature in the onshore boreholes at the base of sequences Kw2b, Kw2a, Kw1c, and Kw0 are now tied to seismic sequence boundaries m5s, m5.2s, m5.4s, and m6s, respectively, defined beneath the inner shelf. Sequence boundaries recognized in onshore boreholes and inner shelf seismic profiles apparently correlate with reflections m5, m5.2, m5.4, and m6, respectively, that were dated at slope boreholes during ODP Leg 150. We now recognize an additional sequence boundary beneath the shelf that we name m5.5s and correlate to the base of the onshore sequence Kw1b. The new seismic data image prograding Oligocene clinoforms beneath the inner shelf, consistent with the results from onshore boreholes. A land-based seismic profile crossing the Island Beach borehole reveals reflector geometries that we tie to Lower Miocene litho- and bio-facies in this borehole. These land-based seismic profiles image well-defined sequence boundaries, onlap and downlap truncations that correlate to Transgressive Systems Tracts (TST) and Highstand Systems Tracts (HST) identified in boreholes. Preliminary analysis of CH0698 data continues these system tract delineations across the inner shelf. The CH0698 seismic profiles tie seismically defined sequence boundaries with sequences identified by lithologic and paleontologic criteria. Both can now be related to global δ18O increases and attendant glacioeustatic lowerings. This integration of core, log, and seismic character of mid-Tertiary sediments across the width of the New Jersey margin is a major step in the long-standing effort to evaluate the impact of glacioeustasy on siliciclastic sediments of a passive continental margin. (C) 2000 Elsevier Science B.V. All rights reserved.

  19. Identification of species by multiplex analysis of variable-length sequences

    PubMed Central

    Pereira, Filipe; Carneiro, João; Matthiesen, Rune; van Asch, Barbara; Pinto, Nádia; Gusmão, Leonor; Amorim, António

    2010-01-01

    The quest for a universal and efficient method of identifying species has been a longstanding challenge in biology. Here, we show that accurate identification of species in all domains of life can be accomplished by multiplex analysis of variable-length sequences containing multiple insertion/deletion variants. The new method, called SPInDel, is able to discriminate 93.3% of eukaryotic species from 18 taxonomic groups. We also demonstrate that the identification of prokaryotic and viral species with numeric profiles of fragment lengths is generally straightforward. A computational platform is presented to facilitate the planning of projects and includes a large data set with nearly 1800 numeric profiles for species in all domains of life (1556 for eukaryotes, 105 for prokaryotes and 130 for viruses). Finally, a SPInDel profiling kit for discrimination of 10 mammalian species was successfully validated on highly processed food products with species mixtures and proved to be easily adaptable to multiple screening procedures routinely used in molecular biology laboratories. These results suggest that SPInDel is a reliable and cost-effective method for broad-spectrum species identification that is appropriate for use in suboptimal samples and is amenable to different high-throughput genotyping platforms without the need for DNA sequencing. PMID:20923781
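    Identification by numeric profile amounts to comparing a tuple of observed fragment lengths against a reference table. The sketch below assumes invented species names, invented fragment lengths and a simple per-locus tolerance; it does not use the SPInDel platform or its data set.

```python
from collections import Counter

# Hypothetical reference profiles: fragment lengths (bp) at four loci.
REFERENCE_PROFILES = {
    "species_A": (112, 87, 140, 95),
    "species_B": (112, 90, 138, 95),
    "species_C": (108, 87, 140, 101),
}

def identify(observed, references, tolerance=0):
    """Return species whose profile matches `observed` within `tolerance` bp at every locus."""
    matches = []
    for name, profile in references.items():
        same_length = len(profile) == len(observed)
        if same_length and all(abs(o - p) <= tolerance for o, p in zip(observed, profile)):
            matches.append(name)
    return matches

def discriminated_fraction(references):
    """Fraction of species whose numeric profile is unique in the reference set."""
    counts = Counter(references.values())
    return sum(1 for p in references.values() if counts[p] == 1) / len(references)

print(identify((112, 90, 138, 95), REFERENCE_PROFILES))
print(identify((112, 90, 138, 95), REFERENCE_PROFILES, tolerance=3))
print(discriminated_fraction(REFERENCE_PROFILES))
```

    Widening the tolerance to absorb sizing error makes closely related profiles indistinguishable, so the second call returns more than one candidate; the fraction of unique profiles is the kind of figure the abstract reports as discriminating power.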

  20. Bayesian mixture analysis for metagenomic community profiling.

    PubMed

    Morfopoulou, Sofia; Plagnol, Vincent

    2015-09-15

    Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. metaMix is implemented as a user friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix. Contact: sofia.morfopoulou.10@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
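    The problem of reads that match several organisms can be illustrated with a much simpler technique than the one in the abstract. The sketch below uses plain expectation-maximization to estimate mixture weights; it is not metaMix, which is a Bayesian framework explored with parallel Markov chains, and the likelihood matrix, species count and iteration count are hypothetical.

```python
def em_abundances(read_likelihoods, n_species, iterations=200):
    """Estimate species mixture weights by expectation-maximization.

    read_likelihoods[i][k] is the (hypothetical) likelihood of read i given
    species k, and 0.0 where the read does not match that species.
    """
    weights = [1.0 / n_species] * n_species
    for _ in range(iterations):
        totals = [0.0] * n_species
        for lik in read_likelihoods:
            joint = [w * l for w, l in zip(weights, lik)]
            norm = sum(joint)
            if norm == 0:
                continue  # read matches nothing in the reference set
            # E-step: share the read among species in proportion to the joint.
            for k in range(n_species):
                totals[k] += joint[k] / norm
        assigned = sum(totals)
        if assigned == 0:
            break
        # M-step: new weights are the normalized expected read counts.
        weights = [t / assigned for t in totals]
    return weights

# Hypothetical data: 6 reads unique to species 0, 2 unique to species 1,
# and 4 reads that match species 0 and species 2 equally well.
reads = [[1.0, 0.0, 0.0]] * 6 + [[0.0, 1.0, 0.0]] * 2 + [[1.0, 0.0, 1.0]] * 4
weights = em_abundances(reads, n_species=3)
print([round(w, 4) for w in weights])
```

    In this toy case species 2 is supported only by ambiguous reads, so its weight decays towards zero and those reads are attributed to species 0, which also has unique support. A point estimate of this kind gives no posterior uncertainty over which species are present, which is one motivation for a fully Bayesian treatment.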

  1. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    PubMed

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  2. Selective ribosome profiling as a tool to study the interaction of chaperones and targeting factors with nascent polypeptide chains and ribosomes

    PubMed Central

    Becker, Annemarie H.; Oh, Eugene; Weissman, Jonathan S.; Kramer, Günter; Bukau, Bernd

    2014-01-01

    A plethora of factors is involved in the maturation of newly synthesized proteins, including chaperones, membrane targeting factors, and enzymes. Many factors act cotranslationally through association with ribosome-nascent chain complexes (RNCs), but their target specificities and modes of action remain poorly understood. We developed selective ribosome profiling (SeRP) to identify substrate pools and points of RNC engagement of these factors. SeRP is based on sequencing mRNA fragments covered by translating ribosomes (general ribosome profiling, RP), combined with a procedure to selectively isolate RNCs whose nascent polypeptides are associated with the factor of interest. Factor–RNC interactions are stabilized by crosslinking, the resulting factor–RNC adducts are then nuclease-treated to generate monosomes, and affinity-purified. The ribosome-extracted mRNA footprints are converted to DNA libraries for deep sequencing. The protocol is specified for general RP and SeRP in bacteria. It was first applied to the chaperone trigger factor and is readily adaptable to other cotranslationally acting factors, including eukaryotic factors. Factor–RNC purification and sequencing library preparation takes 7–8 days, sequencing and data analysis can be completed in 5–6 days. PMID:24136347

  3. Comparative genome analysis and characterization of the Salmonella Typhimurium strain CCRJ_26 isolated from swine carcasses using whole-genome sequencing approach.

    PubMed

    Panzenhagen, P H N; Cabral, C C; Suffys, P N; Franco, R M; Rodrigues, D P; Conte-Junior, C A

    2018-04-01

    Salmonella pathogenicity relies on virulence factors, many of which are clustered within the Salmonella pathogenicity islands. Salmonella also harbours mobile genetic elements such as virulence plasmids, prophage-like elements and antimicrobial resistance genes, which can contribute to increasing its pathogenicity. Here, we have genetically characterized a selected S. Typhimurium strain (CCRJ_26) from our previous study with a multidrug-resistant profile and a high-frequency PFGE clonal profile, which apparently persists in the pork production centre of Rio de Janeiro State, Brazil. By whole-genome sequencing, we described the virulence content of the strain's genome and characterized the repertoire of bacterial plasmids, antibiotic resistance genes and prophage-like elements. We show evidence that the genome of strain CCRJ_26 possibly represents a virulence-associated phenotype that may be virulent in human infection. Whole-genome sequencing technologies are still costly and remain underexplored for applied microbiology in Brazil. Hence, this genomic description of S. Typhimurium strain CCRJ_26 will help in future molecular epidemiological studies. The analysis described here reveals a quick and useful pipeline for bacterial virulence characterization using a whole-genome sequencing approach. © 2018 The Society for Applied Microbiology.

  4. Transcript Profiling of Common Bean (Phaseolus vulgaris) Using the GeneChip Soybean Genome Array: Optimizing Analysis by Masking Biased Probes

    USDA-ARS?s Scientific Manuscript database

    Common bean (Phaseolus vulgaris) and soybean (Glycine max) both belong to the Phaseoleae tribe and share significant coding sequence homology. To evaluate the utility of the soybean GeneChip for transcript profiling of common bean, we hybridized cRNAs purified from nodule, leaf, and root of common b...

  5. Distribution of Bartonella henselae Variants in Patients, Reservoir Hosts and Vectors in Spain

    PubMed Central

    Gil, Horacio; Escudero, Raquel; Pons, Inmaculada; Rodríguez-Vargas, Manuela; García-Esteban, Coral; Rodríguez-Moreno, Isabel; García-Amil, Cristina; Lobo, Bruno; Valcárcel, Félix; Pérez, Azucena; Jiménez, Santos; Jado, Isabel; Juste, Ramón; Segura, Ferrán; Anda, Pedro

    2013-01-01

    We have studied the diversity of B. henselae circulating in patients, reservoir hosts and vectors in Spain. In total, we have fully characterized 53 clinical samples from 46 patients, as well as 78 B. henselae isolates obtained from 35 cats from La Rioja and Catalonia (northeastern Spain), four positive cat blood samples from which no isolates were obtained, and three positive fleas by Multiple Locus Sequence Typing and Multiple Locus Variable Number Tandem Repeats Analysis. This study represents the largest series of human cases characterized with these methods, with 10 different sequence types and 41 MLVA profiles. Two of the sequence types and 35 of the profiles were not described previously. Most of the B. henselae variants belonged to ST5. Also, we have identified a common profile (72) which is well distributed in Spain and was found to persist over time. Indeed, this profile seems to be the origin from which most of the variants identified in this study have been generated. In addition, ST5, ST6 and ST9 were found associated with felines, whereas ST1, ST5 and ST8 were the most frequent sequence types found infecting humans. Interestingly, some of the feline associated variants never found on patients were located in a separate clade, which could represent a group of strains less pathogenic for humans. PMID:23874563

  6. Effects of age, sex, and genotype on high-sensitivity metabolomic profiles in the fruit fly, Drosophila melanogaster

    PubMed Central

    Hoffman, Jessica M; Soltow, Quinlyn A; Li, Shuzhao; Sidik, Alfire; Jones, Dean P; Promislow, Daniel E L

    2014-01-01

    Researchers have used whole-genome sequencing and gene expression profiling to identify genes associated with age, in the hope of understanding the underlying mechanisms of senescence. But there is a substantial gap from variation in gene sequences and expression levels to variation in age or life expectancy. In an attempt to bridge this gap, here we describe the effects of age, sex, genotype, and their interactions on high-sensitivity metabolomic profiles in the fruit fly, Drosophila melanogaster. Among the 6800 features analyzed, we found that over one-quarter of all metabolites were significantly associated with age, sex, genotype, or their interactions, and multivariate analysis shows that individual metabolomic profiles are highly predictive of these traits. Using a metabolomic equivalent of gene set enrichment analysis, we identified numerous metabolic pathways that were enriched among metabolites associated with age, sex, and genotype, including pathways involving sugar and glycerophospholipid metabolism, neurotransmitters, amino acids, and the carnitine shuttle. Our results suggest that high-sensitivity metabolomic studies have excellent potential not only to reveal mechanisms that lead to senescence, but also to help us understand differences in patterns of aging among genotypes and between males and females. PMID:24636523

  7. Massively parallel sequencing of 17 commonly used forensic autosomal STRs and amelogenin with small amplicons.

    PubMed

    Kim, Eun Hye; Lee, Hwan Young; Yang, In Seok; Jung, Sang-Eun; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The next-generation sequencing (NGS) method has been utilized to analyze short tandem repeat (STR) markers, which are routinely used for human identification purposes in the forensic field. Some researchers have demonstrated the successful application of the NGS system to STR typing, suggesting that NGS technology may be an alternative or additional method to overcome limitations of capillary electrophoresis (CE)-based STR profiling. However, there has been no available multiplex PCR system that is optimized for NGS analysis of forensic STR markers. Thus, we constructed a multiplex PCR system for the NGS analysis of 18 markers (13 CODIS STRs, D2S1338, D19S433, Penta D, Penta E and amelogenin) by designing amplicons in the size range of 77-210 base pairs. Then, PCR products were generated from two single-source samples, mixed samples and artificially degraded DNA samples using a multiplex PCR system, and were prepared for sequencing on the MiSeq system through construction of a subsequent barcoded library. By performing NGS and analyzing the data, we confirmed that the resultant STR genotypes were consistent with those of CE-based typing. Moreover, sequence variations were detected in targeted STR regions. Through the use of small-sized amplicons, the developed multiplex PCR system enables researchers to obtain successful STR profiles even from artificially degraded DNA as well as STR loci which are analyzed with large-sized amplicons in the CE-based commercial kits. In addition, successful profiles can be obtained from mixtures up to a 1:19 ratio. Consequently, the developed multiplex PCR system, which produces small size amplicons, can be successfully applied to STR NGS analysis of forensic casework samples such as mixtures and degraded DNA samples. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
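
    The idea of reading an STR allele directly from a sequencing read can be sketched as below. This is a simplified illustration, not the authors' pipeline: the function names and the example motif are hypothetical, and real allele calling also has to handle flanking sequence, stutter and sequencing errors. The second helper only restates the 1:19 mixture ratio as a minor-contributor fraction.

```python
import re


def longest_tandem_run(read: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` found in `read`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(read)]
    return max(run_lengths, default=0) // len(motif)


def minor_fraction(minor_parts: int, major_parts: int) -> float:
    """Fraction of a two-person mixture contributed by the minor donor."""
    return minor_parts / (minor_parts + major_parts)


if __name__ == "__main__":
    # Hypothetical read carrying three copies of a GATA repeat.
    print(longest_tandem_run("CCGATAGATAGATATT", "GATA"))
    # A 1:19 mixture means the minor donor supplies 1 part in 20.
    print(minor_fraction(1, 19))
```

    At a 1:19 ratio the minor contributor accounts for 5% of the template, which is why read depth matters when profiling mixtures.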

  8. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

    PubMed

    Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2017-04-01

    Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
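
    The rates quoted above follow from a confusion matrix, and the sketch below shows the standard definitions. The counts used are hypothetical values chosen only to reproduce 96% / 4% / 4%; the abstract does not report the underlying counts, and the balanced F1 form of the F-measure is an assumption.

```python
def search_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard retrieval metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # true positive rate
    fnr = fn / (tp + fn)           # false negative rate = 1 - recall
    fpr = fp / (fp + tn)           # false positive rate
    precision = tp / (tp + fp)
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "true_positive_rate": recall,
        "false_negative_rate": fnr,
        "false_positive_rate": fpr,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in search_metrics(tp=96, fp=4, tn=96, fn=4).items():
        print(f"{name}: {value:.3f}")
```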

  9. Identification of Variable-Number Tandem-Repeat (VNTR) Sequences in Acinetobacter baumannii and Interlaboratory Validation of an Optimized Multiple-Locus VNTR Analysis Typing Scheme

    PubMed Central

    Pourcel, Christine; Minandri, Fabrizia; Hauck, Yolande; D'Arezzo, Silvia; Imperi, Francesco; Vergnaud, Gilles; Visca, Paolo

    2011-01-01

    Acinetobacter baumannii is an important opportunistic pathogen responsible for nosocomial outbreaks, mostly occurring in intensive care units. Due to the multiplicity of infection sources, reliable molecular fingerprinting techniques are needed to establish epidemiological correlations among A. baumannii isolates. Multiple-locus variable-number tandem-repeat analysis (MLVA) has proven to be a fast, reliable, and cost-effective typing method for several bacterial species. In this study, an MLVA assay compatible with simple PCR- and agarose gel-based electrophoresis steps as well as with high-throughput automated methods was developed for A. baumannii typing. Preliminarily, 10 potential polymorphic variable-number tandem repeats (VNTRs) were identified upon bioinformatic screening of six annotated genome sequences of A. baumannii. A collection of 7 reference strains plus 18 well-characterized isolates, including unique types and representatives of the three international A. baumannii lineages, was then evaluated in a two-center study aimed at validating the MLVA assay and comparing it with other genotyping assays, namely, macrorestriction analysis with pulsed-field gel electrophoresis (PFGE) and PCR-based sequence group (SG) profiling. The results showed that MLVA can discriminate between isolates with identical PFGE types and SG profiles. A panel of eight VNTR markers was selected, all showing the ability to be amplified and good amounts of polymorphism in the majority of strains. Independently generated MLVA profiles, composed of an ordered string of allele numbers corresponding to the number of repeats at each VNTR locus, were concordant between centers. Typeability, reproducibility, stability, discriminatory power, and epidemiological concordance were excellent. A database containing information and MLVA profiles for several A. baumannii strains is available from http://mlva.u-psud.fr/. PMID:21147956
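
    An MLVA profile as described above (an ordered string of allele numbers, one per VNTR locus) maps naturally onto a tuple. The sketch below is a minimal illustration under stated assumptions: the locus names and repeat counts are hypothetical, since the abstract does not list the eight selected markers, and counting differing loci is just one simple way to compare two profiles.

```python
from typing import Dict, Tuple

# Hypothetical locus names; the fixed order is what makes profiles comparable.
LOCI = ("VNTR_A", "VNTR_B", "VNTR_C", "VNTR_D")


def make_profile(repeats: Dict[str, int]) -> Tuple[int, ...]:
    """Return the repeat counts ordered by LOCI, regardless of input order."""
    return tuple(repeats[locus] for locus in LOCI)


def profile_string(profile: Tuple[int, ...]) -> str:
    """Render a profile as a dash-separated string of allele numbers."""
    return "-".join(str(n) for n in profile)


def loci_differing(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Count the loci at which two profiles carry different allele numbers."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    isolate_1 = make_profile({"VNTR_B": 7, "VNTR_A": 10, "VNTR_C": 3, "VNTR_D": 12})
    isolate_2 = make_profile({"VNTR_A": 10, "VNTR_B": 8, "VNTR_C": 3, "VNTR_D": 12})
    print(profile_string(isolate_1), profile_string(isolate_2))
    print(loci_differing(isolate_1, isolate_2))
```

    Because each profile is a plain ordered tuple, two centers can check concordance with a simple equality test.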

  10. Classification of Fowl Adenovirus Serotypes by Use of High-Resolution Melting-Curve Analysis of the Hexon Gene Region

    PubMed Central

    Steer, Penelope A.; Kirkpatrick, Naomi C.; O'Rourke, Denise; Noormohammadi, Amir H.

    2009-01-01

    Identification of fowl adenovirus (FAdV) serotypes is of importance in epidemiological studies of disease outbreaks and the adoption of vaccination strategies. In this study, real-time PCR and subsequent high-resolution melting (HRM)-curve analysis of three regions of the hexon gene were developed and assessed for their potential in differentiating 12 FAdV reference serotypes. The results were compared to previously described PCR and restriction enzyme analyses of the hexon gene. Both HRM-curve analysis of a 191-bp region of the hexon gene and restriction enzyme analysis failed to distinguish a number of serotypes used in this study. In addition, PCR of the region spanning nucleotides (nt) 144 to 1040 failed to amplify FAdV-5 in sufficient quantities for further analysis. However, HRM-curve analysis of the region spanning nt 301 to 890 proved a sensitive and specific method of differentiating all 12 serotypes. All melt curves were highly reproducible, and replicates of each serotype were correctly genotyped with a mean confidence value of more than 99% using normalized HRM curves. Sequencing analysis revealed that each profile was related to a unique sequence, with some sequences sharing greater than 94% identity. Melting-curve profiles were found to be related mainly to GC composition and distribution throughout the amplicons, regardless of sequence identity. The results presented in this study show that the closed-tube method of PCR and HRM-curve analysis provides an accurate, rapid, and robust genotyping technique for the identification of FAdV serotypes and can be used as a model for developing genotyping techniques for other pathogens. PMID:19036935
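
    Since melting-curve profiles were related mainly to GC composition and its distribution along the amplicon, a short sketch of both quantities may help. The function names, window size and example sequences are hypothetical; this computes sequence statistics only and does not model melting behaviour.

```python
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int, step: int = 1) -> List[float]:
    """GC fraction in successive windows, showing how GC is distributed."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Two toy sequences with the same overall GC content but different layout.
    clustered, interleaved = "GGCCAATT", "GACTGACT"
    print(gc_fraction(clustered), gc_windows(clustered, 4, 4))
    print(gc_fraction(interleaved), gc_windows(interleaved, 4, 4))
```

    The two toy sequences share an overall GC fraction of 0.5 yet differ window by window, which is the kind of difference in distribution the abstract refers to.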

  11. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles

    PubMed Central

    2011-01-01

    Background: Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. Results: We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidating molecular mechanisms governing these important quality-related traits during watermelon fruit development. Conclusion: We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology. PMID:21936920

  12. Characterization of transcriptome dynamics during watermelon fruit development: sequencing, assembly, annotation and gene expression profiles.

    PubMed

    Guo, Shaogui; Liu, Jingan; Zheng, Yi; Huang, Mingyun; Zhang, Haiying; Gong, Guoyi; He, Hongju; Ren, Yi; Zhong, Silin; Fei, Zhangjun; Xu, Yong

    2011-09-21

    Cultivated watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai var. lanatus] is an important agriculture crop world-wide. The fruit of watermelon undergoes distinct stages of development with dramatic changes in its size, color, sweetness, texture and aroma. In order to better understand the genetic and molecular basis of these changes and significantly expand the watermelon transcript catalog, we have selected four critical stages of watermelon fruit development and used Roche/454 next-generation sequencing technology to generate a large expressed sequence tag (EST) dataset and a comprehensive transcriptome profile for watermelon fruit flesh tissues. We performed half Roche/454 GS-FLX run for each of the four watermelon fruit developmental stages (immature white, white-pink flesh, red flesh and over-ripe) and obtained 577,023 high quality ESTs with an average length of 302.8 bp. De novo assembly of these ESTs together with 11,786 watermelon ESTs collected from GenBank produced 75,068 unigenes with a total length of approximately 31.8 Mb. Overall 54.9% of the unigenes showed significant similarities to known sequences in GenBank non-redundant (nr) protein database and around two-thirds of them matched proteins of cucumber, the most closely-related species with a sequenced genome. The unigenes were further assigned with gene ontology (GO) terms and mapped to biochemical pathways. More than 5,000 SSRs were identified from the EST collection. Furthermore we carried out digital gene expression analysis of these ESTs and identified 3,023 genes that were differentially expressed during watermelon fruit development and ripening, which provided novel insights into watermelon fruit biology and a comprehensive resource of candidate genes for future functional analysis. We then generated profiles of several interesting metabolites that are important to fruit quality including pigmentation and sweetness. Integrative analysis of metabolite and digital gene expression profiles helped elucidating molecular mechanisms governing these important quality-related traits during watermelon fruit development. We have generated a large collection of watermelon ESTs, which represents a significant expansion of the current transcript catalog of watermelon and a valuable resource for future studies on the genomics of watermelon and other closely-related species. Digital expression analysis of this EST collection allowed us to identify a large set of genes that were differentially expressed during watermelon fruit development and ripening, which provide a rich source of candidates for future functional analysis and represent a valuable increase in our knowledge base of watermelon fruit biology.

  13. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  14. Direct profiling of environmental microbial populations by thermal dissociation analysis of native rRNAs hybridized to oligonucleotide microarrays

    NASA Technical Reports Server (NTRS)

    El Fantroussi, Said; Urakawa, Hidetoshi; Bernhard, Anne E.; Kelly, John J.; Noble, Peter A.; Smidt, H.; Yershov, G. M.; Stahl, David A.

    2003-01-01

    Oligonucleotide microarrays were used to profile directly extracted rRNA from environmental microbial populations without PCR amplification. In our initial inspection of two distinct estuarine study sites, the hybridization patterns were reproducible and varied between estuarine sediments of differing salinities. The determination of a thermal dissociation curve (i.e., melting profile) for each probe-target duplex provided information on hybridization specificity, which is essential for confirming adequate discrimination between target and nontarget sequences.

  15. Differential gene expression in dentate granule cells in mesial temporal lobe epilepsy with and without hippocampal sclerosis.

    PubMed

    Griffin, Nicole G; Wang, Yu; Hulette, Christine M; Halvorsen, Matt; Cronin, Kenneth D; Walley, Nicole M; Haglund, Michael M; Radtke, Rodney A; Skene, J H Pate; Sinha, Saurabh R; Heinzen, Erin L

    2016-03-01

    Hippocampal sclerosis is the most common neuropathologic finding in cases of medically intractable mesial temporal lobe epilepsy. In this study, we analyzed the gene expression profiles of dentate granule cells of patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis to show that next-generation sequencing methods can produce interpretable genomic data from RNA collected from small homogenous cell populations, and to shed light on the transcriptional changes associated with hippocampal sclerosis. RNA was extracted, and complementary DNA (cDNA) was prepared and amplified from dentate granule cells that had been harvested by laser capture microdissection from surgically resected hippocampi from patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis. Sequencing libraries were sequenced, and the resulting sequencing reads were aligned to the reference genome. Differential expression analysis was used to ascertain expression differences between patients with and without hippocampal sclerosis. Greater than 90% of the RNA-Seq reads aligned to the reference. There was high concordance between transcriptional profiles obtained for duplicate samples. Principal component analysis revealed that the presence or absence of hippocampal sclerosis was the main determinant of the variance within the data. Among the genes up-regulated in the hippocampal sclerosis samples, there was significant enrichment for genes involved in oxidative phosphorylation. By analyzing the gene expression profiles of dentate granule cells from surgically resected hippocampal specimens from patients with mesial temporal lobe epilepsy with and without hippocampal sclerosis, we have demonstrated the utility of next-generation sequencing methods for producing biologically relevant results from small populations of homogeneous cells, and have provided insight on the transcriptional changes associated with this pathology. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.

  16. Prolonged and mixed non-O157 Escherichia coli infection in an Australian household.

    PubMed

    Staples, M; Graham, R M A; Doyle, C J; Smith, H V; Jennison, A V

    2012-05-01

    An Australian family was identified through a Public Health follow up on a Shiga-toxigenic Escherichia coli (STEC) positive bloody diarrhoea case, with three of the four family members experiencing either symptomatic or asymptomatic STEC shedding. Bacterial isolates were submitted to stx sequence sub-typing, multi-locus variable number tandem repeat analysis (MLVA), multi-locus sequence typing (MLST) and binary typing. The analysis revealed that there were multiple strains of STEC being shed by the family members, with similar virulence gene profiles and the same serogroup but differing in their MLVA and MLST profiles. This study illustrates the potentially complicated nature of non-O157 STEC infections and the importance of molecular epidemiology in understanding disease clusters. © 2012 QUEENSLAND HEALTH. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.

  17. OP17 MICRORNA PROFILING USING SMALL RNA-SEQ IN PAEDIATRIC LOW GRADE GLIOMAS

    PubMed Central

    Jeyapalan, Jennie N.; Jones, Tania A.; Tatevossian, Ruth G.; Qaddoumi, Ibrahim; Ellison, David W.; Sheer, Denise

    2014-01-01

    INTRODUCTION: MicroRNAs regulate gene expression by targeting mRNAs for translational repression or degradation at the post-transcriptional level. In paediatric low-grade gliomas a few key genetic mutations have been identified, including BRAF fusions, FGFR1 duplications and MYB rearrangements. Our aim in the current study is to profile aberrant microRNA expression in paediatric low-grade gliomas and determine the role of epigenetic changes in the aetiology and behaviour of these tumours. METHOD: MicroRNA profiling of tumour samples (6 pilocytic, 2 diffuse, 2 pilomyxoid astrocytomas) and normal brain controls (4 adult normal brain samples and a primary glial progenitor cell-line) was performed using small RNA sequencing. Bioinformatic analysis included sequence alignment, analysis of the number of reads (CPM, counts per million) and differential expression. RESULTS: Sequence alignment identified 695 microRNAs, whose expression was compared in tumours v. normal brain. PCA and hierarchical clustering showed separate groups for tumours and normal brain. Computational analysis identified approximately 400 differentially expressed microRNAs in the tumours compared to matched location controls. Our findings will then be validated and integrated with extensive genetic and epigenetic information we have previously obtained for the full tumour cohort. CONCLUSION: We have identified microRNAs that are differentially expressed in paediatric low-grade gliomas. As microRNAs are known to target genes involved in the initiation and progression of cancer, they provide critical information on tumour pathogenesis and are an important class of biomarkers.
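
    The counts-per-million (CPM) normalization mentioned in the method can be illustrated as follows. This is a minimal sketch of the standard definition (a feature's read count divided by the library's total count, times one million); the microRNA names and counts are hypothetical, and the abstract does not say which software computed CPM in the study.

```python
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical raw counts for one small RNA-seq library.
    library = {"miR-a": 50, "miR-b": 150, "miR-c": 800}
    for name, cpm in counts_per_million(library).items():
        print(f"{name}: {cpm:.0f} CPM")
```

    Normalizing each library this way makes samples of different sequencing depth comparable before differential expression is assessed.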

  18. Integrative Clinical Genomics of Metastatic Cancer

    PubMed Central

    Robinson, Dan R.; Wu, Yi-Mi; Lonigro, Robert J.; Vats, Pankaj; Cobain, Erin; Everett, Jessica; Cao, Xuhong; Rabban, Erica; Kumar-Sinha, Chandan; Raymond, Victoria; Schuetze, Scott; Alva, Ajjai; Siddiqui, Javed; Chugh, Rashmi; Worden, Francis; Zalupski, Mark M.; Innis, Jeffrey; Mody, Rajen J.; Tomlins, Scott A.; Lucas, David; Baker, Laurence H.; Ramnath, Nithya; Schott, Ann F.; Hayes, Daniel F.; Vijai, Joseph; Offit, Kenneth; Stoffel, Elena M.; Roberts, J. Scott; Smith, David C.; Kunju, Lakshmi P.; Talpaz, Moshe; Cieslik, Marcin; Chinnaiyan, Arul M.

    2017-01-01

    SUMMARY: Metastasis is the primary cause of cancer-related deaths. While The Cancer Genome Atlas (TCGA) has sequenced primary tumor types obtained from surgical resections, much less comprehensive molecular analysis is available from clinically acquired metastatic cancers. Here, we perform whole exome and transcriptome sequencing of 500 adult patients with metastatic solid tumors of diverse lineage and biopsy site. The most prevalent genes somatically altered in metastatic cancer included TP53, CDKN2A, PTEN, PIK3CA, and RB1. Putative pathogenic germline variants were present in 12.2% of cases of which 75% were related to defects in DNA repair. RNA sequencing complemented DNA sequencing for the identification of gene fusions, pathway activation, and immune profiling. Integrative sequence analysis provides a clinically relevant, multi-dimensional view of the complex molecular landscape and microenvironment of metastatic cancers. PMID:28783718

  19. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile, and accurate RNA structure analysis

    PubMed Central

    Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.

    2016-01-01

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499
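
    The counting step behind mutational profiling can be sketched as a per-nucleotide mutation rate. This is an illustrative simplification and not ShapeMapper's code: the function names and example numbers are hypothetical, and subtracting an untreated-control rate is an assumption here, since the abstract does not describe how raw rates are converted into SHAPE reactivities.

```python
from typing import List, Sequence


def mutation_rates(mutation_counts: Sequence[int],
                   read_depths: Sequence[int]) -> List[float]:
    """Per-nucleotide mutation rate: mutations observed / reads covering the site."""
    if len(mutation_counts) != len(read_depths):
        raise ValueError("counts and depths must have the same length")
    if any(depth <= 0 for depth in read_depths):
        raise ValueError("every position needs a positive read depth")
    return [m / d for m, d in zip(mutation_counts, read_depths)]


def background_subtracted(modified: Sequence[float],
                          untreated: Sequence[float]) -> List[float]:
    """Assumed correction: subtract the untreated-control rate at each position."""
    if len(modified) != len(untreated):
        raise ValueError("rate vectors must have the same length")
    return [m - u for m, u in zip(modified, untreated)]


if __name__ == "__main__":
    # Hypothetical counts for three nucleotides.
    modified = mutation_rates([5, 0, 20], [1000, 1000, 2000])
    untreated = mutation_rates([1, 0, 2], [1000, 1000, 2000])
    print(background_subtracted(modified, untreated))
```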

  20. The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

    PubMed

    Droege, Marcus; Hill, Brendon

    2008-08-31

    The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research: it has recently been used to re-sequence the genome of an individual human, is currently being used to re-sequence the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and has detected low-frequency somatic mutations linked to cancer.

  1. Indexed variation graphs for efficient and accurate resistome profiling.

    PubMed

    Rowe, Will P M; Winn, Martyn D

    2018-05-14

    Antimicrobial resistance remains a major threat to global health. Profiling the collective antimicrobial resistance genes within a metagenome (the "resistome") facilitates greater understanding of antimicrobial resistance gene diversity and dynamics. In turn, this can allow for gene surveillance, individualised treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Our method combines a variation graph representation of gene sets with an LSH Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, GROOT, and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 minutes using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). Contact: will.rowe@stfc.ac.uk. Supplementary data are available at Bioinformatics online.
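
    The similarity-search idea in this record can be illustrated with a minimal Python sketch of MinHash, the kind of signature that LSH-style indexes are typically built on. This is not GROOT's implementation (which is written in Go), and it omits both the variation graph and the LSH Forest index; the function names, the k-mer size, and the number of hash functions are assumptions chosen for illustration.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """Return the set of overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash of a k-mer under a given seed (illustrative choice)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq: str, k: int = 7, num_hashes: int = 64) -> list:
    """MinHash signature: for each seeded hash, keep the minimum value over all k-mers."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

    In such a scheme a read's signature would be compared against signatures of reference windows, and only reads with a high estimated similarity would go on to alignment; the threshold and windowing are left out of this sketch.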

  2. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
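
    As a small illustration of the entropy concept mentioned in this record, the following Python sketch computes the Shannon entropy of a sequence's k-mer distribution and a simple sliding-window profile. It is a simplified stand-in, not the block-entropy estimators or entropic profiles covered by the review; the function names and the window parameter are assumptions made for this example.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the empirical k-mer distribution of seq."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not words:
        raise ValueError("sequence is shorter than k")
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_profile(seq: str, window: int, k: int = 1) -> list:
    """Entropy of each overlapping window of the given length along seq."""
    return [kmer_entropy(seq[i:i + window], k)
            for i in range(len(seq) - window + 1)]
```

    A homopolymer has zero entropy, while a window in which all four nucleotides are equally frequent reaches the maximum of 2 bits for k = 1, so low values in the profile flag low-complexity regions.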

  3. Identification of Microbial Profile of Koji Using Single Molecule, Real-Time Sequencing Technology.

    PubMed

    Hui, Wenyan; Hou, Qiangchuan; Cao, Chenxia; Xu, Haiyan; Zhen, Yi; Kwok, Lai-Yu; Sun, Tiansong; Zhang, Heping; Zhang, Wenyi

    2017-05-01

    Koji is a kind of Japanese traditional fermented starter that has been used for centuries. Many fermented foods are made from koji, such as sake, miso, and soy sauce. This study used the single molecule real-time sequencing technology (SMRT) to investigate the bacterial and fungal microbiota of 3 Japanese koji samples. After SMRT analysis, a total of 39121 high-quality sequences were generated, including 14354 bacterial and 24767 fungal sequence reads. The high-quality gene sequences were assigned to 5 bacterial and 2 fungal phyla, dominated by Proteobacteria and Ascomycota, respectively. At the genus level, Ochrobactrum and Wickerhamomyces were the most abundant bacterial and fungal genera, respectively. The predominant bacterial and fungal species were Ochrobactrum lupini and Wickerhamomyces anomalus, respectively. Our study profiled the microbiota composition of 3 Japanese koji samples to species-level precision. The results may be useful for further development of traditional fermented products, especially optimization of koji preparation. Meanwhile, this study has demonstrated that SMRT is a robust tool for analyzing the microbial composition in food samples. © 2017 Institute of Food Technologists®.

  4. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    PubMed

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.

  5. Specific identification of Bacillus anthracis strains

    NASA Astrophysics Data System (ADS)

    Krishnamurthy, Thaiya; Deshpande, Samir; Hewel, Johannes; Liu, Hongbin; Wick, Charles H.; Yates, John R., III

    2007-01-01

    Accurate identification of human pathogens is the initial vital step in treating civilian terrorism victims and military personnel afflicted in biological threat situations. We have applied a powerful multi-dimensional protein identification technology (MudPIT) along with newly generated software termed Profiler to identify the sequences of specific proteins observed for a few strains of Bacillus anthracis, a human pathogen. Profiler was created to initially screen the MudPIT data of B. anthracis strains and establish the observed proteins specific for its strains. A database containing marker proteins of B. anthracis and its strains was also generated using Profiler, which in turn could be used for detecting the organism and its corresponding strains in samples. Analysis of the unknowns by our methodology, combining MudPIT and Profiler, led to the accurate identification of the anthracis strains present in samples. Thus, a new approach for the identification of B. anthracis strains in unknown samples, based on the molecular mass and sequences of marker proteins, has been ascertained.

  6. An Agile Functional Analysis of Metagenomic Data Using SUPER-FOCUS.

    PubMed

    Silva, Genivaldo Gueiros Z; Lopes, Fabyano A C; Edwards, Robert A

    2017-01-01

    One of the main goals in metagenomics is to identify the functional profile of a microbial community from unannotated shotgun sequencing reads. Functional annotation is important in biological research because it enables researchers to identify the abundance of functional genes of the organisms present in the sample, answering the question, "What can the organisms in the sample do?" Most currently available approaches do not scale with increasing data volumes, which is important because both the number and lengths of the reads provided by sequencing platforms keep increasing. Here, we present SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with real metagenomes, and the results show that it accurately predicts the subsystems present in the profiled microbial communities, is computationally efficient, and up to 1000 times faster than other tools. SUPER-FOCUS is freely available at http://edwards.sdsu.edu/SUPERFOCUS .

  7. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

    PubMed Central

    García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

    2016-01-01

    Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphics 40 Kb window analysis approach allowed us to identify with high resolution previously reported non random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451

  8. PanGEA: identification of allele specific gene expression using the 454 technology.

    PubMed

    Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian

    2009-05-14

    Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occurring in the 454 technology. To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: http://www.kofler.or.at/bioinformatics/PanGEA

  9. PanGEA: Identification of allele specific gene expression using the 454 technology

    PubMed Central

    Kofler, Robert; Teixeira Torres, Tatiana; Lelley, Tamas; Schlötterer, Christian

    2009-01-01

    Background Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA does not only provide an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. Results We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occurring in the 454 technology. Conclusion To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: PMID:19442283

  10. Identification of differentially expressed genes in cucumber (Cucumis sativus L.) root under waterlogging stress by digital gene expression profile.

    PubMed

    Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao

    2012-03-01

    High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profiling of cucumber plant at 5 time points over a 24h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained with 143013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and the differentially expressed genes mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. This present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

    PubMed Central

    2010-01-01

    Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097

  12. High-resolution community profiling of arbuscular mycorrhizal fungi.

    PubMed

    Schlaeppi, Klaus; Bender, S Franz; Mascher, Fabio; Russo, Giancarlo; Patrignani, Andrea; Camenzind, Tessa; Hempel, Stefan; Rillig, Matthias C; van der Heijden, Marcel G A

    2016-11-01

    Community analyses of arbuscular mycorrhizal fungi (AMF) using ribosomal small subunit (SSU) or internal transcribed spacer (ITS) DNA sequences often suffer from low resolution or coverage. We developed a novel sequencing based approach for a highly resolving and specific profiling of AMF communities. We took advantage of previously established AMF-specific PCR primers that amplify a c. 1.5-kb long fragment covering parts of SSU, ITS and parts of the large ribosomal subunit (LSU), and we sequenced the resulting amplicons with single molecule real-time (SMRT) sequencing. The method was applicable to soil and root samples, detected all major AMF families and successfully discriminated closely related AMF species, which would not be discernible using SSU sequences. In inoculation tests we could trace the introduced AMF inoculum at the molecular level. One of the introduced strains almost replaced the local strain(s), revealing that AMF inoculation can have a profound impact on the native community. The methodology presented offers researchers a powerful new tool for AMF community analysis because it unifies improved specificity and enhanced resolution, whereas the drawback of medium sequencing throughput appears of lesser importance for low-diversity groups such as AMF. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  13. Fungal Genomics Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor

    The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone and contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.

  14. Analysis of Bacterial Community Structure in Sulfurous-Oil-Containing Soils and Detection of Species Carrying Dibenzothiophene Desulfurization (dsz) Genes

    PubMed Central

    Duarte, Gabriela Frois; Rosado, Alexandre Soares; Seldin, Lucy; de Araujo, Welington; van Elsas, Jan Dirk

    2001-01-01

    The selective effects of sulfur-containing hydrocarbons, with respect to changes in bacterial community structure and selection of desulfurizing organisms and genes, were studied in soil. Samples taken from a polluted field soil (A) along a concentration gradient of sulfurous oil and from soil microcosms treated with dibenzothiophene (DBT)-containing petroleum (FSL soil) were analyzed. Analyses included plate counts of total bacteria and of DBT utilizers, molecular community profiling via soil DNA-based PCR-denaturing gradient gel electrophoresis (PCR-DGGE), and detection of genes that encode enzymes involved in the desulfurization of hydrocarbons, i.e., dszA, dszB, and dszC. Data obtained from the A soil showed no discriminating effects of oil levels on the culturable bacterial numbers on either medium used. Generally, counts of DBT degraders were 10- to 100-fold lower than the total culturable counts. However, PCR-DGGE showed that the numbers of bands detected in the molecular community profiles decreased with increasing oil content of the soil. Analysis of the sequences of three prominent bands of the profiles generated with the highly polluted soil samples suggested that the underlying organisms were related to Actinomyces sp., Arthrobacter sp., and a bacterium of uncertain affiliation. dszA, dszB, and dszC genes were present in all A soil samples, whereas a range of unpolluted soils gave negative results in this analysis. Results from the study of FSL soil revealed minor effects of the petroleum-DBT treatment on culturable bacterial numbers and clear effects on the DBT-utilizing communities. The molecular community profiles were largely stable over time in the untreated soil, whereas they showed a progressive change over time following treatment with DBT-containing petroleum.
    Direct PCR assessment revealed the presence of dszB-related signals in the untreated FSL soil and the apparent selection of dszA- and dszC-related sequences by the petroleum-DBT treatment. PCR-DGGE applied to sequential enrichment cultures in DBT-containing sulfur-free basal salts medium prepared from the A and treated FSL soils revealed the selection of up to 10 distinct bands. Sequencing a subset of these bands provided evidence for the presence of organisms related to Pseudomonas putida, a Pseudomonas sp., Stenotrophomonas maltophilia, and Rhodococcus erythropolis. Several of 52 colonies obtained from the A and FSL soils on agar plates with DBT as the sole sulfur source produced bands that matched the migration of bands selected in the enrichment cultures. Evidence for the presence of dszB in 12 strains was obtained, whereas dszA and dszC genes were found in only 7 and 6 strains, respectively. Most of the strains carrying dszA or dszC were classified as R. erythropolis related, and all revealed the capacity to desulfurize DBT. A comparison of 37 dszA sequences, obtained via PCR from the A and FSL soils, from enrichments of these soils, and from isolates, revealed the great similarity of all sequences to the canonical (R. erythropolis strain IGTS8) dszA sequence and a large degree of internal conservation. The 37 sequences recovered were grouped in three clusters. One group, consisting of 30 sequences, was minimally 98% related to the IGTS8 sequence, a second group of 2 sequences was slightly different, and a third group of 5 sequences was 95% similar. The first two groups contained sequences obtained from both soil types and enrichment cultures (including isolates), but the last consisted of sequences obtained directly from the polluted A soil. PMID:11229891

  15. Application of denaturing gradient gel electrophoresis (DGGE) to the analysis of microbial communities of subgingival plaque.

    PubMed

    Fujimoto, C; Maeda, H; Kokeguchi, S; Takashiba, S; Nishimura, F; Arai, H; Fukui, K; Murayama, Y

    2003-08-01

    Denaturing gradient gel electrophoresis (DGGE) was applied to the microbiologic examination of subgingival plaque. The PCR primers were designed from conserved nucleotide sequences on 16S ribosomal RNA gene (16SrDNA) with GC rich clamp at the 5'-end. Polymerase chain reaction (PCR) was performed using the primers and genomic DNAs of typical periodontal bacteria. The generated 16SrDNA fragments were separated by denaturing gel. Although the sizes of the amplified DNA fragments were almost the same among the species, 16SrDNAs of the periodontal bacteria were distinguished according to their specific sequences. The microflora of clinical plaque samples were profiled by the PCR-DGGE method, and the dominant 16SrDNA bands were cloned and sequenced. Simultaneously, Actinobacillus actinomycetemcomitans, Porphyromonas gingivalis and Prevotella intermedia were detected by an ordinary PCR method. In the deep periodontal pockets, the bacterial community structures were complicated and P. gingivalis was the most dominant species, whereas the DGGE profiles were simple and Streptococcus or Neisseria species were dominant in the shallow pockets. The species-specific PCR method revealed the presence of A. actinomycetemcomitans, P. gingivalis and P. intermedia in the clinical samples. However, corresponding bands were not always observed in the DGGE profiles, indicating a lower sensitivity of the DGGE method. Although the DGGE method may have a lower sensitivity than the ordinary PCR methods, it could visualize the bacterial qualitative compositions and reveal the major species of the plaque. The DGGE analysis and following sequencing may have the potential to be a promising bacterial examination procedure in periodontal diseases.

  16. Improving protein complex classification accuracy using amino acid composition profile.

    PubMed

    Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok

    2013-09-01

    Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and high functional similarity between their subunits. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Inclusion of amino acids' physicochemical properties can provide better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complexes classification. Improvement is based on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
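
    The amino acid composition profile referred to in this record can be sketched in Python as a 20-dimensional vector of residue frequencies computed from primary sequence alone. The record does not give the authors' exact feature definition, so the alphabet ordering, the handling of non-standard residues, and the function name below are assumptions for illustration; the PCA and SVM steps are not shown.

```python
# The 20 standard amino acids in one-letter code, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_profile(seq: str) -> list:
    """Return the fraction of each standard amino acid in seq.

    Characters outside the 20-letter alphabet (e.g. X or gaps) are ignored,
    which is an assumption of this sketch rather than a documented convention.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

    A vector of this kind, computed per subunit or averaged over a complex, is the sort of fixed-length input that a classifier such as an SVM can consume.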

  17. The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

    PubMed

    Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H

    2018-01-05

    Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.

  18. Microbiological profile of chicken carcasses: A comparative analysis using shotgun metagenomic sequencing

    PubMed Central

    Cesare, Alessandra De; Palma, Federica; Lucchi, Alex; Pasquali, Frederique; Manfreda, Gerardo

    2018-01-01

    In the last few years metagenomic and 16S rRNA sequencing have completely changed the microbiological investigations of food products. In this preliminary study, the microbiological profile of chicken carcasses collected from animals fed with different diets was tested by using shotgun metagenomic sequencing. A total of 15 carcasses were collected at the slaughterhouse at the end of the refrigeration tunnel from chickens reared for 35 days and fed with a control diet (n=5), a diet supplemented with 1500 FTU/kg of commercial phytase (n=5) and a diet supplemented with 1500 FTU/kg of commercial phytase and 3g/kg of inositol (n=5). Ten grams of neck and breast skin were obtained from each carcass and submitted to total DNA extraction by using the DNeasy Blood & Tissue Kit (Qiagen). Sequencing libraries were prepared by using the Nextera XT DNA Library Preparation Kit (Illumina) and sequenced in a HiScanSQ (Illumina) at 100 bp in paired ends. A number of sequences ranging between 5 and 9 million was obtained for each sample. Sequence analysis showed that Proteobacteria and Firmicutes represented more than 98% of whole bacterial populations associated with carcass skin in all groups but their abundances were different between groups. Moraxellaceae and other degradative bacteria showed a significantly higher abundance in the control compared to the treated groups. Furthermore, Clostridium perfringens showed a relative frequency of abundance significantly higher in the group fed with phytase and Salmonella enterica in the group fed with phytase plus inositol. The results of this preliminary study showed that metagenome sequencing is suitable to investigate and monitor carcass microbiota in order to detect specific pathogenic and/or degradative populations. PMID:29732327

  19. Molecular evidence of Burkholderia pseudomallei genotypes based on geographical distribution.

    PubMed

    Zulkefli, Noorfatin Jihan; Mariappan, Vanitha; Vellasamy, Kumutha Malar; Chong, Chun Wie; Thong, Kwai Lin; Ponnampalavanar, Sasheela; Vadivelu, Jamuna; Teh, Cindy Shuan Ju

    2016-01-01

    Background. Central intermediary metabolism (CIM) in bacteria is defined as a set of metabolic biochemical reactions within a cell, which is essential for the cell to survive in response to environmental perturbations. The genes associated with CIM are commonly found in both pathogenic and non-pathogenic strains. As these genes are involved in vital metabolic processes of bacteria, we explored the efficiency of the genes in genotypic characterization of Burkholderia pseudomallei isolates, compared with the established pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) schemes. Methods. Nine previously sequenced B. pseudomallei isolates from Malaysia were characterized by PFGE, MLST and CIM genes. The isolates were later compared to the other 39 B. pseudomallei strains, retrieved from GenBank using both MLST and sequence analysis of CIM genes. UniFrac and hierachical clustering analyses were performed using the results generated by both MLST and sequence analysis of CIM genes. Results. Genetic relatedness of nine Malaysian B. pseudomallei isolates and the other 39 strains was investigated. The nine Malaysian isolates were subtyped into six PFGE profiles, four MLST profiles and five sequence types based on CIM genes alignment. All methods demonstrated the clonality of OB and CB as well as CMS and THE. However, PFGE showed less than 70% similarity between a pair of morphology variants, OS and OB. In contrast, OS was identical to the soil isolate, MARAN. To have a better understanding of the genetic diversity of B. pseudomallei worldwide, we further aligned the sequences of genes used in MLST and genes associated with CIM for the nine Malaysian isolates and 39 B. pseudomallei strains from NCBI database. Overall, based on the CIM genes, the strains were subtyped into 33 profiles where majority of the strains from Asian countries were clustered together. 
On the other hand, MLST resolved the isolates into 31 profiles which formed three clusters. Hierarchical clustering using UniFrac distance suggested that the isolates from Australia were genetically distinct from the Asian isolates. Nevertheless, statistically significant differences were detected between isolates from Malaysia, Thailand and Australia. Discussion. Overall, PFGE showed higher discriminative power in clustering the nine Malaysian B. pseudomallei isolates and indicated its suitability for localized epidemiological study. Compared to MLST, CIM genes showed higher resolution in distinguishing non-related strains and better clustering of strains from different geographical regions. A closer genetic relatedness of Malaysian isolates with all Asian strains in comparison to Australian strains was observed. This finding was supported by UniFrac analysis which resulted in geographical segregation between Australia and the Asian countries.

  20. Analysis of DNA methylation in FFPE tissues using the MethyLight technology.

    PubMed

    Dallol, Ashraf; Al-Ali, Waleed; Al-Shaibani, Amina; Al-Mulla, Fahd

    2011-01-01

    Novel biomarkers are sought after by mining DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissues. Such tissues offer the great advantage of often having complete clinical data (including survival), as well as the tissues are amenable for laser microdissection targeting specific tissue areas. Downstream analysis of such DNA includes mutational screens and methylation profiling. Screening for mutations by sequencing requires a significant amount of DNA for PCR and cycle sequencing. This is self-inhibitory if the gene screened has a large number of exons. Profiling DNA methylation using the MethyLight technology circumvents this problem and allows for the mining of several biomarkers from DNA extracted from a single microscope slide of the tissue of interest. We describe in this chapter a detailed protocol for MethyLight and its use in the determination of CpG Island Methylator Phenotype status in FFPE colorectal cancer samples.

  1. Quantitative profiling of immune repertoires for minor lymphocyte counts using unique molecular identifiers.

    PubMed

    Egorov, Evgeny S; Merzlyak, Ekaterina M; Shelenkov, Andrew A; Britanova, Olga V; Sharonov, George V; Staroverov, Dmitriy B; Bolotin, Dmitriy A; Davydov, Alexey N; Barsova, Ekaterina; Lebedev, Yuriy B; Shugay, Mikhail; Chudakov, Dmitriy M

    2015-06-15

    Emerging high-throughput sequencing methods for the analyses of complex structure of TCR and BCR repertoires give a powerful impulse to adaptive immunity studies. However, there are still essential technical obstacles for performing a truly quantitative analysis. Specifically, it remains challenging to obtain comprehensive information on the clonal composition of small lymphocyte populations, such as Ag-specific, functional, or tissue-resident cell subsets isolated by sorting, microdissection, or fine needle aspirates. In this study, we report a robust approach based on unique molecular identifiers that allows profiling Ag receptors for several hundred to thousand lymphocytes while preserving qualitative and quantitative information on clonal composition of the sample. We also describe several general features regarding the data analysis with unique molecular identifiers that are critical for accurate counting of starting molecules in high-throughput sequencing applications. Copyright © 2015 by The American Association of Immunologists, Inc.
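
    The idea of counting starting molecules with unique molecular identifiers can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `count_molecules`, the `(umi, clonotype)` read representation, the majority-vote consensus and the `min_reads_per_umi` filter are all assumptions made for illustration, and no UMI error correction is attempted.

```python
from collections import Counter, defaultdict

def count_molecules(reads, min_reads_per_umi=1):
    """Count starting molecules per clonotype from (umi, clonotype) read pairs.

    Hypothetical helper: reads sharing a UMI are treated as copies of one
    starting molecule, and the most frequent clonotype among them is taken
    as that molecule's consensus.
    """
    by_umi = defaultdict(Counter)
    for umi, clonotype in reads:
        by_umi[umi][clonotype] += 1

    molecules = Counter()
    for clonotype_counts in by_umi.values():
        # Skip UMIs supported by too few reads (likely errors or noise).
        if sum(clonotype_counts.values()) < min_reads_per_umi:
            continue
        consensus, _ = clonotype_counts.most_common(1)[0]
        molecules[consensus] += 1
    return dict(molecules)

# Toy input: six reads that derive from three starting molecules.
reads = [
    ("AAAC", "CASSLG"), ("AAAC", "CASSLG"), ("AAAC", "CASSLA"),
    ("GGTT", "CASSLG"),
    ("TTCA", "CASRPD"), ("TTCA", "CASRPD"),
]
counts = count_molecules(reads)
strict_counts = count_molecules(reads, min_reads_per_umi=2)
```

    In this toy input the single discordant read under UMI "AAAC" is outvoted, so read-level errors do not inflate the clonotype count; raising `min_reads_per_umi` additionally drops the singleton UMI.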

  2. Transcriptome profile of Trichoderma harzianum IOC-3844 induced by sugarcane bagasse.

    PubMed

    Horta, Maria Augusta Crivelente; Vicentini, Renato; Delabona, Priscila da Silva; Laborda, Prianda; Crucello, Aline; Freitas, Sindélia; Kuroshu, Reginaldo Massanobu; Polikarpov, Igor; Pradella, José Geraldo da Cruz; Souza, Anete Pereira

    2014-01-01

    Profiling the transcriptome that underlies biomass degradation by the fungus Trichoderma harzianum allows the identification of gene sequences with potential application in enzymatic hydrolysis processing. In the present study, the transcriptome of T. harzianum IOC-3844 was analyzed using RNA-seq technology. The sequencing generated 14.7 Gbp for downstream analyses. De novo assembly resulted in 32,396 contigs, which were submitted for identification and classified according to their identities. This analysis allowed us to define a principal set of T. harzianum genes that are involved in the degradation of cellulose and hemicellulose and the accessory genes that are involved in the depolymerization of biomass. An additional analysis of expression levels identified a set of carbohydrate-active enzymes that are upregulated under different conditions. The present study provides valuable information for future studies on biomass degradation and contributes to a better understanding of the role of the genes that are involved in this process.

  3. Plant-RRBS, a bisulfite and next-generation sequencing-based methylome profiling method enriching for coverage of cytosine positions.

    PubMed

    Schmidt, Martin; Van Bel, Michiel; Woloszynska, Magdalena; Slabbinck, Bram; Martens, Cindy; De Block, Marc; Coppens, Frederik; Van Lijsebettens, Mieke

    2017-07-06

    Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied upon mutation of the DNA methyltransferases, adaptation to environmental stresses or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite)-conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation state dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We use specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material. We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair, adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report a performant laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable for any reference-sequenced plant species. As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined. 
Plant-RRBS reproducibly covers commonly up to one fourth of the cytosine positions in the rice genome when using MspI-DpnII within a group of five biological replicates of a line. The method predominantly detects cytosine methylation in putative promoter regions and not-annotated regions in rice. Plant-RRBS offers high-throughput and broad, genome-dispersed methylation detection by effective read number generation obtained from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative analyses of multi-sample studies for cytosine methylation and transgenerational stability in experimental material and plant breeding populations.
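
    The double restriction digest at the core of a reduced-representation approach can be sketched in silico. This is an illustrative assumption, not the published plant-RRBS pipeline: the helper names are hypothetical, the toy sequence is invented, and the size-selection bounds are arbitrary. The recognition sites used are MspI (C^CGG) and DpnII (^GATC).

```python
import re

# Recognition site and cut offset within the site for each enzyme.
# MspI cuts C^CGG (offset 1); DpnII cuts ^GATC (offset 0).
ENZYMES = {"MspI": ("CCGG", 1), "DpnII": ("GATC", 0)}

def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted cut coordinates for all enzymes on one strand."""
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer("(?=" + site + ")", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)

def digest(seq, enzymes=ENZYMES):
    """Split seq at every cut position and return the fragments."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

toy = "AAACCGGTTTGATCAAACCGGTT"   # invented sequence, not rice
fragments = digest(toy)
selected = size_select(fragments, 5, 6)  # arbitrary bounds for the toy case
```

    Running the same digest over a reference genome and counting cytosines in the size-selected fragments would give a rough estimate of the genome fraction a given enzyme pair can cover.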

  4. Molecular studies on larvae of Pseudoterranova parasite of Trichiurus lepturus Linnaeus, 1758 and Pomatomus saltatrix (Linnaeus, 1766) off Brazilian waters.

    PubMed

    Borges, Juliana N; Cunha, Luiz F G; Miranda, Daniele F; Monteiro-Neto, Cassiano; Santos, Cláudia P

    2015-12-01

    Pseudoterranova larvae parasitizing cutlassfish Trichiurus lepturus and bluefish Pomatomus saltatrix from Southwest Atlantic coast of Brazil were studied in this work by morphological, ultrastructural and molecular approaches. The genetic analyses were performed for the ITS2 intergenic region specific for Pseudoterranova decipiens, the partial 28S (LSU) of ribosomal DNA and the mtDNA cox-1 region. We obtained results for the 28S region and mtDNA cox-1, which were amplified using the polymerase chain reaction and sequenced to evaluate the phylogenetic relationships between sequences of this study and sequences from GenBank. The morphological profile indicated that all nine specimens collected from both fish were L3 larvae of Pseudoterranova sp. The genetic profile confirmed the generic level, but due to the absence of similar sequences for adult parasites in GenBank for the amplified regions, it was not possible to identify them to the species level. The sequences obtained presented 89% similarity with Pseudoterranova decipiens (28S sequences) and Contracaecum osculatum B (mtDNA cox-1). The low similarity, together with the fact that amplification with the specific primer for P. decipiens did not occur, leads us to conclude that our sequences do not belong to the P. decipiens complex.

  5. Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation.

    PubMed

    Artieri, Carlo G; Fraser, Hunter B

    2014-12-01

    The recent advent of ribosome profiling (sequencing of short ribosome-bound fragments of mRNA) has offered an unprecedented opportunity to interrogate the sequence features responsible for modulating translational rates. Nevertheless, numerous analyses of the first riboprofiling data set have produced equivocal and often incompatible results. Here we analyze three independent yeast riboprofiling data sets, including two with much higher coverage than previously available, and find that all three show substantial technical sequence biases that confound interpretations of ribosomal occupancy. After accounting for these biases, we find no effect of previously implicated factors on ribosomal pausing. Rather, we find that incorporation of proline, whose unique side-chain stalls peptide synthesis in vitro, also slows the ribosome in vivo. We also reanalyze a method that implicated positively charged amino acids as the major determinant of ribosomal stalling and demonstrate that it produces false signals of stalling in low-coverage data. Our results suggest that any analysis of riboprofiling data should account for sequencing biases and sparse coverage. To this end, we establish a robust methodology that enables analysis of ribosome profiling data without prior assumptions regarding which positions spanned by the ribosome cause stalling. © 2014 Artieri and Fraser; Published by Cold Spring Harbor Laboratory Press.

  6. The dynamics of genome replication using deep sequencing

    PubMed Central

    Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.

    2014-01-01

    Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
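
    The copy-number reasoning behind such replication profiles can be sketched as a ratio of read depth between a replicating and a non-replicating sample. This is a simplified illustration under stated assumptions, not the authors' method: the function name, the fixed genomic windows and the normalization by total read count are all hypothetical choices.

```python
def copy_number_profile(replicating, non_replicating):
    """Relative copy number per window.

    Each input is a list of read counts over the same genomic windows.
    Counts are normalized by library size so the two samples are
    comparable; windows with no reference coverage yield None.
    """
    total_rep = sum(replicating)
    total_non = sum(non_replicating)
    profile = []
    for rep, non in zip(replicating, non_replicating):
        if non == 0:
            profile.append(None)  # no coverage in the reference sample
        else:
            profile.append((rep / total_rep) / (non / total_non))
    return profile

# Toy counts for four windows; the second window is over-represented
# in the replicating sample, as expected near an early-firing origin.
s_phase = [150, 200, 150, 100]
reference = [100, 100, 100, 100]
profile = copy_number_profile(s_phase, reference)
earliest = profile.index(max(profile))
```

    Windows with the highest ratio have, on average, been copied in more cells and therefore replicate earlier; local maxima along the chromosome are candidate origin locations.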

  7. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  8. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  9. A novel genome signature based on inter-nucleotide distances profiles for visualization of metagenomic data

    NASA Astrophysics Data System (ADS)

    Xie, Xian-Hua; Yu, Zu-Guo; Ma, Yuan-Lin; Han, Guo-Sheng; Anh, Vo

    2017-09-01

    There has been a growing interest in visualization of metagenomic data. The present study focuses on the visualization of metagenomic data using inter-nucleotide distances profiles. We first convert the fragment sequences into inter-nucleotide distances profiles. Then we analyze these profiles by principal component analysis. Finally, the principal components are used to obtain a 2-D scatter plot in which fragments are labeled according to their source species. We name our method the inter-nucleotide distances profiles (INP) method. Our method is evaluated on three benchmark data sets used in previously published papers. Our results demonstrate that the INP method is a good and efficient alternative for visualization of metagenomic data.
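
    The first step, converting a fragment into an inter-nucleotide distances profile, might look like the sketch below. The abstract does not define the profile precisely, so this rests on an assumption: for each nucleotide, the distance is taken between consecutive occurrences of the same nucleotide, and the profile holds the relative frequency of distances 1 to `max_dist`. The function name and the cutoff are hypothetical.

```python
def inp_profile(seq, max_dist=4):
    """Inter-nucleotide distance profile of a DNA fragment.

    For each of A, C, G, T, collect the distances between consecutive
    occurrences of that nucleotide and report the relative frequency of
    distances 1..max_dist. The four per-nucleotide vectors are
    concatenated into one feature vector of length 4 * max_dist.
    """
    profile = []
    for base in "ACGT":
        positions = [i for i, b in enumerate(seq) if b == base]
        dists = [b - a for a, b in zip(positions, positions[1:])]
        counts = [0] * max_dist
        for d in dists:
            if d <= max_dist:
                counts[d - 1] += 1
        total = len(dists)
        profile.extend(c / total if total else 0.0 for c in counts)
    return profile

fragment = "AACAGTAC"  # invented toy fragment
vector = inp_profile(fragment)
```

    One such vector per fragment would then form the rows of the matrix passed to principal component analysis, with the first two components giving the 2-D coordinates.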

  10. ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

    PubMed Central

    2009-01-01

    Background. Sequence identification of ESTs from non-model species offers distinct challenges, particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results. Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion. The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
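
    The correlation step can be illustrated with a small sketch that pairs an unidentified EST with its most strongly co-expressed known gene. It is a simplification rather than the ExprAlign procedure: the network and 'mountain' clustering are omitted, and the function names, gene labels, expression values and the 0.9 threshold are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def suggest_identity(unknown_profile, known_profiles, threshold=0.9):
    """Return (gene, r) for the best-correlated known gene, or None."""
    best = None
    for gene, profile in known_profiles.items():
        r = pearson(unknown_profile, profile)
        if r >= threshold and (best is None or r > best[1]):
            best = (gene, r)
    return best

# Invented expression values across five conditions.
known = {"gene_A": [1, 2, 3, 4, 5], "gene_B": [5, 4, 3, 2, 1]}
match = suggest_identity([2.0, 4.1, 5.9, 8.2, 10.0], known)
no_match = suggest_identity([1, 3, 2, 3, 1], known)
```

    Using more conditions (tissues and treatments) lengthens each profile, which makes a high correlation less likely to arise by chance, consistent with the improvement in precision reported above.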

  11. Classification of Ancient Mammal Individuals Using Dental Pulp MALDI-TOF MS Peptide Profiling

    PubMed Central

    Tran, Thi-Nguyen-Ny; Aboudharam, Gérard; Gardeisen, Armelle; Davoust, Bernard; Bocquet-Appel, Jean-Pierre; Flaudrops, Christophe; Belghazi, Maya; Raoult, Didier; Drancourt, Michel

    2011-01-01

    Background The classification of ancient animal corpses at the species level remains a challenging task for forensic scientists and anthropologists. Severe damage and mixed, tiny pieces originating from several skeletons may render morphological classification virtually impossible. Standard approaches are based on sequencing mitochondrial and nuclear targets. Methodology/Principal Findings We present a method that can accurately classify mammalian species using dental pulp and mass spectrometry peptide profiling. Our work was organized into three successive steps. First, after extracting proteins from the dental pulp collected from 37 modern individuals representing 13 mammalian species, trypsin-digested peptides were used for matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis. The resulting peptide profiles accurately classified every individual at the species level in agreement with parallel cytochrome b gene sequencing gold standard. Second, using a 279–modern spectrum database, we blindly classified 33 of 37 teeth collected in 37 modern individuals (89.1%). Third, we classified 10 of 18 teeth (56%) collected in 15 ancient individuals representing five mammal species including human, from five burial sites dating back 8,500 years. Further comparison with an upgraded database comprising ancient specimen profiles yielded 100% classification in ancient teeth. Peptide sequencing yielded 4 and 16 different non-keratin proteins including collagen (alpha-1 type I and alpha-2 type I) in human ancient and modern dental pulp, respectively. Conclusions/Significance Mass spectrometry peptide profiling of the dental pulp is a new approach that can be added to the arsenal of species classification tools for forensics and anthropology as a complementary method to DNA sequencing. The dental pulp is a new source for collagen and other proteins for the species classification of modern and ancient mammal individuals. PMID:21364886

  12. 'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

    PubMed Central

    Marck, C

    1988-01-01

    DNA Strider is a new integrated DNA and Protein sequence analysis program written with the C language for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for the day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and Protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated such as graphic restriction map, hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can be therefore computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
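
    A restriction site search on a circular sequence can be sketched as follows. This is a plain substring scan, not the hexamer look-ahead algorithm named in the abstract, whose details are not given here; the function name, the three-enzyme library and the toy plasmid are assumptions for illustration. Only the top strand is scanned, which suffices for palindromic sites such as these.

```python
# Small illustrative library of palindromic hexamer sites.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def find_sites(seq, sites=SITES, circular=False):
    """Return 1-based start positions of each recognition site.

    For circular molecules the first len(site) - 1 bases are appended
    so that sites spanning the origin are also found.
    """
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        text = seq + seq[: len(site) - 1] if circular else seq
        positions = []
        start = text.find(site)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = text.find(site, start + 1)
        hits[name] = positions
    return hits

# Invented 20-base plasmid with an EcoRI site spanning the origin.
plasmid = "TTCAAGCTTGGATCCAAGAA"
circular_hits = find_sites(plasmid, circular=True)
linear_hits = find_sites(plasmid)
```

    The positions returned for the circular case are what a circular restriction map would plot as angles around the plasmid.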

  13. Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling

    PubMed Central

    2013-01-01

    Background Isatis indigotica is a widely used herb for the clinical treatment of colds, fever, and influenza in Traditional Chinese Medicine (TCM). Various structural classes of compounds have been identified as effective ingredients. However, little is known at the genetic level about these active metabolites. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive dataset of I. indigotica. Results A database of 36,367 unigenes (average length = 1,115.67 bases) was generated by performing transcriptome sequencing. Based on the gene annotation of the transcriptome, 104 unigenes were identified covering most of the catalytic steps in the general biosynthetic pathways of indole, terpenoid, and phenylpropanoid. Subsequently, the organ-specific expression patterns of the genes involved in these pathways, and their responses to methyl jasmonate (MeJA) induction, were investigated. Metabolite profiles of effective phenylpropanoids showed that the accumulation patterns of secondary metabolites were mostly correlated with the transcription of their biosynthetic genes. According to the analysis of the UDP-dependent glycosyltransferase (UGT) family, several flavonoids were indicated to exist in I. indigotica and further identified by metabolic profiling using UPLC/Q-TOF. Moreover, applying transcriptome co-expression analysis, nine new, putative UGTs were suggested as flavonol glycosyltransferases and lignan glycosyltransferases. Conclusions This database provides a pool of candidate genes involved in biosynthesis of effective metabolites in I. indigotica. Furthermore, the comprehensive analysis and characterization of the significant pathways are expected to give a better insight regarding the diversity of chemical composition, synthetic characteristics, and the regulatory mechanisms which operate in this medicinal herb. PMID:24308360

  14. Comparative genome analysis in the integrated microbial genomes (IMG) system.

    PubMed

    Markowitz, Victor M; Kyrpides, Nikos C

    2007-01-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  15. High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome

    PubMed Central

    2013-01-01

    Background Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny. Conclusions We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts. PMID:23368932

  16. Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea

    PubMed Central

    Didi, Jennifer; Ergani, Ayla; Lima, Sandra

    2016-01-01

    ABSTRACT Whole-genome sequencing of Serratia rubidaea CIP 103234T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5′ rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. PMID:27956418

  17. Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea.

    PubMed

    Bonnin, Rémy A; Didi, Jennifer; Ergani, Ayla; Lima, Sandra; Naas, Thierry

    2017-02-01

    Whole-genome sequencing of Serratia rubidaea CIP 103234T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5' rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. Copyright © 2017 American Society for Microbiology.

  18. Effects of the Laramide Structures on the Regional Distribution of Tight-Gas Sandstone in the Upper Mesaverde Group, Uinta Basin, Utah

    NASA Astrophysics Data System (ADS)

    Sitaula, R. P.; Aschoff, J.

    2013-12-01

    Regional-scale sequence stratigraphic correlation, well log analysis, syntectonic unconformity mapping, isopach maps, and depositional environment maps of the upper Mesaverde Group (UMG) in Uinta basin, Utah suggest higher accommodation in northeastern part (Natural Buttes area) and local development of lacustrine facies due to increased subsidence caused by uplift of San Rafael Swell (SRS) in southern and Uinta Uplift in northern parts. Recently discovered lacustrine facies in Natural Buttes area are completely different than the dominant fluvial facies in outcrops along Book Cliffs and could have implications for significant amount of tight-gas sand production from this area. Data used for sequence stratigraphic correlation, isopach maps and depositional environmental maps include > 100 well logs, 20 stratigraphic profiles, 35 sandstone thin sections and 10 outcrop-based gamma ray profiles. Seven 4th order depositional sequences (~0.5 my duration) are identified and correlated within UMG. Correlation was constructed using a combination of fluvial facies and stacking patterns in outcrops, chert-pebble conglomerates and tidally influenced strata. These surfaces were extrapolated into subsurface by matching GR profiles. GR well logs and core log of Natural Buttes area show intervals of coarsening upward patterns suggesting possible lacustrine intervals that might contain high TOC. Locally, younger sequences are completely truncated across SRS whereas older sequences are truncated and thinned toward SRS. The cycles of truncation and thinning represent phases of SRS uplift. Thinning possibly related with the Uinta Uplift is also observed in northwestern part. Paleocurrents are consistent with interpretation of periodic segmentation and deflection of sedimentation. Regional paleocurrents are generally E-NE-directed in Sequences 1-4, and N-directed in Sequences 5-7. 
From isopach maps and paleocurrent direction it can be interpreted that uplift of SRS changed route of sediment supply from west to southwest. Locally, paleocurrents are highly variable near SRS further suggesting UMG basin-fill was partitioned by uplift of SRS. Sandstone composition analysis also suggests the uplift of SRS causing the variation of source rocks in upper sequences than the lower sequences. In conclusion, we suggest that Uinta basin was episodically partitioned during the deposition of UMG due to uplift of Laramide structures in the basin and accommodation was localized in northeastern part. Understanding of structural controls on accommodation, sedimentation patterns and depositional environments will aid prediction of the best-producing gas reservoirs.

  19. Microbial Communities in the Surface Mucopolysaccharide Layer and the Black Band Microbial Mat of Black Band-Diseased Siderastrea siderea

    PubMed Central

    Sekar, Raju; Mills, DeEtta K.; Remily, Elizabeth R.; Voss, Joshua D.; Richardson, Laurie L.

    2006-01-01

    Microbial community profiles and species composition associated with two black band-diseased colonies of the coral Siderastrea siderea were studied by 16S rRNA-targeted gene cloning, sequencing, and amplicon-length heterogeneity PCR (LH-PCR). Bacterial communities associated with the surface mucopolysaccharide layer (SML) of apparently healthy tissues of the infected colonies, together with samples of the black band disease (BBD) infections, were analyzed using the same techniques for comparison. Gene sequences, ranging from 424 to 1,537 bp, were retrieved from all positive clones (n = 43 to 48) in each of the four clone libraries generated and used for comparative sequence analysis. In addition to LH-PCR community profiling, all of the clone sequences were aligned with LH-PCR primer sequences, and the theoretical lengths of the amplicons were determined. Results revealed that the community profiles were significantly different between BBD and SML samples. The SML samples were dominated by γ-proteobacteria (53 to 64%), followed by β-proteobacteria (18 to 21%) and α-proteobacteria (5 to 11%). In contrast, both BBD clone libraries were dominated by α-proteobacteria (58 to 87%), followed by verrucomicrobia (2 to 10%) and 0 to 6% each of δ-proteobacteria, bacteroidetes, firmicutes, and cyanobacteria. Alphaproteobacterial sequence types related to the bacteria associated with toxin-producing dinoflagellates were observed in BBD clone libraries but were not found in the SML libraries. Similarly, sequences affiliated with the family Desulfobacteraceae and toxin-producing cyanobacteria, both believed to be involved in BBD pathogenesis, were found only in BBD libraries. These data provide evidence for an association of numerous toxin-producing heterotrophic microorganisms with BBD of corals. PMID:16957217
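
    Determining a theoretical amplicon length from a clone sequence and a primer pair can be sketched as below. It assumes exact primer matches on the top strand and uses invented toy primers, not the LH-PCR primers of the study; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def amplicon_length(clone, forward, reverse):
    """Length of the product from forward primer start to reverse primer end.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is searched for downstream of the forward primer.
    Returns None if either primer has no exact match.
    """
    start = clone.find(forward)
    if start == -1:
        return None
    target = reverse_complement(reverse)
    end = clone.find(target, start + len(forward))
    if end == -1:
        return None
    return end + len(target) - start

# Toy clone: 2 flanking bases, 6-base forward primer, 10-base insert,
# 6-base reverse primer site, 2 flanking bases.
clone = "CC" + "AGAGTT" + "ACGTACGTAC" + "GGCAGC" + "TT"
length = amplicon_length(clone, "AGAGTT", "GCTGCC")
```

    Comparing such predicted lengths with the observed LH-PCR peaks links individual clone sequences to peaks in the community profile.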

  20. Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles

    PubMed Central

    Zheng, Yun; Ji, Bo; Song, Renhua; Wang, Shengpeng; Li, Ting; Zhang, Xiaotuo; Chen, Kun; Li, Tianqing; Li, Jinyan

    2016-01-01

    Various types of mutation and editing (M/E) events in microRNAs (miRNAs) can change the stabilities of pre-miRNAs and/or complementarities between miRNAs and their targets. Small RNA (sRNA) high-throughput sequencing (HTS) profiles can contain many mutated and edited miRNAs. Systematic detection of miRNA mutation and editing sites from the huge volume of sRNA HTS profiles is computationally difficult, as high sensitivity and low false positive rate (FPR) are both required. We propose a novel method (named MiRME) for accurate and fast detection of miRNA M/E sites using a progressive sequence alignment approach which refines sensitivity and improves FPR step by step. From 70 sRNA HTS profiles with over 1.3 billion reads, MiRME has detected thousands of statistically significant M/E sites, including 3′-editing sites, 57 A-to-I editing sites (of which 32 are novel), as well as some putative non-canonical editing sites. By integrating the analysis of genome HTS profiles of two human cell lines, we demonstrated that a few non-canonical editing sites did not result from mutations in the genome, suggesting the existence of new editing types that further diversify the functions of miRNAs. Compared with six existing studies or methods, MiRME has shown much superior performance for the identification and visualization of the M/E sites of miRNAs from the ever-increasing sRNA HTS profiles. PMID:27229138

  1. TOF-SIMS Analysis of Red Color Inks of Writing and Printing Tools on Questioned Documents.

    PubMed

    Lee, Jihye; Nam, Yun Sik; Min, Jisook; Lee, Kang-Bong; Lee, Yeonhee

    2016-05-01

    Time-of-flight secondary ion mass spectrometry (TOF-SIMS) is a well-established surface technique that provides both elemental and molecular information from several monolayers of a sample surface while also allowing depth profiling or image mapping to be performed. Static TOF-SIMS with improved performance has expanded the application of TOF-SIMS to the study of a variety of organic, polymeric, biological, archaeological, and forensic materials. In forensic investigation, the use of a minimal sample for the analysis is preferable. Although the TOF-SIMS technique is destructive, the probing beams have microsized diameters so that only a small portion of the questioned sample is necessary for the analysis, leaving the rest available for other analyses. In this study, TOF-SIMS and attenuated total reflectance Fourier transform infrared (ATR-FTIR) were applied to the analysis of several different pen inks, red sealing inks, and printed patterns on paper. The overlapping areas of ballpoint pen writing, red seal stamping, and laser printing in a document were investigated to identify the sequence of recording. The sequence relations for various cases were determined from the TOF-SIMS mapping image and the depth profile. TOF-SIMS images were also used to investigate numbers or characters altered with two different red pens. TOF-SIMS was successfully used to determine the sequence of intersecting lines and the forged numbers on the paper. © 2016 American Academy of Forensic Sciences.

  2. Exploiting three kinds of interface propensities to identify protein binding sites.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2009-08-01

    Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparisons, we revisit the usage of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensities combined with sequence profiles and accessible surface areas are inputted into SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, order profile is a suitable profile-level building block of the protein sequences and can be widely used in many tasks of computational biology, such as the sequence alignment, the prediction of domain boundary, the designation of knowledge-based potentials and the protein remote homology detection.

  3. ChIP-seq: advantages and challenges of a maturing technology.

    PubMed

    Park, Peter J

    2009-10-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.

  4. DNA Replication Profiling Using Deep Sequencing.

    PubMed

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  5. Brucella papionis sp. nov., isolated from baboons (Papio spp.)

    PubMed Central

    Davison, Nicholas; Cloeckaert, Axel; Al Dahouk, Sascha; Zygmunt, Michel S.; Brew, Simon D.; Perrett, Lorraine L.; Koylass, Mark S.; Vergnaud, Gilles; Quance, Christine; Scholz, Holger C.; Dick, Edward J.; Hubbard, Gene; Schlabritz-Loutsevitch, Natalia E.

    2014-01-01

    Two Gram-negative, non-motile, non-spore-forming coccoid bacteria (strains F8/08-60T and F8/08-61) isolated from clinical specimens obtained from baboons (Papio spp.) that had delivered stillborn offspring were subjected to a polyphasic taxonomic study. On the basis of 16S rRNA gene sequence similarities, both strains, which possessed identical sequences, were assigned to the genus Brucella. This placement was confirmed by extended multilocus sequence analysis (MLSA), where both strains possessed identical sequences, and whole-genome sequencing of a representative isolate. All of the above analyses suggested that the two strains represent a novel lineage within the genus Brucella. The strains also possessed a unique profile when subjected to the phenotyping approach classically used to separate species of the genus Brucella, reacting only with Brucella A monospecific antiserum, being sensitive to the dyes thionin and fuchsin, being lysed by bacteriophage Wb, Bk2 and Fi phage at routine test dilution (RTD) but only partially sensitive to bacteriophage Tb, and with no requirement for CO2 and no production of H2S but strong urease activity. Biochemical profiling revealed a pattern of enzyme activity and metabolic capabilities distinct from existing species of the genus Brucella. Molecular analysis of the omp2 locus genes showed that both strains had a novel combination of two highly similar omp2b gene copies. The two strains shared a unique fingerprint profile of the multiple-copy Brucella-specific element IS711. Like MLSA, a multilocus variable number of tandem repeat analysis (MLVA) showed that the isolates clustered together very closely, but represent a distinct group within the genus Brucella. Isolates F8/08-60T and F8/08-61 could be distinguished clearly from all known species of the genus Brucella and their biovars by both phenotypic and molecular properties.
Therefore, by applying the species concept for the genus Brucella suggested by the ICSP Subcommittee on the Taxonomy of Brucella, they represent a novel species within the genus Brucella, for which the name Brucella papionis sp. nov. is proposed, with the type strain F8/08-60T (= NCTC 13660T = CIRMBP 0958T). PMID:25242540

  6. Uncovering leaf rust responsive miRNAs in wheat (Triticum aestivum L.) using high-throughput sequencing and prediction of their targets through degradome analysis.

    PubMed

    Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal

    2017-01-01

    Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 target genes. qRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30% of the world population. A major threat to wheat production is rust epidemics. This study was targeted towards identification and functional characterization of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in response to leaf rust using mock and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs compared with the pathogen-infected susceptible plants. The miRNA counts increased in the susceptible isoline due to leaf rust; conversely, the counts decreased in the resistant isoline in response to pathogenesis, illustrating precise spatial tuning of miRNAs during compatible and incompatible interactions. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in response to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interactions.
Our findings contribute to increased knowledge of the roles of microRNAs in wheat-leaf rust interactions and could help in rust resistance breeding programs.

  7. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling

    PubMed Central

    Shin, Junha; Lee, Insuk

    2015-01-01

    Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. 
Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049
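    The idea of scoring co-inheritance separately within each domain of life can be sketched as below. The record does not state which association measure the authors used; mutual information between binary presence/absence profiles is a common choice in phylogenetic profiling and is assumed here. The function names and the eight-species toy profiles are hypothetical.

```python
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two binary presence/absence profiles."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_xy = sum(1 for i, j in zip(x, y) if i == a and j == b) / n
            p_x = sum(1 for i in x if i == a) / n
            p_y = sum(1 for j in y if j == b) / n
            if p_xy > 0:
                mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


def within_domain_scores(profile_a, profile_b, species_domain):
    """Score co-inheritance of two genes separately in each domain of life.

    profile_a, profile_b: 0/1 per reference species (gene absent/present).
    species_domain: domain label per reference species, in the same order.
    """
    scores = {}
    for domain in sorted(set(species_domain)):
        idx = [i for i, d in enumerate(species_domain) if d == domain]
        scores[domain] = mutual_information(
            [profile_a[i] for i in idx], [profile_b[i] for i in idx]
        )
    return scores


# Toy data: the two genes are co-inherited in Bacteria but unrelated in Eukaryota.
domains = ["Bacteria"] * 4 + ["Eukaryota"] * 4
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 1, 0, 0]
scores = within_domain_scores(gene_a, gene_b, domains)
pooled = mutual_information(gene_a, gene_b)
```

    In this toy case the pooled score over all eight species is lower than the within-Bacteria score, which illustrates how inheritance patterns from an unrelated taxonomic group can dilute a real signal.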

  8. Web-Based Phylogenetic Assignment Tool for Analysis of Terminal Restriction Fragment Length Polymorphism Profiles of Microbial Communities

    PubMed Central

    Kent, Angela D.; Smith, Dan J.; Benson, Barbara J.; Triplett, Eric W.

    2003-01-01

    Culture-independent DNA fingerprints are commonly used to assess the diversity of a microbial community. However, relating species composition to community profiles produced by community fingerprint methods is not straightforward. Terminal restriction fragment length polymorphism (T-RFLP) is a community fingerprint method in which phylogenetic assignments may be inferred from the terminal restriction fragment (T-RF) sizes through the use of web-based resources that predict T-RF sizes for known bacteria. The process quickly becomes computationally intensive due to the need to analyze profiles produced by multiple restriction digests and the complexity of profiles generated by natural microbial communities. A web-based tool is described here that rapidly generates phylogenetic assignments from submitted community T-RFLP profiles based on a database of fragments produced by known 16S rRNA gene sequences. Users have the option of submitting a customized database generated from unpublished sequences or from a gene other than the 16S rRNA gene. This phylogenetic assignment tool allows users to employ T-RFLP to simultaneously analyze microbial community diversity and species composition. An analysis of the variability of bacterial species composition throughout the water column in a humic lake was carried out to demonstrate the functionality of the phylogenetic assignment tool. This method was validated by comparing the results generated by this program with results from a 16S rRNA gene clone library. PMID:14602639
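    Inferring phylogenetic assignments from T-RF sizes across multiple restriction digests can be sketched as a lookup against predicted fragment sizes. This is a simplified illustration, not the tool's actual algorithm: the function name, the size tolerance, the taxon labels and all fragment sizes are hypothetical, and a taxon is kept only if every digest contains a peak near its predicted size.

```python
def assign_taxa(observed_peaks, predicted_db, tolerance_bp=1.0):
    """Return taxa whose predicted T-RF sizes match a peak in every digest.

    observed_peaks: {enzyme: [peak sizes in bp]} from the community profile.
    predicted_db:   {taxon: {enzyme: predicted T-RF size in bp}}.
    tolerance_bp:   allowed size difference (an assumed sizing error).
    """
    candidates = []
    for taxon, predicted in predicted_db.items():
        consistent = True
        for enzyme, peaks in observed_peaks.items():
            expected = predicted.get(enzyme)
            if expected is None or not any(
                abs(peak - expected) <= tolerance_bp for peak in peaks
            ):
                consistent = False
                break
        if consistent:
            candidates.append(taxon)
    return sorted(candidates)


# Illustrative numbers only; not real predicted fragment sizes.
database = {
    "taxon_A": {"HhaI": 205, "MspI": 490},
    "taxon_B": {"HhaI": 205, "MspI": 150},
    "taxon_C": {"HhaI": 370, "MspI": 490},
}
profile = {"HhaI": [204.6, 371.2], "MspI": [489.8]}
strict = assign_taxa(profile, database, tolerance_bp=1.0)
relaxed = assign_taxa(profile, database, tolerance_bp=1.5)
```

    Requiring agreement across several digests narrows the candidate list, since many taxa share a T-RF size for any single enzyme; widening the tolerance trades specificity for sensitivity.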

  9. Transcriptome Profiling of Chironomus kiinensis under Phenol Stress Using Solexa Sequencing Technology

    PubMed Central

    Cao, Chuanwang; Wang, Zhiying; Niu, Changying; Desneux, Nicolas; Gao, Xiwu

    2013-01-01

    Phenol is a major pollutant in aquatic ecosystems due to its chemical stability, water solubility and environmental mobility. To date, little is known about the molecular modifications of invertebrates under phenol stress. In the present study, we used Solexa sequencing technology to investigate the transcriptome and differentially expressed genes (DEGs) of midges (Chironomus kiinensis) in response to phenol stress. A total of 51,518,972 and 51,150,832 clean reads in the phenol-treated and control libraries, respectively, were obtained and assembled into 51,014 non-redundant (Nr) consensus sequences. A total of 6,032 unigenes were classified by Gene Ontology (GO), and 18,366 unigenes were categorized into 238 Kyoto Encyclopedia of Genes and Genomes (KEGG) categories. These genes included representatives from almost all functional categories. A total of 10,724 differentially expressed genes (P value <0.05) were detected in a comparative analysis of the expression profiles between phenol-treated and control C. kiinensis including 8,390 upregulated and 2,334 downregulated genes. The expression levels of 20 differentially expressed genes were confirmed by real-time RT-PCR, and the trends in gene expression that were observed matched the Solexa expression profiles, although the magnitude of the variations was different. Through pathway enrichment analysis, significantly enriched pathways were identified for the DEGs, including metabolic pathways, aryl hydrocarbon receptor (AhR), pancreatic secretion and neuroactive ligand-receptor interaction pathways, which may be associated with the phenol responses of C. kiinensis. Using Solexa sequencing technology, we identified several groups of key candidate genes as well as important biological pathways involved in the molecular modifications of chironomids under phenol stress. PMID:23527048

  10. SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data

    PubMed Central

    Green, Kevin T.; Dutilh, Bas E.; Edwards, Robert A.

    2016-01-01

    Summary: Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26454280

  11. SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data.

    PubMed

    Silva, Genivaldo Gueiros Z; Green, Kevin T; Dutilh, Bas E; Edwards, Robert A

    2016-02-01

    Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  12. Application of the High Resolution Melting analysis for genetic mapping of Sequence Tagged Site markers in narrow-leafed lupin (Lupinus angustifolius L.).

    PubMed

    Kamel, Katarzyna A; Kroc, Magdalena; Święcicki, Wojciech

    2015-01-01

    Sequence tagged site (STS) markers are valuable tools for genetic and physical mapping that can be successfully used in comparative analyses among related species. Current challenges for molecular markers genotyping in plants include the lack of fast, sensitive and inexpensive methods suitable for sequence variant detection. In contrast, high resolution melting (HRM) is a simple and high-throughput assay, which has been widely applied in sequence polymorphism identification as well as in the studies of genetic variability and genotyping. The present study is the first attempt to use the HRM analysis to genotype STS markers in narrow-leafed lupin (Lupinus angustifolius L.). The sensitivity and utility of this method was confirmed by the sequence polymorphism detection based on melting curve profiles in the parental genotypes and progeny of the narrow-leafed lupin mapping population. Application of different approaches, including amplicon size and a simulated heterozygote analysis, has allowed for successful genetic mapping of 16 new STS markers in the narrow-leafed lupin genome.

  13. PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

    PubMed Central

    2012-01-01

    Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net. PMID:22568821

  14. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

    PubMed

    Meinicke, Peter

    2009-09-02

    Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.

  15. [Detection and diversity analysis of rumen methanogens in the co-cultures with anaerobic fungi].

    PubMed

    Cheng, Yan-fen; Mao, Sheng-yong; Pei, Cai-xia; Liu, Jian-xin; Zhu, Wei-yun

    2006-12-01

    Rumen methanogen diversity in co-cultures with anaerobic fungi from goat rumen was analyzed. Mixed cultures of anaerobic fungi and methanogens were obtained from goat rumen using anaerobic fungal medium with the addition of penicillin and streptomycin, and were then subcultured 62 times by transferring cultures every 3-4 d. Total DNA from the original rumen fluid and the subcultured fungal cultures was used for PCR/DGGE and RFLP analysis. The 16S rDNA of clones corresponding to representative OTUs was sequenced. Results showed that the diversity index (Shannon index) of the methanogens generated from DGGE profiles decreased from 1.32 to 0.99 from rumen fluid to the fungal culture after 45 subcultures, with the lowest similarity of DGGE profiles at 34.7%. The Shannon index increased from 0.99 to 1.15 from the fungal culture after 45 subcultures to that after 62 subcultures, with the lowest similarity at 89.2%. A total of 5 OTUs were obtained from 69 clones using RFLP analysis, and six clones representing the 5 OTUs were sequenced. Of the 5 OTUs, three had cloned 16S rDNA sequences most closely related to uncultured archaeal symbiont PA202, with the same similarity of 95%, but were not closely related to any identified culturable methanogen. The remaining two OTUs had cloned 16S rDNA sequences sharing the same closest relative, uncultured rumen methanogen 956, with the same similarity of 97%. The 16S rDNA sequences of these two OTUs were also 97% similar to the closest identified culturable methanogen, Methanobrevibacter sp. NT7. In conclusion, diverse yet unidentified rumen methanogen species exist in the co-cultures with anaerobic fungi isolated from the goat rumen.
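    The Shannon index reported for the DGGE profiles is conventionally computed from the relative intensity of each band, H = -Σ p_i ln p_i. A minimal sketch follows, assuming the natural logarithm (the record does not state the base) and using made-up band intensities; it does not attempt to reproduce the values in the abstract.

```python
from math import log


def shannon_index(band_intensities):
    """Shannon diversity H = -sum(p_i * ln p_i) over bands with positive intensity."""
    positive = [v for v in band_intensities if v > 0]
    total = sum(positive)
    if total == 0:
        raise ValueError("at least one band must have positive intensity")
    return -sum((v / total) * log(v / total) for v in positive)


# Hypothetical band intensities for two lanes of a gel.
even_lane = [1.0, 1.0, 1.0, 1.0]    # four equally intense bands
skewed_lane = [7.0, 1.0, 1.0, 1.0]  # one dominant band
h_even = shannon_index(even_lane)
h_skewed = shannon_index(skewed_lane)
```

    For a fixed number of bands the index is highest when all bands are equally intense, so a drop in H can reflect either fewer bands or a shift toward dominance by a few of them.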

  16. Developmental validation of a Nextera XT mitogenome Illumina MiSeq sequencing method for high-quality samples.

    PubMed

    Peck, Michelle A; Sturk-Andreaggi, Kimberly; Thomas, Jacqueline T; Oliver, Robert S; Barritt-Ross, Suzanne; Marshall, Charla

    2018-05-01

    Generating mitochondrial genome (mitogenome) data from reference samples in a rapid and efficient manner is critical to harnessing the greater power of discrimination of the entire mitochondrial DNA (mtDNA) marker. The method of long-range target enrichment, Nextera XT library preparation, and Illumina sequencing on the MiSeq is a well-established technique for generating mitogenome data from high-quality samples. To this end, a validation was conducted for this mitogenome method processing up to 24 samples simultaneously along with analysis in the CLC Genomics Workbench and utilizing the AQME (AFDIL-QIAGEN mtDNA Expert) tool to generate forensic profiles. This validation followed the Federal Bureau of Investigation's Quality Assurance Standards (QAS) for forensic DNA testing laboratories and the Scientific Working Group on DNA Analysis Methods (SWGDAM) validation guidelines. The evaluation of control DNA, non-probative samples, blank controls, mixtures, and nonhuman samples demonstrated the validity of this method. Specifically, the sensitivity was established at ≥25 pg of nuclear DNA input for accurate mitogenome profile generation. Unreproducible low-level variants were observed in samples with low amplicon yields. Further, variant quality was shown to be a useful metric for identifying sequencing error and crosstalk. Success of this method was demonstrated with a variety of reference sample substrates and extract types. These studies further demonstrate the advantages of using NGS techniques by highlighting the quantitative nature of heteroplasmy detection. The results presented herein from more than 175 samples processed in ten sequencing runs, show this mitogenome sequencing method and analysis strategy to be valid for the generation of reference data. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Identification and substrate prediction of new Fragaria x ananassa aquaporins and expression in different tissues and during strawberry fruit development.

    PubMed

    Merlaen, Britt; De Keyser, Ellen; Van Labeke, Marie-Christine

    2018-01-01

    The newly identified aquaporin coding sequences presented here pave the way for further insights into the plant-water relations in the commercial strawberry ( Fragaria x ananassa ). Aquaporins are water channel proteins that allow water to cross (intra)cellular membranes. In Fragaria x ananassa , few of them have been identified hitherto, hampering the exploration of the water transport regulation at cellular level. Here, we present new aquaporin coding sequences belonging to different subclasses: plasma membrane intrinsic proteins subtype 1 and subtype 2 (PIP1 and PIP2) and tonoplast intrinsic proteins (TIP). The classification is based on phylogenetic analysis and is confirmed by the presence of conserved residues. Substrate-specific signature sequences (SSSSs) and specificity-determining positions (SDPs) predict the substrate specificity of each new aquaporin. Expression profiling in leaves, petioles and developing fruits reveals distinct patterns, even within the same (sub)class. Expression profiles range from leaf-specific expression over constitutive expression to fruit-specific expression. Both upregulation and downregulation during fruit ripening occur. Substrate specificity and expression profiles suggest that functional specialization exists among aquaporins belonging to a different but also to the same (sub)class.

  18. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium

    PubMed Central

    2014-01-01

    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838

  19. Genotyping of Echinococcus granulosus from domestic animals and humans from Ardabil Province, northwest Iran.

    PubMed

    Pezeshki, A; Akhlaghi, L; Sharbatkhori, M; Razmjou, E; Oormazdi, H; Mohebali, M; Meamar, A R

    2013-12-01

    Cystic echinococcosis is endemic in Iran, particularly in Ardabil Province, where it causes health and economic problems. The genetic pattern of Echinococcus granulosus has been determined in most parts of Iran, except in this area. In the present investigation, 55 larval isolates were collected from humans (11), sheep (19), goats (4) and cattle (21). For analysis of the genetic characteristics of E. granulosus isolates, DNA sequencing of mitochondrial cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1) genes was applied. Fifty isolates were successfully analysed, with 92% (46) and 8% (4) identified as G1 and G3 genotypes, respectively. The sequence analyses of the isolates displayed nine characteristic profiles in cox1 sequences and eight characteristic profiles in nad1 sequences. Based on these results, the sheep strain (G1 genotype) was the most prevalent in humans, sheep, goats and cattle. The buffalo strain (G3 genotype) was not only demonstrated in sheep (1 isolate) and cattle (1 isolate), but also for the first time in two human isolates. These findings will provide information for local control of echinococcosis.

  20. Use of life course work-family profiles to predict mortality risk among US women.

    PubMed

    Sabbath, Erika L; Guevara, Ivan Mejía; Glymour, M Maria; Berkman, Lisa F

    2015-04-01

    We examined relationships between US women's exposure to midlife work-family demands and subsequent mortality risk. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work-family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work-family sequences, with adjustment for covariates and potentially explanatory later-life factors. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work-family profiles associated with mortality risk before age 75 years.

  1. Gut microbial profile analysis by MiSeq sequencing of pancreatic carcinoma patients in China

    PubMed Central

    Xie, Haiyang; Li, Ang; Lu, Haifeng; Xu, Shaoyan; Zhou, Lin; Zhang, Hua; Cui, Guangying; Chen, Xinhua; Liu, Yuanxing; Wu, Liming; Qin, Nan; Sun, Ranran; Wang, Wei; Li, Lanjuan; Wang, Weilin; Zheng, Shusen

    2017-01-01

    Pancreatic carcinoma (PC) is a lethal cancer. Gut microbiota is associated with some risk factors of PC, e.g. obesity and type II diabetes. However, the specific gut microbial profile in clinical PC in China has never been reported. This prospective study collected 85 PC patients and 57 matched healthy controls (HC) to analyze microbial characteristics by MiSeq sequencing. The results showed that gut microbial diversity was decreased in PC with a unique microbial profile, which was partly attributed to its decrease of alpha diversity. Microbial alterations in PC featured an increase of certain pathogens and lipopolysaccharide-producing bacteria, and a decrease of probiotics and butyrate-producing bacteria. The microbial community in obstruction cases was separated from that in the un-obstructed cases. Streptococcus was associated with the bile. Furthermore, 23 microbial functions, e.g. leucine and LPS biosynthesis, were enriched, while 13 functions were reduced in PC. Importantly, based on 40 genera associated with PC, microbial markers achieved a high classification power with an AUC of 0.842. In conclusion, the gut microbial profile was unique in PC, providing a microbial marker for non-invasive PC diagnosis. PMID:29221120

  2. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States.

    PubMed

    Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S; Allard, Marc W; Brown, Eric W; Strain, Errol A

    2017-01-01

    A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. 
    The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents.
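
    The "differed by up to 29 SNPs" figures above are maximum pairwise distances within a cluster. As a rough illustration only, and not the pipeline used in the study, the sketch below counts differing positions between isolates that are assumed to be already reduced to aligned SNP-site strings of equal length. The function names and the toy isolate data are hypothetical.

```python
from itertools import combinations


def snp_distance(sites_a: str, sites_b: str) -> int:
    """Count positions at which two aligned SNP-site strings differ."""
    if len(sites_a) != len(sites_b):
        raise ValueError("SNP-site strings must be aligned to the same length")
    return sum(x != y for x, y in zip(sites_a, sites_b))


def max_pairwise_snps(isolates: dict[str, str]) -> int:
    """Largest SNP distance between any two isolates (0 if fewer than two)."""
    pairs = combinations(isolates.values(), 2)
    return max((snp_distance(a, b) for a, b in pairs), default=0)


# Hypothetical toy data: three isolates scored at five variable sites.
toy_cluster = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCGTT"}
print(max_pairwise_snps(toy_cluster))
```

    Real WGS analyses call SNPs against a reference genome and filter them before any distance is computed; this sketch covers only the final counting step.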

  3. Assessing the genome level diversity of Listeria monocytogenes from contaminated ice cream and environmental samples linked to a listeriosis outbreak in the United States

    PubMed Central

    Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.

    2017-01-01

    A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. 
    The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293

  4. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems

    PubMed Central

    2011-01-01

    Background Alfalfa, [Medicago sativa (L.) sativa], a widely-grown perennial forage has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. Results Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expression sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes. 
    Conclusions Our results demonstrate that RNA-Seq can be successfully used for gene identification, polymorphism detection and transcript profiling in alfalfa, a non-model, allogamous, autotetraploid species. The alfalfa gene index assembled in this study, and the SNPs, SSRs and candidate genes identified can be used to improve alfalfa as a forage crop and cellulosic feedstock. PMID:21504589

  5. Gene Expression Profiling Reveals Functional Specialization along the Intestinal Tract of a Carnivorous Teleostean Fish (Dicentrarchus labrax)

    PubMed Central

    Calduch-Giner, Josep A.; Sitjà-Bobadilla, Ariadna; Pérez-Sánchez, Jaume

    2016-01-01

    High-quality sequencing reads from the intestine of European sea bass were assembled, annotated by similarity against protein reference databases and combined with nucleotide sequences from public and private databases. After redundancy filtering, 24,906 non-redundant annotated sequences encoding 15,367 different gene descriptions were obtained. These annotated sequences were used to design a custom, high-density oligo-microarray (8 × 15 K) for the transcriptomic profiling of anterior (AI), middle (MI), and posterior (PI) intestinal segments. Similar molecular signatures were found for AI and MI segments, which were combined in a single group (AI-MI) whereas the PI outstood separately, with more than 1900 differentially expressed genes with a fold-change cutoff of 2. Functional analysis revealed that molecular and cellular functions related to feed digestion and nutrient absorption and transport were over-represented in AI-MI segments. By contrast, the initiation and establishment of immune defense mechanisms became especially relevant in PI, although the microarray expression profiling validated by qPCR indicated that these functional changes are gradual from anterior to posterior intestinal segments. This functional divergence occurred in association with spatial transcriptional changes in nutrient transporters and the mucosal chemosensing system via G protein-coupled receptors. These findings contribute to identify key indicators of gut functions and to compare different fish feeding strategies and immune defense mechanisms acquired along the evolution of teleosts. PMID:27610085
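
    A "fold-change cutoff of 2" means a gene is retained when its expression in one group is at least twice, or at most half, its expression in the other. The sketch below illustrates only that thresholding step, assuming positive expression values per group. The function names and example values are hypothetical, and the statistical testing that normally accompanies such a filter is omitted.

```python
def passes_fold_change(expr_a: float, expr_b: float, cutoff: float = 2.0) -> bool:
    """True when the two values differ by at least `cutoff`-fold in either direction."""
    if expr_a <= 0 or expr_b <= 0:
        raise ValueError("expression values must be positive")
    ratio = expr_a / expr_b
    return ratio >= cutoff or ratio <= 1.0 / cutoff


def select_genes(expression: dict[str, tuple[float, float]], cutoff: float = 2.0) -> list[str]:
    """Return the sorted names of genes whose (group A, group B) values pass the cutoff."""
    return sorted(
        gene for gene, (a, b) in expression.items() if passes_fold_change(a, b, cutoff)
    )


# Hypothetical mean expression values: (AI-MI segments, PI segment).
toy_expression = {"g1": (10.0, 4.0), "g2": (5.0, 4.0), "g3": (2.0, 8.0)}
print(select_genes(toy_expression))
```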

  6. Gene Expression Profiling Reveals Functional Specialization along the Intestinal Tract of a Carnivorous Teleostean Fish (Dicentrarchus labrax).

    PubMed

    Calduch-Giner, Josep A; Sitjà-Bobadilla, Ariadna; Pérez-Sánchez, Jaume

    2016-01-01

    High-quality sequencing reads from the intestine of European sea bass were assembled, annotated by similarity against protein reference databases and combined with nucleotide sequences from public and private databases. After redundancy filtering, 24,906 non-redundant annotated sequences encoding 15,367 different gene descriptions were obtained. These annotated sequences were used to design a custom, high-density oligo-microarray (8 × 15 K) for the transcriptomic profiling of anterior (AI), middle (MI), and posterior (PI) intestinal segments. Similar molecular signatures were found for AI and MI segments, which were combined in a single group (AI-MI) whereas the PI outstood separately, with more than 1900 differentially expressed genes with a fold-change cutoff of 2. Functional analysis revealed that molecular and cellular functions related to feed digestion and nutrient absorption and transport were over-represented in AI-MI segments. By contrast, the initiation and establishment of immune defense mechanisms became especially relevant in PI, although the microarray expression profiling validated by qPCR indicated that these functional changes are gradual from anterior to posterior intestinal segments. This functional divergence occurred in association with spatial transcriptional changes in nutrient transporters and the mucosal chemosensing system via G protein-coupled receptors. These findings contribute to identify key indicators of gut functions and to compare different fish feeding strategies and immune defense mechanisms acquired along the evolution of teleosts.

  7. Characterizing partial AZFc deletions of the Y chromosome with amplicon-specific sequence markers

    PubMed Central

    Navarro-Costa, Paulo; Pereira, Luísa; Alves, Cíntia; Gusmão, Leonor; Proença, Carmen; Marques-Vidal, Pedro; Rocha, Tiago; Correia, Sónia C; Jorge, Sónia; Neves, António; Soares, Ana P; Nunes, Joaquim; Calhaz-Jorge, Carlos; Amorim, António; Plancha, Carlos E; Gonçalves, João

    2007-01-01

    Background The AZFc region of the human Y chromosome is a highly recombinogenic locus containing multi-copy male fertility genes located in repeated DNA blocks (amplicons). These AZFc gene families exhibit slight sequence variations between copies which are considered to have functional relevance. Yet, partial AZFc deletions yield phenotypes ranging from normospermia to azoospermia, thwarting definite conclusions on their real impact on fertility. Results The amplicon content of partial AZFc deletion products was characterized with novel amplicon-specific sequence markers. Data indicate that partial AZFc deletions are a male infertility risk [odds ratio: 5.6 (95% CI: 1.6–30.1)] and although high diversity of partial deletion products and sequence conversion profiles were recorded, the AZFc marker profiles detected in fertile men were also observed in infertile men. Additionally, the assessment of rearrangement recurrence by Y-lineage analysis indicated that while partial AZFc deletions occurred in highly diverse samples, haplotype diversity was minimal in fertile men sharing identical marker profiles. Conclusion Although partial AZFc deletion products are highly heterogeneous in terms of amplicon content, this plasticity is not sufficient to account for the observed phenotypical variance. The lack of causative association between the deletion of specific gene copies and infertility suggests that AZFc gene content might be part of a multifactorial network, with Y-lineage evolution emerging as a possible phenotype modulator. PMID:17903263

  8. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
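
    Of the word-frequency measures listed above, the D2 statistic is the simplest to state: for a word length k, it sums, over every word, the product of that word's occurrence counts in the two sequences. The sketch below is a minimal illustration of that definition rather than code from the review. The function names are hypothetical, and the normalized variants of D2 are not covered.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a: str, seq_b: str, k: int) -> int:
    """D2: sum over words of (count in seq_a) * (count in seq_b)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words absent from seq_b contribute 0 because Counter returns 0 for missing keys.
    return sum(n * counts_b[word] for word, n in counts_a.items())


print(d2_statistic("ACGT", "ACGT", 2))
print(d2_statistic("ACGT", "TTTT", 2))
```

    No alignment is computed at any point, which is why measures of this kind scale to large sequence sets and are not disturbed by rearranged segments.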

  9. Transcriptome assembly and digital gene expression atlas of the rainbow trout

    USDA-ARS's Scientific Manuscript database

    Background: Transcriptome analysis is a preferred method for gene discovery, marker development and gene expression profiling in non-model organisms. Previously, we sequenced a transcriptome reference using Sanger-based and 454-pyrosequencing, however, a transcriptome assembly is still incomplete an...

  10. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS's Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  11. Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds.

    PubMed

    Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L

    2017-09-27

    Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggest the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Schizosaccharomyces pombe Polysome Profile Analysis and RNA Purification.

    PubMed

    Wolf, Dieter A; Bähler, Jürg; Wise, Jo Ann

    2017-04-03

    Polysome profile analysis is widely used by investigators studying the mechanism and regulation of translation. The method described here uses high-velocity centrifugation of whole cell extracts on linear sucrose gradients to separate 40S and 60S ribosomal subunits from 80S monosomes and polysomes. Cycloheximide is included in the lysis buffer to "freeze" polysomes by blocking translation. After centrifugation, the gradient is fractionated and RNA (and/or protein) is prepared from each fraction for subsequent analysis of individual species using northern or western blots. The entire RNA population in each fraction can be analyzed by hybridization to microarrays or by high-throughput RNA sequencing, and the proteins present can be identified by mass spectrometry analysis. © 2017 Cold Spring Harbor Laboratory Press.

  13. ViDiT-CACTUS: an inexpensive and versatile library preparation and sequence analysis method for virus discovery and other microbiology applications.

    PubMed

    Verhoeven, Joost Theo Petra; Canuti, Marta; Munro, Hannah J; Dufour, Suzanne C; Lang, Andrew S

    2018-04-19

    High-throughput sequencing (HTS) technologies are becoming increasingly important within microbiology research, but aspects of library preparation, such as high cost per sample or strict input requirements, make HTS difficult to implement in some niche applications and for research groups on a budget. To answer these necessities, we developed ViDiT, a customizable, PCR-based, extremely low-cost (<5 US dollars per sample) and versatile library preparation method, and CACTUS, an analysis pipeline designed to rely on cloud computing power to generate high-quality data from ViDiT-based experiments without the need of expensive servers. We demonstrate here the versatility and utility of these methods within three fields of microbiology: virus discovery, amplicon-based viral genome sequencing and microbiome profiling. ViDiT-CACTUS allowed the identification of viral fragments from 25 different viral families from 36 oropharyngeal-cloacal swabs collected from wild birds, the sequencing of three almost complete genomes of avian influenza A viruses (>90% coverage), and the characterization and functional profiling of the complete microbial diversity (bacteria, archaea, viruses) within a deep-sea carnivorous sponge. ViDiT-CACTUS demonstrated its validity in a wide range of microbiology applications and its simplicity and modularity make it easily implementable in any molecular biology laboratory, towards various research goals.

  14. Transcriptional profiling reveals the expression of novel genes in response to various stimuli in the human dermatophyte Trichophyton rubrum

    PubMed Central

    2010-01-01

    Background Cutaneous mycoses are common human infections among healthy and immunocompromised hosts, and the anthropophilic fungus Trichophyton rubrum is the most prevalent microorganism isolated from such clinical cases worldwide. The aim of this study was to determine the transcriptional profile of T. rubrum exposed to various stimuli in order to obtain insights into the responses of this pathogen to different environmental challenges. Therefore, we generated an expressed sequence tag (EST) collection by constructing one cDNA library and nine suppression subtractive hybridization libraries. Results The 1388 unigenes identified in this study were functionally classified based on the Munich Information Center for Protein Sequences (MIPS) categories. The identified proteins were involved in transcriptional regulation, cellular defense and stress, protein degradation, signaling, transport, and secretion, among other functions. Analysis of these unigenes revealed 575 T. rubrum sequences that had not been previously deposited in public databases. Conclusion In this study, we identified novel T. rubrum genes that will be useful for ORF prediction in genome sequencing and facilitating functional genome analysis. Annotation of these expressed genes revealed metabolic adaptations of T. rubrum to carbon sources, ambient pH shifts, and various antifungal drugs used in medical practice. Furthermore, challenging T. rubrum with cytotoxic drugs and ambient pH shifts extended our understanding of the molecular events possibly involved in the infectious process and resistance to antifungal drugs. PMID:20144196

  15. High-Resolution Genotyping of Streptococcus pyogenes Serotype M1 Isolates by Fluorescent Amplified-Fragment Length Polymorphism Analysis

    PubMed Central

    Desai, Meeta; Efstratiou, Androulla; George, Robert; Stanley, John

    1999-01-01

    We have used fluorescent amplified-fragment length polymorphism (FAFLP) analysis to subtype clinical isolates of Streptococcus pyogenes serotype M1. Established typing methods define most M1 isolates as members of a clone that has a worldwide distribution and that is strongly associated with invasive diseases. FAFLP analysis simultaneously sampled 90 to 120 loci throughout the M1 genome. Its discriminatory power, precision, and reproducibility were compared with those of other molecular typing methods. Irrespective of disease symptomatology or geographic origin, the majority of the clinical M1 isolates shared a single ribotype, pulsed-field gel electrophoresis macrorestriction profile, and emm1 gene sequence. Nonetheless, among these isolates, FAFLP analysis could differentiate 17 distinct profiles, including seven multi-isolate groups. The FAFLP profiles of M1 isolates reproducibly exhibited between 1 and more than 20 amplified fragment differences. The high discriminatory power of genotyping by FAFLP analysis revealed genetic microheterogeneity and differentiated otherwise “identical” M1 isolates as members of a clone complex. PMID:10325352

  16. An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

    PubMed Central

    Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

    2017-01-01

    Abstract Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422

  17. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples

    PubMed Central

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti

    2016-01-01

    Objective Currently, there is a disconnect between finding a patient’s relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. Methods and materials The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. Results IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline links these somatic variants to actionable therapeutics. We observed the clonal dynamic in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies. Conclusion IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. PMID:27026619

  18. High-resolution melt PCR analysis for genotyping of Ureaplasma parvum isolates directly from clinical samples.

    PubMed

    Payne, Matthew S; Tabone, Tania; Kemp, Matthew W; Keelan, Jeffrey A; Spiller, O Brad; Newnham, John P

    2014-02-01

    Ureaplasma sp. infection in neonates and adults underlies a variety of disease pathologies. Of the two human Ureaplasma spp., Ureaplasma parvum is clinically the most common. We have developed a high-resolution melt (HRM) PCR assay for the differentiation of the four serovars of U. parvum in a single step. Currently U. parvum strains are separated into four serovars by sequencing the promoter and coding region of the multiple-banded antigen (MBA) gene. We designed primers to conserved sequences within this region for PCR amplification and HRM analysis to generate reproducible and distinct melt profiles that distinguish clonal representatives of serovars 1, 3, 6, and 14. Furthermore, our HRM PCR assay could classify DNA extracted from 74 known (MBA-sequenced) test strains with 100% accuracy. Importantly, HRM PCR was also able to identify U. parvum serovars directly from 16 clinical swabs. HRM PCR performed with DNA consisting of mixtures of combined known serovars yielded profiles that were easily distinguished from those for single-serovar controls. These profiles mirrored clinical samples that contained mixed serovars. Unfortunately, melt curve analysis software is not yet robust enough to identify the composition of mixed serovar samples, only that more than one serovar is present. HRM PCR provides a single-step, rapid, cost-effective means to differentiate the four serovars of U. parvum that did not amplify any of the known 10 serovars of Ureaplasma urealyticum tested in parallel. Choice of reaction reagents was found to be crucial to allow sufficient sensitivity to differentiate U. parvum serovars directly from clinical swabs rather than requiring cell enrichment using microbial culture techniques.

  19. Microarray analysis of gene expression profiles in ripening pineapple fruits.

    PubMed

    Koia, Jonni H; Moyle, Richard L; Botella, Jose R

    2012-12-18

    Pineapple (Ananas comosus) is a tropical fruit crop of significant commercial importance. Although the physiological changes that occur during pineapple fruit development have been well characterized, little is known about the molecular events that occur during the fruit ripening process. Understanding the molecular basis of pineapple fruit ripening will aid the development of new varieties via molecular breeding or genetic modification. In this study we developed a 9277 element pineapple microarray and used it to profile gene expression changes that occur during pineapple fruit ripening. Microarray analyses identified 271 unique cDNAs differentially expressed at least 1.5-fold between the mature green and mature yellow stages of pineapple fruit ripening. Among these 271 sequences, 184 share significant homology with genes encoding proteins of known function, 53 share homology with genes encoding proteins of unknown function and 34 share no significant homology with any database accession. Of the 237 pineapple sequences with homologs, 160 were up-regulated and 77 were down-regulated during pineapple fruit ripening. DAVID Functional Annotation Cluster (FAC) analysis of all 237 sequences with homologs revealed confident enrichment scores for redox activity, organic acid metabolism, metalloenzyme activity, glycolysis, vitamin C biosynthesis, antioxidant activity and cysteine peptidase activity, indicating the functional significance and importance of these processes and pathways during pineapple fruit development. Quantitative real-time PCR analysis validated the microarray expression results for nine out of ten genes tested. This is the first report of a microarray based gene expression study undertaken in pineapple. Our bioinformatic analyses of the transcript profiles have identified a number of genes, processes and pathways with putative involvement in the pineapple fruit ripening process. This study extends our knowledge of the molecular basis of pineapple fruit ripening and non-climacteric fruit ripening in general.
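
    The 1.5-fold selection step described in the abstract above can be illustrated with a minimal sketch. The gene names and intensity values are hypothetical, and the function `differentially_expressed` is an illustrative helper, not the authors' analysis code; it assumes one normalized intensity per gene and per ripening stage, and it leaves out the statistical testing a real microarray analysis would need.

```python
import math

def differentially_expressed(green, yellow, min_fold=1.5):
    """Return {gene: log2 fold change} for genes that change at least
    min_fold-fold (up or down) between two stages.

    green, yellow: dicts mapping gene name -> positive normalized intensity.
    """
    threshold = math.log2(min_fold)
    selected = {}
    for gene, green_value in green.items():
        log2_fc = math.log2(yellow[gene] / green_value)
        if abs(log2_fc) >= threshold:
            selected[gene] = log2_fc
    return selected

# Hypothetical intensities for three genes (not data from the study).
green = {"geneA": 100.0, "geneB": 200.0, "geneC": 150.0}
yellow = {"geneA": 320.0, "geneB": 100.0, "geneC": 160.0}

hits = differentially_expressed(green, yellow)
up = sorted(g for g, fc in hits.items() if fc > 0)    # higher in mature yellow
down = sorted(g for g, fc in hits.items() if fc < 0)  # lower in mature yellow
print("up-regulated:", up)
print("down-regulated:", down)
```

    Working on the log2 scale treats a 1.5-fold increase and a 1.5-fold decrease symmetrically, which is why a single absolute threshold covers both the up-regulated and the down-regulated group.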

  20. Microarray analysis of gene expression profiles in ripening pineapple fruits

    PubMed Central

    2012-01-01

    Background: Pineapple (Ananas comosus) is a tropical fruit crop of significant commercial importance. Although the physiological changes that occur during pineapple fruit development have been well characterized, little is known about the molecular events that occur during the fruit ripening process. Understanding the molecular basis of pineapple fruit ripening will aid the development of new varieties via molecular breeding or genetic modification. In this study we developed a 9277 element pineapple microarray and used it to profile gene expression changes that occur during pineapple fruit ripening. Results: Microarray analyses identified 271 unique cDNAs differentially expressed at least 1.5-fold between the mature green and mature yellow stages of pineapple fruit ripening. Among these 271 sequences, 184 share significant homology with genes encoding proteins of known function, 53 share homology with genes encoding proteins of unknown function and 34 share no significant homology with any database accession. Of the 237 pineapple sequences with homologs, 160 were up-regulated and 77 were down-regulated during pineapple fruit ripening. DAVID Functional Annotation Cluster (FAC) analysis of all 237 sequences with homologs revealed confident enrichment scores for redox activity, organic acid metabolism, metalloenzyme activity, glycolysis, vitamin C biosynthesis, antioxidant activity and cysteine peptidase activity, indicating the functional significance and importance of these processes and pathways during pineapple fruit development. Quantitative real-time PCR analysis validated the microarray expression results for nine out of ten genes tested. Conclusions: This is the first report of a microarray based gene expression study undertaken in pineapple. Our bioinformatic analyses of the transcript profiles have identified a number of genes, processes and pathways with putative involvement in the pineapple fruit ripening process. This study extends our knowledge of the molecular basis of pineapple fruit ripening and non-climacteric fruit ripening in general. PMID:23245313

  1. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples.

    PubMed

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti; Robinson, William A; Tan, Aik Choon

    2016-07-01

    Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline links these somatic variants to actionable therapeutics. We observed the clonal dynamics in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies. IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Using DGGE and 16S rRNA gene sequence analysis to evaluate changes in oral bacterial composition.

    PubMed

    Chen, Zhou; Trivedi, Harsh M; Chhun, Nok; Barnes, Virginia M; Saxena, Deepak; Xu, Tao; Li, Yihong

    2011-01-01

    To investigate whether a standard dental prophylaxis followed by tooth brushing with an antibacterial dentifrice will affect the oral bacterial community, as determined by denaturing gradient gel electrophoresis (DGGE) combined with 16S rRNA gene sequence analysis. Twenty-four healthy adults were instructed to brush their teeth using commercial dentifrice for 1 week during a washout period. An initial set of pooled supragingival plaque samples was collected from each participant at baseline (0 h) before prophylaxis treatment. The subjects were given a clinical examination and dental prophylaxis and asked to brush for 1 min with a dentifrice containing 0.3% triclosan, 2.0% PVM/MA copolymer and 0.243% sodium fluoride (Colgate Total). On the following day, a second set of pooled supragingival plaque samples (24 h) was collected. Total bacterial genomic DNA was isolated from the samples. Differences in the microbial composition before and after the prophylactic procedure and tooth brushing were assessed by comparing the DGGE profiles and 16S rRNA gene segments sequence analysis. Two distinct clusters of DGGE profiles were found, suggesting that a shift in the microbial composition had occurred 24 h after the prophylaxis and brushing. A detailed sequencing analysis of 16S rRNA gene segments further identified 6 phyla and 29 genera, including known and unknown bacterial species. Importantly, an increase in bacterial diversity was observed after 24 h, including members of the Streptococcaceae family, Prevotella, Corynebacterium, TM7 and other commensal bacteria. The results suggest that the use of a standard prophylaxis followed by the use of the dentifrice containing 0.3% triclosan, 2.0% PVM/MA copolymer and 0.243% sodium fluoride may promote a healthier composition within the oral bacterial community.

  3. PanFP: Pangenome-based functional profiles for microbial communities

    DOE PAGES

    Jun, Se -Ran; Hauser, Loren John; Schadt, Christopher Warren; ...

    2015-09-26

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8-0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.

  4. PanFP: pangenome-based functional profiles for microbial communities.

    PubMed

    Jun, Se-Ran; Robeson, Michael S; Hauser, Loren J; Schadt, Christopher W; Gorin, Andrey A

    2015-09-26

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8-0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases. We developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub ( https://github.com/srjun/PanFP ).

  5. Genomic Heterogeneity as a Barrier to Precision Medicine in Gastroesophageal Adenocarcinoma.

    PubMed

    Pectasides, Eirini; Stachler, Matthew D; Derks, Sarah; Liu, Yang; Maron, Steven; Islam, Mirazul; Alpert, Lindsay; Kwak, Heewon; Kindler, Hedy; Polite, Blase; Sharma, Manish R; Allen, Kenisha; O'Day, Emily; Lomnicki, Samantha; Maranto, Melissa; Kanteti, Rajani; Fitzpatrick, Carrie; Weber, Christopher; Setia, Namrata; Xiao, Shu-Yuan; Hart, John; Nagy, Rebecca J; Kim, Kyoung-Mee; Choi, Min-Gew; Min, Byung-Hoon; Nason, Katie S; O'Keefe, Lea; Watanabe, Masayuki; Baba, Hideo; Lanman, Rick; Agoston, Agoston T; Oh, David J; Dunford, Andrew; Thorner, Aaron R; Ducar, Matthew D; Wollison, Bruce M; Coleman, Haley A; Ji, Yuan; Posner, Mitchell C; Roggin, Kevin; Turaga, Kiran; Chang, Paul; Hogarth, Kyle; Siddiqui, Uzma; Gelrud, Andres; Ha, Gavin; Freeman, Samuel S; Rhoades, Justin; Reed, Sarah; Gydush, Greg; Rotem, Denisse; Davison, Jon; Imamura, Yu; Adalsteinsson, Viktor; Lee, Jeeyun; Bass, Adam J; Catenacci, Daniel V

    2018-01-01

    Gastroesophageal adenocarcinoma (GEA) is a lethal disease where targeted therapies, even when guided by genomic biomarkers, have had limited efficacy. A potential reason for the failure of such therapies is that genomic profiling results could commonly differ between the primary and metastatic tumors. To evaluate genomic heterogeneity, we sequenced paired primary GEA and synchronous metastatic lesions across multiple cohorts, finding extensive differences in genomic alterations, including discrepancies in potentially clinically relevant alterations. Multiregion sequencing showed significant discrepancy within the primary tumor (PT) and between the PT and disseminated disease, with oncogene amplification profiles commonly discordant. In addition, a pilot analysis of cell-free DNA (cfDNA) sequencing demonstrated the feasibility of detecting genomic amplifications not detected in PT sampling. Lastly, we profiled paired primary tumors, metastatic tumors, and cfDNA from patients enrolled in the personalized antibodies for GEA (PANGEA) trial of targeted therapies in GEA and found that genomic biomarkers were recurrently discrepant between the PT and untreated metastases. Divergent primary and metastatic tissue profiling led to treatment reassignment in 32% (9/28) of patients. In discordant primary and metastatic lesions, we found 87.5% concordance for targetable alterations in metastatic tissue and cfDNA, suggesting the potential for cfDNA profiling to enhance selection of therapy. Significance: We demonstrate frequent baseline heterogeneity in targetable genomic alterations in GEA, indicating that current tissue sampling practices for biomarker testing do not effectively guide precision medicine in this disease and that routine profiling of metastatic lesions and/or cfDNA should be systematically evaluated. Cancer Discov; 8(1); 37-48. ©2017 AACR.

  6. eShadow: A tool for comparing closely related sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

    2004-01-15

    Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/

  7. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions.

    PubMed

    Brannon, A Rose; Vakiani, Efsevia; Sylvester, Brooke E; Scott, Sasinya N; McDermott, Gregory; Shah, Ronak H; Kania, Krishan; Viale, Agnes; Oschwald, Dayna M; Vacic, Vladimir; Emde, Anne-Katrin; Cercek, Andrea; Yaeger, Rona; Kemeny, Nancy E; Saltz, Leonard B; Shia, Jinru; D'Angelica, Michael I; Weiser, Martin R; Solit, David B; Berger, Michael F

    2014-08-28

    Colorectal cancer is the second leading cause of cancer death in the United States, with over 50,000 deaths estimated in 2014. Molecular profiling for somatic mutations that predict absence of response to anti-EGFR therapy has become standard practice in the treatment of metastatic colorectal cancer; however, the quantity and type of tissue available for testing is frequently limited. Further, the degree to which the primary tumor is a faithful representation of metastatic disease has been questioned. As next-generation sequencing technology becomes more widely available for clinical use and additional molecularly targeted agents are considered as treatment options in colorectal cancer, it is important to characterize the extent of tumor heterogeneity between primary and metastatic tumors. We performed deep coverage, targeted next-generation sequencing of 230 key cancer-associated genes for 69 matched primary and metastatic tumors and normal tissue. Mutation profiles were 100% concordant for KRAS, NRAS, and BRAF, and were highly concordant for recurrent alterations in colorectal cancer. Additionally, whole genome sequencing of four patient trios did not reveal any additional site-specific targetable alterations. Colorectal cancer primary tumors and metastases exhibit high genomic concordance. As current clinical practices in colorectal cancer revolve around KRAS, NRAS, and BRAF mutation status, diagnostic sequencing of either primary or metastatic tissue as available is acceptable for most patients. Additionally, consistency between targeted sequencing and whole genome sequencing results suggests that targeted sequencing may be a suitable strategy for clinical diagnostic applications.

  8. Transcriptome Analysis at the Single-Cell Level Using SMART Technology.

    PubMed

    Fish, Rachel N; Bostick, Magnolia; Lehman, Alisa; Farmer, Andrew

    2016-10-10

    RNA sequencing (RNA-seq) is a powerful method for analyzing cell state, with minimal bias, and has broad applications within the biological sciences. However, transcriptome analysis of seemingly homogenous cell populations may in fact overlook significant heterogeneity that can be uncovered at the single-cell level. The ultra-low amount of RNA contained in a single cell requires extraordinarily sensitive and reproducible transcriptome analysis methods. As next-generation sequencing (NGS) technologies mature, transcriptome profiling by RNA-seq is increasingly being used to decipher the molecular signature of individual cells. This unit describes an ultra-sensitive and reproducible protocol to generate cDNA and sequencing libraries directly from single cells or RNA inputs ranging from 10 pg to 10 ng. Important considerations for working with minute RNA inputs are given. © 2016 by John Wiley & Sons, Inc.

  9. Application of the Ramanujan Fourier Transform for the analysis of secondary structure content in amino acid sequences.

    PubMed

    Mainardi, L T; Pattini, L; Cerutti, S

    2007-01-01

    A novel method is presented for the investigation of protein properties of sequences using Ramanujan Fourier Transform (RFT). The new methodology involves the preprocessing of protein sequence data by numerically encoding it and then applying the RFT. The RFT is based on projecting the obtained numerical series on a set of basis functions constituted by Ramanujan sums (RS). In RS components, periodicities of finite integer length, rather than frequency, (as in classical harmonic analysis) are considered. The potential of the new approach is documented by a few examples in the analysis of hydrophobic profiles of proteins in two classes including abundance of alpha-helices (group A) or beta-strands (group B). Different patterns are provided as evidence. RFT can be used to characterize the structural properties of proteins and integrate complementary information provided by other signal processing transforms.
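
    The projection onto Ramanujan sums described above can be illustrated with a short sketch. The Ramanujan sum c_q(n) is the sum of cos(2*pi*k*n/q) over all k in 1..q that are coprime to q. The coefficient estimate below is a simplified finite-length approximation chosen for illustration (an assumption, not necessarily the authors' estimator), and the toy signal stands in for a numerically encoded protein sequence; the function names are hypothetical.

```python
import math

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1.

    The exact value is always an integer, so the floating-point sum is rounded.
    """
    total = sum(
        math.cos(2 * math.pi * k * n / q)
        for k in range(1, q + 1)
        if math.gcd(k, q) == 1
    )
    return round(total)

def euler_phi(q):
    """Number of integers in 1..q that are coprime to q."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)

def rft_coefficient(signal, q):
    """Finite-length estimate of the q-th Ramanujan Fourier coefficient.

    signal[0] is treated as x(1). Assumed estimator for illustration:
    (1 / (N * phi(q))) * sum over n of x(n) * c_q(n).
    """
    n_samples = len(signal)
    projection = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return projection / (n_samples * euler_phi(q))

# Toy numeric series with period 3 (a 1 at every third position).
signal = [0, 0, 1] * 4
for q in range(1, 5):
    print(q, rft_coefficient(signal, q))
```

    For this toy series the estimate is nonzero only at q = 1 (the mean) and q = 3 (the true period), which illustrates the point made in the abstract that Ramanujan sums pick out periodicities of integer length rather than frequencies.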

  10. Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

    PubMed

    Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

    2014-02-03

    Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.

  11. FANTOM5 CAGE profiles of human and mouse samples.

    PubMed

    Noguchi, Shuhei; Arakawa, Takahiro; Fukuda, Shiro; Furuno, Masaaki; Hasegawa, Akira; Hori, Fumi; Ishikawa-Kato, Sachi; Kaida, Kaoru; Kaiho, Ai; Kanamori-Katayama, Mutsumi; Kawashima, Tsugumi; Kojima, Miki; Kubosaki, Atsutaka; Manabe, Ri-Ichiroh; Murata, Mitsuyoshi; Nagao-Sato, Sayaka; Nakazato, Kenichi; Ninomiya, Noriko; Nishiyori-Sueki, Hiromi; Noma, Shohei; Saijyo, Eri; Saka, Akiko; Sakai, Mizuho; Simon, Christophe; Suzuki, Naoko; Tagami, Michihira; Watanabe, Shoko; Yoshida, Shigehiro; Arner, Peter; Axton, Richard A; Babina, Magda; Baillie, J Kenneth; Barnett, Timothy C; Beckhouse, Anthony G; Blumenthal, Antje; Bodega, Beatrice; Bonetti, Alessandro; Briggs, James; Brombacher, Frank; Carlisle, Ailsa J; Clevers, Hans C; Davis, Carrie A; Detmar, Michael; Dohi, Taeko; Edge, Albert S B; Edinger, Matthias; Ehrlund, Anna; Ekwall, Karl; Endoh, Mitsuhiro; Enomoto, Hideki; Eslami, Afsaneh; Fagiolini, Michela; Fairbairn, Lynsey; Farach-Carson, Mary C; Faulkner, Geoffrey J; Ferrai, Carmelo; Fisher, Malcolm E; Forrester, Lesley M; Fujita, Rie; Furusawa, Jun-Ichi; Geijtenbeek, Teunis B; Gingeras, Thomas; Goldowitz, Daniel; Guhl, Sven; Guler, Reto; Gustincich, Stefano; Ha, Thomas J; Hamaguchi, Masahide; Hara, Mitsuko; Hasegawa, Yuki; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J; Hume, David A; Ikawa, Tomokatsu; Ishizu, Yuri; Kai, Chieko; Kawamoto, Hiroshi; Kawamura, Yuki I; Kempfle, Judith S; Kenna, Tony J; Kere, Juha; Khachigian, Levon M; Kitamura, Toshio; Klein, Sarah; Klinken, S Peter; Knox, Alan J; Kojima, Soichi; Koseki, Haruhiko; Koyasu, Shigeo; Lee, Weonju; Lennartsson, Andreas; Mackay-Sim, Alan; Mejhert, Niklas; Mizuno, Yosuke; Morikawa, Hiromasa; Morimoto, Mitsuru; Moro, Kazuyo; Morris, Kelly J; Motohashi, Hozumi; Mummery, Christine L; Nakachi, Yutaka; Nakahara, Fumio; Nakamura, Toshiyuki; Nakamura, Yukio; Nozaki, Tadasuke; Ogishima, Soichi; Ohkura, Naganari; Ohno, Hiroshi; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Okazaki, Yasushi; Orlando, Valerio; Ovchinnikov, Dmitry A; Passier, Robert; Patrikakis, Margaret; Pombo, Ana; Pradhan-Bhatt, Swati; Qin, Xian-Yang; Rehli, Michael; Rizzu, Patrizia; Roy, Sugata; Sajantila, Antti; Sakaguchi, Shimon; Sato, Hiroki; Satoh, Hironori; Savvi, Suzana; Saxena, Alka; Schmidl, Christian; Schneider, Claudio; Schulze-Tanzil, Gundula G; Schwegmann, Anita; Sheng, Guojun; Shin, Jay W; Sugiyama, Daisuke; Sugiyama, Takaaki; Summers, Kim M; Takahashi, Naoko; Takai, Jun; Tanaka, Hiroshi; Tatsukawa, Hideki; Tomoiu, Andru; Toyoda, Hiroo; van de Wetering, Marc; van den Berg, Linda M; Verardo, Roberto; Vijayan, Dipti; Wells, Christine A; Winteringham, Louise N; Wolvetang, Ernst; Yamaguchi, Yoko; Yamamoto, Masayuki; Yanagi-Mizuochi, Chiyo; Yoneda, Misako; Yonekura, Yohei; Zhang, Peter G; Zucchelli, Silvia; Abugessaisa, Imad; Arner, Erik; Harshbarger, Jayson; Kondo, Atsushi; Lassmann, Timo; Lizio, Marina; Sahin, Serkan; Sengstag, Thierry; Severin, Jessica; Shimoji, Hisashi; Suzuki, Masanori; Suzuki, Harukazu; Kawai, Jun; Kondo, Naoto; Itoh, Masayoshi; Daub, Carsten O; Kasukawa, Takeya; Kawaji, Hideya; Carninci, Piero; Forrest, Alistair R R; Hayashizaki, Yoshihide

    2017-08-29

    In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at a single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, respectively, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.

  12. FANTOM5 CAGE profiles of human and mouse samples

    PubMed Central

    Noguchi, Shuhei; Arakawa, Takahiro; Fukuda, Shiro; Furuno, Masaaki; Hasegawa, Akira; Hori, Fumi; Ishikawa-Kato, Sachi; Kaida, Kaoru; Kaiho, Ai; Kanamori-Katayama, Mutsumi; Kawashima, Tsugumi; Kojima, Miki; Kubosaki, Atsutaka; Manabe, Ri-ichiroh; Murata, Mitsuyoshi; Nagao-Sato, Sayaka; Nakazato, Kenichi; Ninomiya, Noriko; Nishiyori-Sueki, Hiromi; Noma, Shohei; Saijyo, Eri; Saka, Akiko; Sakai, Mizuho; Simon, Christophe; Suzuki, Naoko; Tagami, Michihira; Watanabe, Shoko; Yoshida, Shigehiro; Arner, Peter; Axton, Richard A.; Babina, Magda; Baillie, J. Kenneth; Barnett, Timothy C.; Beckhouse, Anthony G.; Blumenthal, Antje; Bodega, Beatrice; Bonetti, Alessandro; Briggs, James; Brombacher, Frank; Carlisle, Ailsa J.; Clevers, Hans C.; Davis, Carrie A.; Detmar, Michael; Dohi, Taeko; Edge, Albert S.B.; Edinger, Matthias; Ehrlund, Anna; Ekwall, Karl; Endoh, Mitsuhiro; Enomoto, Hideki; Eslami, Afsaneh; Fagiolini, Michela; Fairbairn, Lynsey; Farach-Carson, Mary C.; Faulkner, Geoffrey J.; Ferrai, Carmelo; Fisher, Malcolm E.; Forrester, Lesley M.; Fujita, Rie; Furusawa, Jun-ichi; Geijtenbeek, Teunis B.; Gingeras, Thomas; Goldowitz, Daniel; Guhl, Sven; Guler, Reto; Gustincich, Stefano; Ha, Thomas J.; Hamaguchi, Masahide; Hara, Mitsuko; Hasegawa, Yuki; Herlyn, Meenhard; Heutink, Peter; Hitchens, Kelly J.; Hume, David A.; Ikawa, Tomokatsu; Ishizu, Yuri; Kai, Chieko; Kawamoto, Hiroshi; Kawamura, Yuki I.; Kempfle, Judith S.; Kenna, Tony J.; Kere, Juha; Khachigian, Levon M.; Kitamura, Toshio; Klein, Sarah; Klinken, S. Peter; Knox, Alan J.; Kojima, Soichi; Koseki, Haruhiko; Koyasu, Shigeo; Lee, Weonju; Lennartsson, Andreas; Mackay-sim, Alan; Mejhert, Niklas; Mizuno, Yosuke; Morikawa, Hiromasa; Morimoto, Mitsuru; Moro, Kazuyo; Morris, Kelly J.; Motohashi, Hozumi; Mummery, Christine L.; Nakachi, Yutaka; Nakahara, Fumio; Nakamura, Toshiyuki; Nakamura, Yukio; Nozaki, Tadasuke; Ogishima, Soichi; Ohkura, Naganari; Ohno, Hiroshi; Ohshima, Mitsuhiro; Okada-Hatakeyama, Mariko; Okazaki, Yasushi; Orlando, Valerio; Ovchinnikov, Dmitry A.; Passier, Robert; Patrikakis, Margaret; Pombo, Ana; Pradhan-Bhatt, Swati; Qin, Xian-Yang; Rehli, Michael; Rizzu, Patrizia; Roy, Sugata; Sajantila, Antti; Sakaguchi, Shimon; Sato, Hiroki; Satoh, Hironori; Savvi, Suzana; Saxena, Alka; Schmidl, Christian; Schneider, Claudio; Schulze-Tanzil, Gundula G.; Schwegmann, Anita; Sheng, Guojun; Shin, Jay W.; Sugiyama, Daisuke; Sugiyama, Takaaki; Summers, Kim M.; Takahashi, Naoko; Takai, Jun; Tanaka, Hiroshi; Tatsukawa, Hideki; Tomoiu, Andru; Toyoda, Hiroo; van de Wetering, Marc; van den Berg, Linda M.; Verardo, Roberto; Vijayan, Dipti; Wells, Christine A.; Winteringham, Louise N.; Wolvetang, Ernst; Yamaguchi, Yoko; Yamamoto, Masayuki; Yanagi-Mizuochi, Chiyo; Yoneda, Misako; Yonekura, Yohei; Zhang, Peter G.; Zucchelli, Silvia; Abugessaisa, Imad; Arner, Erik; Harshbarger, Jayson; Kondo, Atsushi; Lassmann, Timo; Lizio, Marina; Sahin, Serkan; Sengstag, Thierry; Severin, Jessica; Shimoji, Hisashi; Suzuki, Masanori; Suzuki, Harukazu; Kawai, Jun; Kondo, Naoto; Itoh, Masayoshi; Daub, Carsten O.; Kasukawa, Takeya; Kawaji, Hideya; Carninci, Piero; Forrest, Alistair R.R.; Hayashizaki, Yoshihide

    2017-01-01

    In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, respectively, were identified and annotated to provide the precise location of known promoters as well as novel ones, and to quantify their activities. PMID:28850106

  13. From genes to genomes: a new paradigm for studying fungal pathogenesis in Magnaporthe oryzae.

    PubMed

    Xu, Jin-Rong; Zhao, Xinhua; Dean, Ralph A

    2007-01-01

    Magnaporthe oryzae is the most destructive fungal pathogen of rice worldwide. Because of its amenability to classical and molecular genetic manipulation and the availability of a genome sequence and other resources, it has emerged as a leading model system for studying host-pathogen interactions. This chapter reviews recent progress toward elucidation of the molecular basis of infection-related morphogenesis, host penetration, invasive growth, and host-pathogen interactions. Related information on genome analysis and genomic studies of plant infection processes is summarized under specific topics where appropriate. Particular emphasis is placed on the role of MAP kinase and cAMP signal transduction pathways and on unique features in the genome such as repetitive sequences and expanded gene families. Emerging developments in functional genome analysis through large-scale insertional mutagenesis and gene expression profiling are detailed. The chapter concludes with new prospects in the area of systems biology, such as protein expression profiling, and highlights the remaining crucial information needed to fully appreciate host-pathogen interactions.

  14. Molecular profiling of multiple myeloma: from gene expression analysis to next-generation sequencing.

    PubMed

    Agnelli, Luca; Tassone, Pierfrancesco; Neri, Antonino

    2013-06-01

    Multiple myeloma is a fatal malignant proliferation of clonal bone marrow Ig-secreting plasma cells, characterized by wide clinical, biological, and molecular heterogeneity. Herein, global gene and microRNA expression, genome-wide DNA profiling, and next-generation sequencing technology used to investigate the genomic alterations underlying the bio-clinical heterogeneity in multiple myeloma are discussed. High-throughput technologies have undoubtedly allowed a better comprehension of the molecular basis of the disease, a fine stratification, and early identification of high-risk patients, and have provided insights toward targeted therapy studies. However, such technologies are at risk of being affected by laboratory- or cohort-specific biases, and are moreover influenced by a high number of expected false positives. This aspect has a major weight in myeloma, which is characterized by large molecular heterogeneity. Therefore, meta-analysis as well as multiple approaches are desirable, if not mandatory, to validate the results obtained, in line with commonly accepted recommendations for tumor diagnostic/prognostic biomarker studies.

  15. A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data.

    PubMed

    Pandey, Gaurav; Pandey, Om P; Rogers, Angela J; Ahsen, Mehmet E; Hoffman, Gabriel E; Raby, Benjamin A; Weiss, Scott T; Schadt, Eric E; Bunyavanich, Supinda

    2018-06-11

    Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes interpreted via an L2-regularized logistic regression classification model. This classifier performed with strong predictive value and sensitivity across eight test sets, including (1) a test set of independent asthmatic and control subjects profiled by RNA sequencing (positive and negative predictive values of 1.00 and 0.96, respectively; AUC of 0.994), (2) two independent case-control cohorts of asthma profiled by microarray, and (3) five cohorts with other respiratory conditions (allergic rhinitis, upper respiratory infection, cystic fibrosis, smoking), where the classifier had a low to zero misclassification rate. Following validation in large, prospective cohorts, this classifier could be developed into a nasal biomarker of asthma.
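    The kind of model named in this abstract, an L2-regularized logistic regression, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-feature toy data, the function names train_l2_logistic and predict_proba, and the values of lam, lr, and epochs are all hypothetical, and the sketch is not the authors' 90-gene pipeline.

```python
import math


def train_l2_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Fit logistic regression by gradient descent with an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 term lam * w shrinks the weights; the bias is left unpenalized.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Return the model's probability that sample x belongs to class 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical expression values for two genes in six subjects (1 = case, 0 = control).
X = [[2.0, 0.5], [1.8, 0.7], [2.2, 0.4], [0.5, 1.9], [0.7, 2.1], [0.4, 1.8]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_l2_logistic(X, y)
predictions = [1 if predict_proba(weights, bias, x) > 0.5 else 0 for x in X]
print(weights, bias, predictions)
```

    In this toy setting the penalty strength lam plays the role that regularization plays in any such classifier: larger values pull the gene weights toward zero and trade training fit for stability.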

  16. Identification, isolation, and N-terminal sequencing of style glycoproteins associated with self-incompatibility in Nicotiana alata.

    PubMed

    Jahnen, W; Batterham, M P; Clarke, A E; Moritz, R L; Simpson, R J

    1989-05-01

    S-Gene-associated glycoproteins (S-glycoproteins) from styles of Nicotiana alata, identified by non-equilibrium two-dimensional electrophoresis, were purified by cation exchange fast protein liquid chromatography with yields of 0.5 to 8 micrograms of protein per style, depending on the S-genotype of the plant. The method relies on the highly basic nature of the S-glycoproteins. The elution profiles of the different S-glycoproteins from the fast protein liquid chromatography column were characteristic of each S-glycoprotein, and could be used to establish the S-genotype of plants in outbreeding populations. In all cases, the S-genotype predicted from the style protein profile corresponded to that predicted from DNA gel blot analysis using S-allele-specific DNA probes and to that established by conventional breeding tests. Amino-terminal sequences of five purified S-glycoproteins showed a high degree of homology with the previously published sequences of N. alata and Lycopersicon esculentum S-glycoproteins.

  17. Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

    PubMed

    Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F

    2017-06-01

    Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.

  18. Entropic Profiler – detection of conservation in genomes using information theory

    PubMed Central

    Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

    2009-01-01

    Background: In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighted relative abundance of motifs for each position in genomes. Their study is very relevant because under- or over-represented segments are often associated with significant biological meaning. Findings: The Entropic Profiler application presented here is a new tool designed to detect and extract under- and over-represented DNA segments in genomes by using EP. It computes EP very efficiently by resorting to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows users to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples, or a previously saved session. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion: EP are directly related to the statistical significance of motifs and can be considered a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538

  19. A high-throughput approach to profile RNA structure.

    PubMed

    Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano

    2017-03-17

    Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; Mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; Human and Yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performance on large RNAs such as Xist (17 900 nucleotides; area under the ROC curve, AUC, of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS in thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. HIPPI: highly accurate protein family classification with ensembles of HMMs.

    PubMed

    Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

    2016-11-11

    Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile hidden Markov models can better represent multiple sequence alignments than a single profile hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp.

  1. Quantitative microbiome profiling links gut community variation to microbial load.

    PubMed

    Vandeputte, Doris; Kathagen, Gunter; D'hoe, Kevin; Vieira-Silva, Sara; Valles-Colomer, Mireia; Sabino, João; Wang, Jun; Tito, Raul Y; De Commer, Lindsey; Darzi, Youssef; Vermeire, Séverine; Falony, Gwen; Raes, Jeroen

    2017-11-23

    Current sequencing-based analyses of faecal microbiota quantify microbial taxa and metabolic pathways as fractions of the sample sequence library generated by each analysis. Although these relative approaches permit detection of disease-associated microbiome variation, they are limited in their ability to reveal the interplay between microbiota and host health. Comparative analyses of relative microbiome data cannot provide information about the extent or directionality of changes in taxa abundance or metabolic potential. If microbial load varies substantially between samples, relative profiling will hamper attempts to link microbiome features to quantitative data such as physiological parameters or metabolite concentrations. Saliently, relative approaches ignore the possibility that altered overall microbiota abundance itself could be a key identifier of a disease-associated ecosystem configuration. To enable genuine characterization of host-microbiota interactions, microbiome research must exchange ratios for counts. Here we build a workflow for the quantitative microbiome profiling of faecal material, through parallelization of amplicon sequencing and flow cytometric enumeration of microbial cells. We observe up to tenfold differences in the microbial loads of healthy individuals and relate this variation to enterotype differentiation. We show how microbial abundances underpin both microbiota variation between individuals and covariation with host phenotype. Quantitative profiling bypasses compositionality effects in the reconstruction of gut microbiota interaction networks and reveals that the taxonomic trade-off between Bacteroides and Prevotella is an artefact of relative microbiome analyses. Finally, we identify microbial load as a key driver of observed microbiota alterations in a cohort of patients with Crohn's disease, here associated with a low-cell-count Bacteroides enterotype (as defined through relative profiling).
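    The difference between relative and quantitative profiling described in this abstract can be illustrated with a minimal Python sketch. It assumes hypothetical read counts per taxon and a hypothetical cell count per gram for each sample; the function names and all numbers are illustrative, and the simple scaling shown (relative fraction times measured load) is a simplification, not the authors' workflow.

```python
def relative_profile(read_counts):
    """Convert raw read counts per taxon into fractions of the sample library."""
    total = sum(read_counts.values())
    return {taxon: n / total for taxon, n in read_counts.items()}


def quantitative_profile(read_counts, cells_per_gram):
    """Scale the relative fractions by a measured microbial load (hypothetical units)."""
    return {taxon: frac * cells_per_gram
            for taxon, frac in relative_profile(read_counts).items()}


# Two hypothetical samples with identical read proportions but a tenfold
# difference in microbial load.
sample_a = {"Bacteroides": 600, "Prevotella": 400}
sample_b = {"Bacteroides": 600, "Prevotella": 400}

rel_a = relative_profile(sample_a)
rel_b = relative_profile(sample_b)
abs_a = quantitative_profile(sample_a, cells_per_gram=1.0e11)
abs_b = quantitative_profile(sample_b, cells_per_gram=1.0e10)

# The relative profiles cannot tell the samples apart; the scaled ones can.
print(rel_a == rel_b)
print(abs_a["Bacteroides"] / abs_b["Bacteroides"])
```

    The two samples look identical in relative terms, yet every taxon in the first is about ten times more abundant in absolute terms, which is the kind of information the abstract argues ratios alone cannot provide.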

  2. Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling.

    PubMed

    Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T

    2015-08-01

    Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.

  3. Diversity and distribution of archaea community along a stratigraphic permafrost profile from Qinghai-Tibetan Plateau, China.

    PubMed

    Wei, Shiping; Cui, Hongpeng; He, Hao; Hu, Fei; Su, Xin; Zhu, Youhai

    2014-01-01

    Accompanying the permafrost thaw expected to result from climate change, microbial decomposition of the massive amounts of frozen organic carbon stored in permafrost is a potential emission source of greenhouse gases, possibly leading to positive feedbacks to the greenhouse effect. In this study, the community composition of archaea in stratigraphic soils from an alpine permafrost of the Qinghai-Tibetan Plateau was investigated. Phylogenetic analysis of 16S rRNA sequences revealed that the community was predominantly constituted by Crenarchaeota and Euryarchaeota. Based on 16S rRNA gene sequence analysis, the active layer contained 51.2% Crenarchaeota and 48.8% Euryarchaeota, whereas the permafrost contained 41.2% Crenarchaeota and 58.8% Euryarchaeota. OTU1 and OTU11, affiliated to Group 1.3b/MCG-A within Crenarchaeota and the unclassified group within Euryarchaeota, respectively, were widely distributed in all sediment layers. However, OTU5, affiliated to Group 1.3b/MCG-A, was primarily distributed in the active layers. Sequence analysis of the DGGE bands from the 16S rRNAs of methanogenic archaea showed that the majority of methanogens belonged to Methanosarcinales and Methanomicrobiales, both affiliated to Euryarchaeota, and that the uncultured ZC-I cluster affiliated to Methanosarcinales was distributed at all depths along the permafrost profile, indicating a dominant group of methanogens in cold ecosystems.

  4. Diversity and Distribution of Archaea Community along a Stratigraphic Permafrost Profile from Qinghai-Tibetan Plateau, China

    PubMed Central

    Cui, Hongpeng; He, Hao; Hu, Fei; Su, Xin; Zhu, Youhai

    2014-01-01

    Accompanying the permafrost thaw expected to result from climate change, microbial decomposition of the massive amounts of frozen organic carbon stored in permafrost is a potential emission source of greenhouse gases, possibly leading to positive feedbacks to the greenhouse effect. In this study, the community composition of archaea in stratigraphic soils from an alpine permafrost of the Qinghai-Tibetan Plateau was investigated. Phylogenetic analysis of 16S rRNA sequences revealed that the community was predominantly constituted by Crenarchaeota and Euryarchaeota. Based on 16S rRNA gene sequence analysis, the active layer contained 51.2% Crenarchaeota and 48.8% Euryarchaeota, whereas the permafrost contained 41.2% Crenarchaeota and 58.8% Euryarchaeota. OTU1 and OTU11, affiliated to Group 1.3b/MCG-A within Crenarchaeota and the unclassified group within Euryarchaeota, respectively, were widely distributed in all sediment layers. However, OTU5, affiliated to Group 1.3b/MCG-A, was primarily distributed in the active layers. Sequence analysis of the DGGE bands from the 16S rRNAs of methanogenic archaea showed that the majority of methanogens belonged to Methanosarcinales and Methanomicrobiales, both affiliated to Euryarchaeota, and that the uncultured ZC-I cluster affiliated to Methanosarcinales was distributed at all depths along the permafrost profile, indicating a dominant group of methanogens in cold ecosystems. PMID:25525409

  5. Determination of differential gene expression profiles in superficial and deeper zones of mature rat articular cartilage using RNA sequencing of laser microdissected tissue specimens.

    PubMed

    Mori, Yoshifumi; Chung, Ung-Il; Tanaka, Sakae; Saito, Taku

    2014-01-01

    Superficial zone (SFZ) cells, which are morphologically and functionally distinct from chondrocytes in deeper zones, play important roles in the maintenance of articular cartilage. Here, we established an easy and reliable method for performing laser microdissection (LMD) on cryosections of mature rat articular cartilage using an adhesive membrane. We further examined gene expression profiles in the SFZ and the deeper zones of articular cartilage by performing RNA sequencing (RNA-seq). We validated the sample collection methods, RNA amplification, and the RNA-seq data using real-time RT-PCR. The combined data provide comprehensive information regarding genes specifically expressed in the SFZ or deeper zones, as well as a useful protocol for expression analysis of microsamples of hard tissues.

  6. Hidden Markov models incorporating fuzzy measures and integrals for protein sequence identification and alignment.

    PubMed

    Bidargaddi, Niranjan P; Chetty, Madhu; Kamruzzaman, Joarder

    2008-06-01

    Profile hidden Markov models (HMMs) based on classical HMMs have been widely applied for protein sequence identification. The formulation of the forward and backward variables in profile HMMs is made under the statistical independence assumption of probability theory. We propose a fuzzy profile HMM to overcome the limitations of that assumption and to achieve an improved alignment for protein sequences belonging to a given family. The proposed model fuzzifies the forward and backward variables by incorporating Sugeno fuzzy measures and Choquet integrals, thereby further extending the generalized HMM. Based on the fuzzified forward and backward variables, we propose a fuzzy Baum-Welch parameter estimation algorithm for profiles. The strong correlations and the sequence preference involved in protein structures make this fuzzy-architecture-based model a suitable candidate for building profiles of a given family, since the fuzzy set can handle uncertainties better than classical methods.
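    For orientation, the classical forward variable that this abstract takes as its starting point can be sketched in Python. The block is a plain HMM forward recursion, not the fuzzy profile HMM proposed in the paper; the two-state model, its probabilities, the two-letter toy alphabet, and the function name forward are all hypothetical.

```python
from itertools import product


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under a classical HMM using the forward recursion."""
    # alpha[s] holds P(o_1..o_t, state_t = s) for the current position t.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Summing over predecessor states relies on the probabilistic
        # independence assumptions of the classical model.
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model over a toy alphabet of two residues.
states = ("match", "insert")
start_p = {"match": 0.9, "insert": 0.1}
trans_p = {
    "match": {"match": 0.8, "insert": 0.2},
    "insert": {"match": 0.6, "insert": 0.4},
}
emit_p = {
    "match": {"A": 0.7, "G": 0.3},
    "insert": {"A": 0.5, "G": 0.5},
}

likelihood = forward("AG", states, start_p, trans_p, emit_p)
# Probabilities of all length-2 sequences should add up to one.
total = sum(forward("".join(seq), states, start_p, trans_p, emit_p)
            for seq in product("AG", repeat=2))
print(likelihood, total)
```

    The sum over predecessor states in the recursion is where additivity enters; per the abstract, the fuzzy model replaces this step using Sugeno fuzzy measures and Choquet integrals.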

  7. DNA methylation-based reclassification of olfactory neuroblastoma.

    PubMed

    Capper, David; Engel, Nils W; Stichel, Damian; Lechner, Matt; Glöss, Stefanie; Schmid, Simone; Koelsche, Christian; Schrimpf, Daniel; Niesen, Judith; Wefers, Annika K; Jones, David T W; Sill, Martin; Weigert, Oliver; Ligon, Keith L; Olar, Adriana; Koch, Arend; Forster, Martin; Moran, Sebastian; Tirado, Oscar M; Sáinz-Japeado, Miguel; Mora, Jaume; Esteller, Manel; Alonso, Javier; Del Muro, Xavier Garcia; Paulus, Werner; Felsberg, Jörg; Reifenberger, Guido; Glatzel, Markus; Frank, Stephan; Monoranu, Camelia M; Lund, Valerie J; von Deimling, Andreas; Pfister, Stefan; Buslei, Rolf; Ribbat-Idel, Julika; Perner, Sven; Gudziol, Volker; Meinhardt, Matthias; Schüller, Ulrich

    2018-05-05

    Olfactory neuroblastoma/esthesioneuroblastoma (ONB) is an uncommon neuroectodermal neoplasm thought to arise from the olfactory epithelium. Little is known about its molecular pathogenesis. For this study, a retrospective cohort of n = 66 tumor samples with the institutional diagnosis of ONB was analyzed by immunohistochemistry, genome-wide DNA methylation profiling, copy number analysis, and in a subset, next-generation panel sequencing of 560 tumor-associated genes. DNA methylation profiles were compared to those of relevant differential diagnoses of ONB. Unsupervised hierarchical clustering analysis of DNA methylation data revealed four subgroups among institutionally diagnosed ONB. The largest group (n = 42, 64%, Core ONB) presented with classical ONB histology and no overlap with other classes upon methylation profiling-based t-distributed stochastic neighbor embedding (t-SNE) analysis. A second DNA methylation group (n = 7, 11%) with CpG island methylator phenotype (CIMP) consisted of cases with strong expression of cytokeratin, no or scarce chromogranin A expression and IDH2 hotspot mutation in all cases. T-SNE analysis clustered these cases together with sinonasal carcinoma with IDH2 mutation. Four cases (6%) formed a small group characterized by an overall high level of DNA methylation, but without CIMP. The fourth group consisted of 13 cases that had heterogeneous DNA methylation profiles and strong cytokeratin expression in most cases. In t-SNE analysis, these cases mostly grouped among sinonasal adenocarcinoma, squamous cell carcinoma, and undifferentiated carcinoma. Copy number analysis indicated highly recurrent chromosomal changes among Core ONB with a high frequency of combined loss of chromosomes 1-4, 8-10, and 12. Next-generation sequencing did not reveal highly recurrent mutations in ONB, with the only recurrently mutated genes being TP53 and DNMT3A. In conclusion, we demonstrate that institutionally diagnosed ONB are a heterogeneous group of tumors. Expression of cytokeratin, chromogranin A, the mutational status of IDH2 as well as DNA methylation patterns may greatly aid in the precise classification of ONB.

  8. RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.

    PubMed

    Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei

    2014-07-07

    Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening properties (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next-generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions characterized using RNA-Seq. A total of 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-qPCR. RNA-Seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.

  9. Molecular Profiling of Malignant Pleural Effusion in Metastatic Non-Small-Cell Lung Carcinoma. The Effect of Preanalytical Factors.

    PubMed

    Carter, Jamal; Miller, James Adam; Feller-Kopman, David; Ettinger, David; Sidransky, David; Maleki, Zahra

    2017-07-01

    Non-small-cell lung cancer (NSCLC)-associated malignant pleural effusions (MPEs) are sometimes the only available specimens for molecular analysis. This study evaluates the diagnostic yield of NSCLC-associated MPE, its adequacy for molecular profiling and the potential influence of MPE volume/cellularity on the analytic sensitivity of our assays. Molecular results of 50 NSCLC-associated MPE cases during a 5-year period were evaluated. Molecular profiling was performed on cell blocks and consisted of fluorescent in situ hybridization (FISH) for ALK gene rearrangements and the following sequencing platforms: Sanger sequencing (for EGFR) and high-throughput pyrosequencing (for KRAS and BRAF) during the first 4 years of the study period, and targeted next-generation sequencing performed thereafter. A total of 50 NSCLC-associated MPE cases were identified where molecular testing was requested. Of these, 17 cases were excluded: 14 cases (28%) due to inadequate tumor cellularity and 3 cases due to unavailability of the slides to review. A total of 27 out of 50 MPE cases (54%) underwent at least EGFR and KRAS sequencing and FISH for ALK rearrangement. Of the 27 cases with molecular testing results available, a genetic abnormality was detected in 16 cases (59%). The most common genetic aberrations identified involved EGFR (9) and KRAS (7). Six cases had ALK FISH only, of which one showed rearrangement. MPE volume was not associated with overall cellularity or tumor cellularity (P = 0.360). Molecular profiling of MPE is a viable alternative to testing solid tissue in NSCLC. This study shows successful detection of genetic aberrations in 59% of samples with minimal risk of false negative.

  10. Dynamic transcriptomic analysis in hircine longissimus dorsi muscle from fetal to neonatal development stages.

    PubMed

    Zhan, Siyuan; Zhao, Wei; Song, Tianzeng; Dong, Yao; Guo, Jiazhong; Cao, Jiaxue; Zhong, Tao; Wang, Linjie; Li, Li; Zhang, Hongping

    2018-01-01

    Muscle growth and development from fetal to neonatal stages consist of a series of delicately regulated and orchestrated changes in expression of genes. In this study, we performed whole transcriptome profiling based on RNA-Seq of caprine longissimus dorsi muscle tissue obtained from prenatal stages (days 45, 60, and 105 of gestation) and neonatal stage (the 3-day-old newborn) to identify genes that are differentially expressed and investigate their temporal expression profiles. A total of 3276 differentially expressed genes (DEGs) were identified (Q value < 0.01). Time-series expression profile clustering analysis indicated that DEGs were significantly clustered into eight clusters which can be divided into two classes (Q value < 0.05), class I profiles with downregulated patterns and class II profiles with upregulated patterns. Based on cluster analysis, GO enrichment analysis found 75, 25, and 8 terms to be significantly enriched in the biological process (BP), cellular component (CC), and molecular function (MF) categories in class I profiles, and 35, 21, and 8 terms to be significantly enriched in BP, CC, and MF in class II profiles. KEGG pathway analysis revealed that DEGs from class I profiles were significantly enriched in 22 pathways and the most enriched pathway was the Rap1 signaling pathway. DEGs from class II profiles were significantly enriched in 17 pathways and the most enriched pathway was the AMPK signaling pathway. Finally, six selected DEGs from our sequencing results were confirmed by qPCR. Our study provides a comprehensive understanding of the molecular mechanisms during goat skeletal muscle development from fetal to neonatal stages and valuable information for future studies of muscle development in goats.

  11. Genetic diversity of Rhizobia isolates from Amazon soils using cowpea (Vigna unguiculata) as trap plant

    PubMed Central

    Silva, F.V.; Simões-Araújo, J.L.; Silva Júnior, J.P.; Xavier, G.R.; Rumjanek, N.G.

    2012-01-01

    The aim of this work was to characterize rhizobia isolated from the root nodules of cowpea (Vigna unguiculata) plants cultivated in Amazon soil samples by means of ARDRA (Amplified rDNA Restriction Analysis) and sequencing analysis, in order to determine their phylogenetic relationships. The 16S rRNA gene of rhizobia was amplified by PCR (polymerase chain reaction) using universal primers Y1 and Y3. The amplification products were analyzed by the restriction enzymes HinfI, MspI and DdeI and also sequenced with Y1, Y3 and six intermediate primers. The clustering analysis based on ARDRA profiles separated the Amazon isolates into three subgroups, which formed a group apart from the reference isolates of Bradyrhizobium japonicum and Bradyrhizobium elkanii. The clustering analysis of 16S rRNA gene sequences showed that the fast-growing isolates had similarity with Enterobacter, Rhizobium, Klebsiella and Bradyrhizobium and all the slow-growing isolates clustered close to Bradyrhizobium. PMID:24031880

  12. DNA methylation profiling using HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)

    PubMed Central

    Suzuki, Masako; Greally, John M.

    2010-01-01

    The HELP assay is a technique that allows genome-wide analysis of cytosine methylation. Here we describe the assay, its relative strengths and weaknesses, and the transition of the assay from a microarray to massively-parallel sequencing-based foundation. PMID:20434563

  13. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    PubMed

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
    Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis

    PubMed Central

    Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan

    2009-01-01

    DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301

  15. Quantitative Assessment of RNA-Protein Interactions with High Throughput Sequencing - RNA Affinity Profiling (HiTS-RAP)

    PubMed Central

    Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.

    2016-01-01

    Because RNA-protein interactions play a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time) including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs in a single experiment. PMID:26182240

  16. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.

  17. Evol and ProDy for bridging protein sequence evolution and structural dynamics

    PubMed Central

    Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R.; Bahar, Ivet

    2014-01-01

    Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. Availability and implementation: ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. Contact: bahar@pitt.edu PMID:24849577
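
    As a rough illustration of the two kinds of profile mentioned in this record, the sketch below computes per-column Shannon entropy (a conservation measure) and pairwise mutual information (a coevolution measure) on a toy alignment. It is plain Python and does not use or mirror the Evol/ProDy API; the function names, the toy alignment, and the choice to treat every character (including gaps) as an ordinary symbol are assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(msa, col):
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    n = len(msa)
    counts = Counter(seq[col] for seq in msa)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(msa, i, j):
    """Mutual information (bits) between columns i and j of the alignment."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


# Hypothetical toy alignment: column 1 is conserved, columns 0 and 2 co-vary,
# and column 3 varies independently of column 0.
toy_msa = ["ACDE", "ACDF", "GCEE", "GCEF"]

conservation = [column_entropy(toy_msa, c) for c in range(4)]
coupled = mutual_information(toy_msa, 0, 2)
uncoupled = mutual_information(toy_msa, 0, 3)
```

    On real alignments such raw scores are usually corrected for sampling and phylogenetic bias before interpretation; that step is omitted here.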

  18. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
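
    The record does not say how the uniform 1% sample was drawn. One common way to take a fixed-size uniform sample from a file too large to hold in memory is reservoir sampling, sketched below on FASTA records. The function names and the toy input are hypothetical, and this is not the authors' code.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for n, record in enumerate(records, start=1):
        if len(sample) < k:
            sample.append(record)      # fill the reservoir first
        else:
            j = rng.randrange(n)       # uniform integer in [0, n)
            if j < k:
                sample[j] = record     # keep the new record with probability k/n
    return sample


# Hypothetical toy input: ten short records standing in for a large database.
toy_fasta = []
for idx in range(10):
    toy_fasta.append(f">seq{idx}")
    toy_fasta.append("MKT" + "A" * idx)

subset = reservoir_sample(read_fasta(toy_fasta), k=3, seed=42)
```

    For a 1% sample, k would be set to one hundredth of the record count, which requires knowing or first counting the number of records.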

  19. Distinctive archaebacterial species associated with anaerobic rumen protozoan Entodinium caudatum.

    PubMed

    Tóthová, T; Piknová, M; Kisidayová, S; Javorský, P; Pristas, P

    2008-01-01

    The diversity of archaebacteria associated with anaerobic rumen protozoan Entodinium caudatum in long term in vitro culture was investigated by denaturing gradient gel electrophoresis (DGGE) analysis of hypervariable V3 region of archaebacterial 16S rRNA gene. PCR was accomplished directly from DNA extracted from a single protozoal cell and from total community genomic DNA and the obtained fingerprints were compared. The analysis indicated the presence of a solitary intense band in Entodinium caudatum single cell DNA, which had no counterpart in the profile from total DNA. The identity of the archaebacterium represented by this band was determined by sequence analysis, which showed that the sequence fell within the cluster of ciliate symbiotic methanogens identified recently by the 16S gene library approach.

  20. Salmonella enterica Prophage Sequence Profiles Reflect Genome Diversity and Can Be Used for High Discrimination Subtyping.

    PubMed

    Mottawea, Walid; Duceppe, Marc-Olivier; Dupras, Andrée A; Usongo, Valentine; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Hamel, Jeremie; Kukavica-Ibrulj, Irena; Boyle, Brian; Gill, Alexander; Burnett, Elton; Franz, Eelco; Arya, Gitanjali; Weadge, Joel T; Gruenheid, Samantha; Wiedmann, Martin; Huang, Hongsheng; Daigle, France; Moineau, Sylvain; Bekal, Sadjia; Levesque, Roger C; Goodridge, Lawrence D; Ogunremi, Dele

    2018-01-01

    Non-typhoidal Salmonella is a leading cause of foodborne illness worldwide. Prompt and accurate identification of the sources of Salmonella responsible for disease outbreaks is crucial to minimize infections and eliminate ongoing sources of contamination. Current subtyping tools including single nucleotide polymorphism (SNP) typing may be inadequate, in some instances, to provide the required discrimination among epidemiologically unrelated Salmonella strains. Prophage genes represent the majority of the accessory genes in bacterial genomes and have potential to be used as high discrimination markers in Salmonella. In this study, the prophage sequence diversity in different Salmonella serovars and genetically related strains was investigated. Using whole genome sequences of 1,760 isolates of S. enterica representing 151 Salmonella serovars and 66 closely related bacteria, prophage sequences were identified from assembled contigs using PHASTER. We detected 154 different prophages in S. enterica genomes. Prophage sequences were highly variable among S. enterica serovars with a median ± interquartile range (IQR) of 5 ± 3 prophage regions per genome. While some prophage sequences were highly conserved among the strains of specific serovars, few regions were lineage specific. Therefore, strains belonging to each serovar could be clustered separately based on their prophage content. Analysis of S. Enteritidis isolates from seven outbreaks generated distinct prophage profiles for each outbreak. Taken altogether, the diversity of the prophage sequences correlates with genome diversity. Prophage repertoires provide an additional marker for differentiating S. enterica subtypes during foodborne outbreaks.

  1. Identification and phylogeny of Arabian snakes: Comparison of venom chromatographic profiles versus 16S rRNA gene sequences.

    PubMed

    Al Asmari, Abdulrahman; Manthiri, Rajamohammed Abbas; Khan, Haseeb Ahmad

    2014-11-01

    Identification of snake species is important for various reasons including the emergency treatment of snake bite victims. We present a simple method for identification of six snake species using the gel filtration chromatographic profiles of their venoms. The venoms of Echis coloratus, Echis pyramidum, Cerastes gasperettii, Bitis arietans, Naja arabica, and Walterinnesia aegyptia were milked, lyophilized, diluted and centrifuged to separate the mucus from the venom. The clear supernatants were filtered and chromatographed on fast protein liquid chromatography (FPLC). We obtained the 16S rRNA gene sequences of the above species and performed phylogenetic analysis using the neighbor-joining method. The chromatograms of venoms from different snake species showed peculiar patterns based on the number and location of peaks. The dendrograms generated from similarity matrix based on the presence/absence of particular chromatographic peaks clearly differentiated Elapids from Viperids. Molecular cladistics using 16S rRNA gene sequences resulted in jumping clades while separating the members of these two families. These findings suggest that chromatographic profiles of snake venoms may provide a simple and reproducible chemical fingerprinting method for quick identification of snake species. However, the validation of this methodology requires further studies on large number of specimens from within and across species.
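
    The record does not name the similarity coefficient behind its presence/absence matrix. As one possible reading, the sketch below scores pairs of profiles with the Jaccard index, a common choice for presence/absence data. The peak labels and sample names are invented for illustration and are not taken from the study.

```python
def jaccard(peaks_a, peaks_b):
    """Jaccard similarity of two peak sets: shared peaks / peaks seen in either."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(profiles):
    """Return {(name_x, name_y): similarity} for every ordered pair of profiles."""
    names = sorted(profiles)
    return {(x, y): jaccard(profiles[x], profiles[y]) for x in names for y in names}


# Hypothetical profiles: each set lists the elution positions where a peak is present.
toy_profiles = {
    "sample_A": {1, 2, 3, 5},
    "sample_B": {1, 2, 3, 6},
    "sample_C": {4, 7, 8},
}

matrix = similarity_matrix(toy_profiles)
```

    A dendrogram would then be built by hierarchical clustering on one minus these similarities.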

  2. Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

    PubMed

    Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

    2017-06-07

    For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, a large number of cancer and control sera are required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by next generation sequencing without losing cancer-specific sequences, we generated the profiles from a phage library enriched by panning on a pool of cancer sera. To further decrease the complexity of profiles, we used computational methods to transform the list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs, since they contain significantly more peptides than motifs among peptides where amino acids are randomly permuted. Also, the single-sample motifs differ significantly from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se -Ran; Hauser, Loren John; Schadt, Christopher Warren

    For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. As a result, we present a computational method called pangenome based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU's taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome's functional profile with the OTU's abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8 to 0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. But our method is unique in that any OTU building method can be used, as opposed to being limited to closed reference OTU picking strategies against specific reference sequence databases. In conclusion, we developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub.
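
    The sketch below illustrates the weighting step described in this record (per-OTU functional profiles scaled by OTU abundance and summed over a sample), along with a Spearman correlation against a measured profile. It does not reproduce PanFP's pangenome reconstruction or its code. The OTU names, function labels, and all numbers are invented, and the small rank-correlation helper is written out only to keep the example dependency-free.

```python
def community_profile(otu_abundance, otu_function):
    """Sum each OTU's functional profile, weighted by the OTU's abundance."""
    profile = {}
    for otu, abundance in otu_abundance.items():
        for func, weight in otu_function.get(otu, {}).items():
            profile[func] = profile.get(func, 0.0) + abundance * weight
    return profile


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical inputs: OTU counts in one sample and per-OTU function weights.
otu_abundance = {"OTU1": 10, "OTU2": 4}
otu_function = {
    "OTU1": {"func_a": 1.0, "func_b": 0.5},
    "OTU2": {"func_b": 1.0, "func_c": 2.0},
}
inferred = community_profile(otu_abundance, otu_function)

# Hypothetical shotgun-derived counts for the same three functions.
measured = {"func_a": 120, "func_b": 100, "func_c": 90}
funcs = sorted(inferred)
rho = spearman([inferred[f] for f in funcs], [measured[f] for f in funcs])
```

    Here the inferred and measured profiles rank the three functions identically, so the correlation is 1; real comparisons span thousands of functions.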

  4. Identification of the full-length β-actin sequence and expression profiles in the tree shrew (Tupaia belangeri).

    PubMed

    Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing

    2015-02-01

    The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.

  5. Genetic characterisation of Taenia multiceps cysts from ruminants in Greece.

    PubMed

    Al-Riyami, Shumoos; Ioannidou, Evi; Koehler, Anson V; Hussain, Muhammad H; Al-Rawahi, Abdulmajeed H; Giadinis, Nektarios D; Lafi, Shawkat Q; Papadopoulos, Elias; Jabbar, Abdul

    2016-03-01

    This study was designed to genetically characterise the larval stage (coenurus) of Taenia multiceps from ruminants in Greece, utilising DNA regions within the cytochrome c oxidase subunit 1 (pcox1) and NADH dehydrogenase 1 (pnad1) mitochondrial (mt) genes, respectively. A molecular-phylogenetic approach was used to analyse the pcox1 and pnad1 amplicons derived from genomic DNA samples from individual cysts (n=105) from cattle (n=3), goats (n=5) and sheep (n=97). Results revealed five and six distinct electrophoretic profiles for pcox1 and pnad1, respectively, using single-strand conformation polymorphism. Direct sequencing of selected amplicons representing each of these profiles defined five haplotypes each for pcox1 and pnad1, among all 105 isolates. Phylogenetic analysis of individual sequence data for each locus, including a range of well-defined reference sequences, inferred that all isolates of T. multiceps cysts from ruminants in Greece clustered with previously published sequences from different continents. The present study provides a foundation for future large-scale studies on the epidemiology of T. multiceps in ruminants as well as dogs in Greece. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Identification of Type A, B, E, and F Botulinum Neurotoxin Genes and of Botulinum Neurotoxigenic Clostridia by Denaturing High-Performance Liquid Chromatography

    PubMed Central

    Franciosa, Giovanna; Pourshaban, Manoocheher; De Luca, Alessandro; Buccino, Anna; Dallapiccola, Bruno; Aureli, Paolo

    2004-01-01

    Denaturing high-performance liquid chromatography (DHPLC) is a recently developed technique for rapid screening of nucleotide polymorphisms in PCR products. We used this technique for the identification of type A, B, E, and F botulinum neurotoxin genes. PCR products amplified from a conserved region of the type A, B, E, and F botulinum toxin genes from Clostridium botulinum, neurotoxigenic C. butyricum type E, and C. baratii type F strains were subjected to both DHPLC analysis and sequencing. Unique DHPLC peak profiles were obtained with each different type of botulinum toxin gene fragment, consistent with nucleotide differences observed in the related sequences. We then evaluated the ability of this technique to identify botulinal neurotoxigenic organisms at the genus and species level. A specific short region of the 16S rRNA gene which contains genus-specific and in some cases species-specific heterogeneity was amplified from botulinum neurotoxigenic clostridia and from different food-borne pathogens and subjected to DHPLC analysis. Different peak profiles were obtained for each genus and species, demonstrating that the technique could be a reliable alternative to sequencing for the rapid identification of food-borne pathogens, specifically of botulinal neurotoxigenic clostridia most frequently implicated in human botulism. PMID:15240298

  7. Use of Life Course Work–Family Profiles to Predict Mortality Risk Among US Women

    PubMed Central

    Guevara, Ivan Mejía; Glymour, M. Maria; Berkman, Lisa F.

    2015-01-01

    Objectives. We examined relationships between US women’s exposure to midlife work–family demands and subsequent mortality risk. Methods. We used data from women born 1935 to 1956 in the Health and Retirement Study to calculate employment, marital, and parenthood statuses for each age between 16 and 50 years. We used sequence analysis to identify 7 prototypical work–family trajectories. We calculated age-standardized mortality rates and hazard ratios (HRs) for mortality associated with work–family sequences, with adjustment for covariates and potentially explanatory later-life factors. Results. Married women staying home with children briefly before reentering the workforce had the lowest mortality rates. In comparison, after adjustment for age, race/ethnicity, and education, HRs for mortality were 2.14 (95% confidence interval [CI] = 1.58, 2.90) among single nonworking mothers, 1.48 (95% CI = 1.06, 1.98) among single working mothers, and 1.36 (95% CI = 1.02, 1.80) among married nonworking mothers. Adjustment for later-life behavioral and economic factors partially attenuated risks. Conclusions. Sequence analysis is a promising exposure assessment tool for life course research. This method permitted identification of certain lifetime work–family profiles associated with mortality risk before age 75 years. PMID:25713976

  8. Phylogenetic profiles reveal structural/functional determinants of TRPC3 signal-sensing antennae

    PubMed Central

    Ko, Kyung Dae; Bhardwaj, Gaurav; Hong, Yoojin; Chang, Gue Su; Kiselyov, Kirill

    2009-01-01

    Biochemical assessment of channel structure/function is incredibly challenging. Developing computational tools that provide these data would enable translational research, accelerating mechanistic experimentation for the bench scientist studying ion channels. Starting with the premise that protein sequence encodes information about structure, function and evolution (SF&E), we developed a unified framework for inferring SF&E from sequence information using a knowledge-based approach. The Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provides phylogenetic profiles that can model, ab initio, SF&E relationships of biological sequences at the whole protein, single domain and single-amino acid level.1,2 In our recent paper,4 we have applied GDDA-BLAST analysis to study canonical TRP (TRPC) channels1 and empirically validated predicted lipid-binding and trafficking activities contained within the TRPC3 TRP_2 domain of unknown function. Overall, our in silico, in vitro, and in vivo experiments support a model in which TRPC3 has signal-sensing antennae which are adorned with lipid-binding, trafficking and calmodulin regulatory domains. In this Addendum, we correlate our functional domain analysis with the cryo-EM structure of TRPC3.3 In addition, we synthesize recent studies with our new findings to provide a refined model on the mechanism(s) of TRPC3 activation/deactivation. PMID:19704910

  9. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
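    As a rough illustration of the word-frequency methods this review covers, the sketch below counts overlapping words of length k in two sequences and computes the D2 statistic as the sum, over all words, of the product of the two counts. The function names and the use of Python are assumptions made for illustration; they are not taken from the review.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq (case-insensitive)."""
    seq = seq.upper()
    # range() is empty when k exceeds the sequence length, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """D2 statistic: sum over all k-words of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words it has never seen, so words
    # missing from seq_b contribute nothing to the sum.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

    For instance, d2_statistic("ACGT", "ACGT", 2) gives 3 (three shared words, each seen once), while two sequences with no words in common score 0. Larger values indicate more shared word content, without any alignment step.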

  10. Gene expression profiling of the plant pathogenic basidiomycetous fungus Rhizoctonia solani AG 4 reveals putative virulence factors

    USDA-ARS's Scientific Manuscript database

    Rhizoctonia solani is a ubiquitous basidiomycetous soilborne fungal pathogen causing damping off of seedlings, aerial blights and postharvest diseases. To gain insight into the molecular mechanisms of pathogenesis a global approach based on analysis of expressed sequence tags (ESTs) was undertaken. ...

  11. Checking of individuality by DNA profiling.

    PubMed

    Brdicka, R; Nürnberg, P

    1993-08-25

    A review of methods of DNA analysis used in forensic medicine for identification, paternity testing, etc. is provided. Among other techniques, DNA fingerprinting using different probes and polymerase chain reaction-based techniques such as amplified sequence polymorphisms and minisatellite variant repeat mapping are thoroughly described and both theoretical and practical aspects are discussed.

  12. New natural products identified by combined genomics-metabolomics profiling of marine Streptomyces sp. MP131-18.

    PubMed

    Paulus, Constanze; Rebets, Yuriy; Tokovenko, Bogdan; Nadmid, Suvd; Terekhova, Larisa P; Myronovskyi, Maksym; Zotchev, Sergey B; Rückert, Christian; Braig, Simone; Zahler, Stefan; Kalinowski, Jörn; Luzhetskyy, Andriy

    2017-02-10

    Marine actinobacteria are drawing more and more attention as a promising source of new natural products. Here we report isolation, genome sequencing and metabolic profiling of new strain Streptomyces sp. MP131-18 isolated from marine sediment sample collected in the Trondheim Fjord, Norway. The 16S rRNA and multilocus phylogenetic analysis showed that MP131-18 belongs to the genus Streptomyces. The genome of MP131-18 isolate was sequenced, and 36 gene clusters involved in the biosynthesis of 18 different types of secondary metabolites were predicted using antiSMASH analysis. The combined genomics-metabolics profiling of the strain led to the identification of several new biologically active compounds. As a result, the family of bisindole pyrroles spiroindimicins was extended with two new members, spiroindimicins E and F. Furthermore, prediction of the biosynthetic pathway for unusual α-pyrone lagunapyrone isolated from MP131-18 resulted in foresight and identification of two new compounds of this family - lagunapyrones D and E. The diversity of identified and predicted compounds from Streptomyces sp. MP131-18 demonstrates that marine-derived actinomycetes are not only a promising source of new natural products, but also represent a valuable pool of genes for combinatorial biosynthesis of secondary metabolites.

  13. New natural products identified by combined genomics-metabolomics profiling of marine Streptomyces sp. MP131-18

    PubMed Central

    Paulus, Constanze; Rebets, Yuriy; Tokovenko, Bogdan; Nadmid, Suvd; Terekhova, Larisa P.; Myronovskyi, Maksym; Zotchev, Sergey B.; Rückert, Christian; Braig, Simone; Zahler, Stefan; Kalinowski, Jörn; Luzhetskyy, Andriy

    2017-01-01

    Marine actinobacteria are drawing more and more attention as a promising source of new natural products. Here we report isolation, genome sequencing and metabolic profiling of new strain Streptomyces sp. MP131-18 isolated from marine sediment sample collected in the Trondheim Fjord, Norway. The 16S rRNA and multilocus phylogenetic analysis showed that MP131-18 belongs to the genus Streptomyces. The genome of MP131-18 isolate was sequenced, and 36 gene clusters involved in the biosynthesis of 18 different types of secondary metabolites were predicted using antiSMASH analysis. The combined genomics-metabolics profiling of the strain led to the identification of several new biologically active compounds. As a result, the family of bisindole pyrroles spiroindimicins was extended with two new members, spiroindimicins E and F. Furthermore, prediction of the biosynthetic pathway for unusual α-pyrone lagunapyrone isolated from MP131-18 resulted in foresight and identification of two new compounds of this family – lagunapyrones D and E. The diversity of identified and predicted compounds from Streptomyces sp. MP131-18 demonstrates that marine-derived actinomycetes are not only a promising source of new natural products, but also represent a valuable pool of genes for combinatorial biosynthesis of secondary metabolites. PMID:28186197

  14. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling.

    PubMed

    Łabaj, Paweł P; Leparc, Germán G; Linggi, Bryan E; Markillie, Lye Meng; Wiley, H Steven; Kreil, David P

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error<20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. rnaseq10@boku.ac.at

  15. Comparative phylobiomic analysis of the bacterial community of water kefir by 16S rRNA gene amplicon sequencing and ARDRA analysis.

    PubMed

    Gulitz, A; Stadie, J; Ehrmann, M A; Ludwig, W; Vogel, R F

    2013-04-01

    The aim of this study was to analyse the bacterial microbiota of water kefir using culture-independent methods. We compared four water kefirs of different origins using 16S rDNA amplicon sequencing and ARDRA. The microbiota consisted of different proportions of the genera Lactobacillus (Lact.), Leuconostoc (Leuc.), Acetobacter (Acet.) and Gluconobacter. Surprisingly, varying but consistently high numbers of sequences representing members of the genus Bifidobacterium (Bif.) were found in all kefirs. Whereas part of the bifidobacterial sequences could be assigned to Bifidobacterium psychraerophilum, a majority of sequences identical to each other could not be assigned to any known species. A nearly full-length sequence of the latter exhibited a beyond-species similarity (96.4%) with the sequence from the closest relative species Bif. psychraerophilum. A Bifidobacterium-specific ARDRA analysis reflected the abundance of the novel Bifidobacterium species by revealing its unique MboI restriction profile. Attempts to isolate the bifidobacteria were successful for Bif. psychraerophilum only. The complexity of the water kefir microbiota has been underestimated in previous studies. The occurrence of bifidobacteria as part of the consortium is novel. These data give new insights into the understanding of the complexity of food fermentations and underline the need for approaches detecting noncultivable organisms. © 2013 The Society for Applied Microbiology.

  16. Brucella papionis sp. nov., isolated from baboons (Papio spp.).

    PubMed

    Whatmore, Adrian M; Davison, Nicholas; Cloeckaert, Axel; Al Dahouk, Sascha; Zygmunt, Michel S; Brew, Simon D; Perrett, Lorraine L; Koylass, Mark S; Vergnaud, Gilles; Quance, Christine; Scholz, Holger C; Dick, Edward J; Hubbard, Gene; Schlabritz-Loutsevitch, Natalia E

    2014-12-01

    Two Gram-negative, non-motile, non-spore-forming coccoid bacteria (strains F8/08-60(T) and F8/08-61) isolated from clinical specimens obtained from baboons (Papio spp.) that had delivered stillborn offspring were subjected to a polyphasic taxonomic study. On the basis of 16S rRNA gene sequence similarities, both strains, which possessed identical sequences, were assigned to the genus Brucella. This placement was confirmed by extended multilocus sequence analysis (MLSA), where both strains possessed identical sequences, and whole-genome sequencing of a representative isolate. All of the above analyses suggested that the two strains represent a novel lineage within the genus Brucella. The strains also possessed a unique profile when subjected to the phenotyping approach classically used to separate species of the genus Brucella, reacting only with Brucella A monospecific antiserum, being sensitive to the dyes thionin and fuchsin, being lysed by bacteriophage Wb, Bk2 and Fi phage at routine test dilution (RTD) but only partially sensitive to bacteriophage Tb, and with no requirement for CO2 and no production of H2S but strong urease activity. Biochemical profiling revealed a pattern of enzyme activity and metabolic capabilities distinct from existing species of the genus Brucella. Molecular analysis of the omp2 locus genes showed that both strains had a novel combination of two highly similar omp2b gene copies. The two strains shared a unique fingerprint profile of the multiple-copy Brucella-specific element IS711. Like MLSA, a multilocus variable number of tandem repeat analysis (MLVA) showed that the isolates clustered together very closely, but represent a distinct group within the genus Brucella. Isolates F8/08-60(T) and F8/08-61 could be distinguished clearly from all known species of the genus Brucella and their biovars by both phenotypic and molecular properties. Therefore, by applying the species concept for the genus Brucella suggested by the ICSP Subcommittee on the Taxonomy of Brucella, they represent a novel species within the genus Brucella, for which the name Brucella papionis sp. nov. is proposed, with the type strain F8/08-60(T) (= NCTC 13660(T) = CIRMBP 0958(T)). Crown Copyright 2014. Reproduced with the permission of the Controller of Her Majesty's Stationery Office/Queen's Printer for Scotland and AHVLA.

  17. Investigation of genetic diversity and epidemiological characteristics of Pasteurella multocida isolates from poultry in southwest China by population structure, multi-locus sequence typing and virulence-associated gene profile analysis.

    PubMed

    Li, Zhangcheng; Cheng, Fangjun; Lan, Shimei; Guo, Jianhua; Liu, Wei; Li, Xiaoyan; Luo, Zeli; Zhang, Manli; Wu, Juan; Shi, Yang

    2018-04-25

    Fowl cholera caused by Pasteurella multocida has always been a disease of global importance for poultry production. The aim of this study was to obtain more information about the epidemiology of avian P. multocida infection in southwest China and the genetic characteristics of clinical isolates. P. multocida isolates were characterized by biochemical and molecular-biological methods. The distributions of the capsular serogroups, the phenotypic antimicrobial resistance profiles, lipopolysaccharide (LPS) genotyping and the presence of 19 virulence genes were investigated in 45 isolates of P. multocida that were associated with clinical disease in poultry. The genetic diversity of P. multocida strains was performed by 16S rRNA and rpoB gene sequence analysis as well as multilocus sequence typing (MLST). The results showed that most (80.0%) of the P. multocida isolates in this study represented special P. multocida subspecies, and 71.1% of the isolates showed multiple-drug resistance. 45 isolates belonged to capsular types: A (100%) and two LPS genotypes: L1 (95.6%) and L3 (4.4%). MLST revealed two new alleles (pmi77 and gdh57) and one new sequence type (ST342). ST129 types dominated in 45 P. multocida isolates. Isolates belonging to ST129 were with the genes ompH+plpB+ptfA+tonB, whereas ST342 included isolates with fur+hgbA+tonB genes. Population genetic analysis and the MLST results revealed that at least one new ST genotype was present in the avian P. multocida in China. These findings provide novel insights into the epidemiological characteristics of avian P. multocida isolates in southwest China.

  18. Identification of copy number variants in whole-genome data using Reference Coverage Profiles

    PubMed Central

    Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy

    2015-01-01

    The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
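    The normalization step described above can be sketched roughly as follows: per-bin sample coverage is divided by a reference profile of expected diploid depth, rescaled by the median ratio, and expressed as an estimated copy number. This is a minimal sketch under stated assumptions: the function names, the bin-level input lists, and the fixed thresholds are hypothetical, and simple thresholding stands in for the hidden Markov model segmentation that the authors actually use.

```python
from statistics import median


def copy_number_estimates(sample_cov, reference_profile):
    """Estimate per-bin copy number from coverage normalized to a reference profile.

    Bins whose reference depth is not positive yield None (no estimate).
    """
    ratios = [s / r for s, r in zip(sample_cov, reference_profile) if r > 0]
    if not ratios:
        raise ValueError("no bins with positive reference depth")
    scale = median(ratios)  # corrects for overall sequencing depth
    if scale <= 0:
        raise ValueError("median coverage ratio must be positive")
    return [
        2.0 * (s / r) / scale if r > 0 else None
        for s, r in zip(sample_cov, reference_profile)
    ]


def call_bins(copy_numbers, loss_below=1.5, gain_above=2.5):
    """Label each bin by thresholding (a stand-in for HMM segmentation)."""
    calls = []
    for cn in copy_numbers:
        if cn is None:
            calls.append("no_call")
        elif cn < loss_below:
            calls.append("loss")
        elif cn > gain_above:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls
```

    With a flat reference of 30× and sample depths of 30, 30, 15, 30, 45, 30, the estimates come out as 2, 2, 1, 2, 3, 2, so the third bin is labelled a loss (consistent with a hemizygous deletion) and the fifth a gain. Thresholding each bin independently is noisy on real data, which is one reason a segmentation model is preferable in practice.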

  19. Divergent evolution of arrested development in the dauer stage of Caenorhabditis elegans and the infective stage of Heterodera glycines

    PubMed Central

    Elling, Axel A; Mitreva, Makedonka; Recknor, Justin; Gai, Xiaowu; Martin, John; Maier, Thomas R; McDermott, Jeffrey P; Hewezi, Tarek; McK Bird, David; Davis, Eric L; Hussey, Richard S; Nettleton, Dan; McCarter, James P; Baum, Thomas J

    2007-01-01

    Background: The soybean cyst nematode Heterodera glycines is the most important parasite in soybean production worldwide. A comprehensive analysis of large-scale gene expression changes throughout the development of plant-parasitic nematodes has been lacking to date. Results: We report an extensive genomic analysis of H. glycines, beginning with the generation of 20,100 expressed sequence tags (ESTs). In-depth analysis of these ESTs plus approximately 1,900 previously published sequences predicted 6,860 unique H. glycines genes and allowed a classification by function using InterProScan. Expression profiling of all 6,860 genes throughout the H. glycines life cycle was undertaken using the Affymetrix Soybean Genome Array GeneChip. Our data sets and results represent a comprehensive resource for molecular studies of H. glycines. Demonstrating the power of this resource, we were able to address whether arrested development in the Caenorhabditis elegans dauer larva and the H. glycines infective second-stage juvenile (J2) exhibits shared gene expression profiles. We determined that the gene expression profiles associated with the C. elegans dauer pathway are not uniformly conserved in H. glycines and that the expression profiles of genes for metabolic enzymes of C. elegans dauer larvae and H. glycines infective J2 are dissimilar. Conclusion: Our results indicate that hallmark gene expression patterns and metabolism features are not shared in the developmentally arrested life stages of C. elegans and H. glycines, suggesting that developmental arrest in these two nematode species has undergone more divergent evolution than previously thought and pointing to the need for detailed genomic analyses of individual parasite species. PMID:17919324

  20. Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk

    2014-10-09

    To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.

  1. A Concise Atlas of Thyroid Cancer Next-Generation Sequencing Panel ThyroSeq v.2

    PubMed Central

    Alsina, Jorge; Alsina, Raul; Gulec, Seza

    2017-01-01

    The next-generation sequencing technology allows high-output genomic analysis. An innovative assay in thyroid cancer, ThyroSeq® was developed for targeted mutation detection by next generation sequencing technology in fine needle aspiration and tissue samples. ThyroSeq v.2 next generation sequencing panel offers simultaneous sequencing and detection in >1000 hotspots of 14 thyroid cancer-related genes and for 42 types of gene fusions known to occur in thyroid cancer. ThyroSeq is being increasingly used to further narrow the indeterminate category defined by cytology for thyroid nodules. From a surgical perspective, genomic profiling also provides prognostic and predictive information and closely relates to determination of surgical strategy. Both the genomic analysis technology and the informatics for the cancer genome database are rapidly developing. In this paper, we have gathered existing information on the thyroid cancer-related genes involved in the initiation and progression of thyroid cancer. Our goal is to assemble a glossary for the current ThyroSeq genomic panel that can help elucidate the role genomics play in thyroid cancer oncogenesis. PMID:28117295

  2. Identification of immunity-related genes in the larvae of Protaetia brevitarsis seulensis (Coleoptera: Cetoniidae) by a next-generation sequencing-based transcriptome analysis.

    PubMed

    Bang, Kyeongrin; Hwang, Sejung; Lee, Jiae; Cho, Saeyoull

    2015-01-01

    To identify immune-related genes in the larvae of white-spotted flower chafers, next-generation sequencing was conducted with an Illumina HiSeq2000, resulting in 100 million cDNA reads with sequence information from over 10 billion base pairs (bp) and >50× transcriptome coverage. A subset of 77,336 contigs was created, and ∼35,532 sequences matched entries against the NCBI nonredundant database (cutoff, e < 10^-5). Statistical analysis was performed on the 35,532 contigs. For profiling of the immune response, samples were analyzed by aligning 42 base sequence tags to the de novo reference assembly, comparing levels in immunized larvae to control levels of expression. Of the differentially expressed genes, 3,440 transcripts were upregulated and 3,590 transcripts were downregulated. Many of these genes were confirmed as immune-related genes such as pattern recognition proteins, immune-related signal transduction proteins, antimicrobial peptides, and cellular response proteins, by comparison to published data. © The Author 2015. Published by Oxford University Press on behalf of the Entomological Society of America.

  3. Nanopore sequencing in microgravity

    PubMed Central

    McIntyre, Alexa B R; Rizzardi, Lindsay; Yu, Angela M; Alexander, Noah; Rosen, Gail L; Botkin, Douglas J; Stahl, Sarah E; John, Kristen K; Castro-Wallace, Sarah L; McGrath, Ken; Burton, Aaron S; Feinberg, Andrew P; Mason, Christopher E

    2016-01-01

    Rapid DNA sequencing and analysis has been a long-sought goal in remote research and point-of-care medicine. In microgravity, DNA sequencing can facilitate novel astrobiological research and close monitoring of crew health, but spaceflight places stringent restrictions on the mass and volume of instruments, crew operation time, and instrument functionality. The recent emergence of portable, nanopore-based tools with streamlined sample preparation protocols finally enables DNA sequencing on missions in microgravity. As a first step toward sequencing in space and aboard the International Space Station (ISS), we tested the Oxford Nanopore Technologies MinION during a parabolic flight to understand the effects of variable gravity on the instrument and data. In a successful proof-of-principle experiment, we found that the instrument generated DNA reads over the course of the flight, including the first ever sequenced in microgravity, and additional reads measured after the flight concluded its parabolas. Here we detail modifications to the sample-loading procedures to facilitate nanopore sequencing aboard the ISS and in other microgravity environments. We also evaluate existing analysis methods and outline two new approaches, the first based on a wave-fingerprint method and the second on entropy signal mapping. Computationally light analysis methods offer the potential for in situ species identification, but are limited by the error profiles (stays, skips, and mismatches) of older nanopore data. Higher accuracies attainable with modified sample processing methods and the latest version of flow cells will further enable the use of nanopore sequencers for diagnostics and research in space. PMID:28725742

  4. Comparative Analysis of 35 Basidiomycete Genomes Reveals Diversity and Uniqueness of the Phylum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riley, Robert; Salamov, Asaf; Otillar, Robert

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprobes including wood decaying fungi. To better understand the diversity of this phylum we compared the genomes of 35 basidiomycete fungi including 6 newly sequenced genomes. The genomes of basidiomycetes span extremes of genome size, gene number, and repeat content. A phylogenetic tree of Basidiomycota was generated using the Phyldog software, which uses all available protein sequence data to simultaneously infer gene and species trees. Analysis of core genes reveals that some 48 percent of basidiomycete proteins are unique to the phylum with nearly half of those (22 percent) comprising proteins found in only one organism. Phylogenetic patterns of plant biomass-degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay among the members of Agaricomycotina subphylum. There is a correlation of the profile of certain gene families to nutritional mode in Agaricomycotina. Based on phylogenetically-informed PCA analysis of such profiles, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has ligninolytic class II fungal peroxidases. Furthermore, we find that both fungi exhibit wood decay with white rot-like characteristics in growth assays. Analysis of the rate of discovery of proteins with no or few homologs suggests the high value of continued sequencing of basidiomycete fungi.

  5. Informatic selection of a neural crest-melanocyte cDNA set for microarray analysis

    PubMed Central

    Loftus, S. K.; Chen, Y.; Gooden, G.; Ryan, J. F.; Birznieks, G.; Hilliard, M.; Baxevanis, A. D.; Bittner, M.; Meltzer, P.; Trent, J.; Pavan, W.

    1999-01-01

    With cDNA microarrays, it is now possible to compare the expression of many genes simultaneously. To maximize the likelihood of finding genes whose expression is altered under the experimental conditions, it would be advantageous to be able to select clones for tissue-appropriate cDNA sets. We have taken advantage of the extensive sequence information in the dbEST expressed sequence tag (EST) database to identify a neural crest-derived melanocyte cDNA set for microarray analysis. Analysis of characterized genes with dbEST identified one library that contained ESTs representing 21 neural crest-expressed genes (library 198). The distribution of the ESTs corresponding to these genes was biased toward being derived from library 198. This is in contrast to the EST distribution profile for a set of control genes, characterized to be more ubiquitously expressed in multiple tissues (P < 1 × 10^−9). From library 198, a subset of 852 clustered ESTs were selected that have a library distribution profile similar to that of the 21 neural crest-expressed genes. Microarray analysis demonstrated the majority of the neural crest-selected 852 ESTs (Mel1 array) were differentially expressed in melanoma cell lines compared with a non-neural crest kidney epithelial cell line (P < 1 × 10^−8). This was not observed with an array of 1,238 ESTs that was selected without library origin bias (P = 0.204). This study presents an approach for selecting tissue-appropriate cDNAs that can be used to examine the expression profiles of developmental processes and diseases. PMID:10430933

  6. Identification and characterization of Burkholderia multivorans CCA53.

    PubMed

    Akita, Hironaga; Kimura, Zen-Ichiro; Yusoff, Mohd Zulkhairi Mohd; Nakashima, Nobutaka; Hoshino, Tamotsu

    2017-07-06

    A lignin-degrading bacterium, Burkholderia sp. CCA53, was previously isolated from leaf soil. The purpose of this study was to determine phenotypic and biochemical features of Burkholderia sp. CCA53. Multilocus sequence typing (MLST) analysis based on fragments of the atpD, gltD, gyrB, lepA, recA and trpB gene sequences was performed to identify Burkholderia sp. CCA53. The MLST analysis revealed that Burkholderia sp. CCA53 was tightly clustered with B. multivorans ATCC BAA-247(T). The quinone and cellular fatty acid profiles, carbon source utilization, growth temperature and pH were consistent with the characteristics of B. multivorans species. Burkholderia sp. CCA53 was therefore identified as B. multivorans CCA53.

  7. Using Genome-Wide Expression Profiling to Define Gene Networks Relevant to the Study of Complex Traits: From RNA Integrity to Network Topology

    PubMed Central

    O'Brien, M.A.; Costin, B.N.; Miles, M.F.

    2014-01-01

    Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313

  8. AQME: A forensic mitochondrial DNA analysis tool for next-generation sequencing data.

    PubMed

    Sturk-Andreaggi, Kimberly; Peck, Michelle A; Boysen, Cecilie; Dekker, Patrick; McMahon, Timothy P; Marshall, Charla K

    2017-11-01

    The feasibility of generating mitochondrial DNA (mtDNA) data has expanded considerably with the advent of next-generation sequencing (NGS), specifically in the generation of entire mtDNA genome (mitogenome) sequences. However, the analysis of these data has emerged as the greatest challenge to implementation in forensics. To address this need, a custom toolkit for use in the CLC Genomics Workbench (QIAGEN, Hilden, Germany) was developed through a collaborative effort between the Armed Forces Medical Examiner System - Armed Forces DNA Identification Laboratory (AFMES-AFDIL) and QIAGEN Bioinformatics. The AFDIL-QIAGEN mtDNA Expert, or AQME, generates an editable mtDNA profile that employs forensic conventions and includes the interpretation range required for mtDNA data reporting. AQME also integrates an mtDNA haplogroup estimate into the analysis workflow, which provides the analyst with phylogenetic nomenclature guidance and a profile quality check without the use of an external tool. Supplemental AQME outputs such as nucleotide-per-position metrics, configurable export files, and an audit trail are produced to assist the analyst during review. AQME is applied to standard CLC outputs and thus can be incorporated into any mtDNA bioinformatics pipeline within CLC regardless of sample type, library preparation or NGS platform. An evaluation of AQME was performed to demonstrate its functionality and reliability for the analysis of mitogenome NGS data. The study analyzed Illumina mitogenome data from 21 samples (including associated controls) of varying quality and sample preparations with the AQME toolkit. A total of 211 tool edits were automatically applied to 130 of the 698 total variants reported in an effort to adhere to forensic nomenclature. Although additional manual edits were required for three samples, supplemental tools such as mtDNA haplogroup estimation assisted in identifying and guiding these necessary modifications to the AQME-generated profile. Along with profile generation, AQME reported accurate haplogroups for 18 of the 19 samples analyzed. The single errant haplogroup assignment, although phylogenetically close, identified a bug that only affects partial mitogenome data. Future adjustments to AQME's haplogrouping tool will address this bug as well as enhance the overall scoring strategy to better refine and automate haplogroup assignments. As NGS enables broader use of the mtDNA locus in forensics, the availability of AQME and other forensic-focused mtDNA analysis tools will ease the transition and further support mitogenome analysis within routine casework. Toward this end, the AFMES-AFDIL has utilized the AQME toolbox in conjunction with the CLC Genomics Workbench to successfully validate and implement two NGS mitogenome methods. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Technical Report: Benchmarking for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, K.

    2016-01-22

    The software application “MetaQuant” was developed by our group at Lawrence Livermore National Laboratory (LLNL). It is designed to profile microbial populations in a sample using data from whole-genome shotgun (WGS) metagenomic DNA sequencing. Several other metagenomic profiling applications have been described in the literature. We ran a series of benchmark tests to compare the performance of MetaQuant against that of a few existing profiling tools, using real and simulated sequence datasets. This report describes our benchmarking procedure and results.

  10. Unraveling the oral cancer lncRNAome: Identification of novel lncRNAs associated with malignant progression and HPV infection.

    PubMed

    Nohata, Nijiro; Abba, Martin C; Gutkind, J Silvio

    2016-08-01

    The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed at establishing the onco-lncRNAome profiling of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis were performed. Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer associated molecular events. With consideration of differential expression of lncRNAs in a HNSCC cell lines panel (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. LncRNAs profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Unraveling the Oral Cancer lncRNAome: Identification of Novel lncRNAs Associated with Malignant Progression and HPV Infection

    PubMed Central

    Nohata, Nijiro; Abba, Martin C.; Gutkind, J. Silvio

    2017-01-01

    Objectives: The role of long non-coding RNA (lncRNA) expression in human head and neck squamous cell carcinoma (HNSCC) is still poorly understood. In this study, we aimed at establishing the onco-lncRNAome profiling of HNSCC and to identify lncRNAs correlating with prognosis and patient survival. Materials and Methods: The Atlas of Noncoding RNAs in Cancer (TANRIC) database was employed to retrieve the lncRNA expression information generated from The Cancer Genome Atlas (TCGA) HNSCC RNA-sequencing data. RNA-sequencing data from HNSCC cell lines were also considered for this study. Bioinformatics approaches, such as differential gene expression analysis, survival analysis, principal component analysis, and Co-LncRNA enrichment analysis were performed. Results: Using TCGA HNSCC RNA-sequencing data from 426 HNSCC and 42 adjacent normal tissues, we found 728 lncRNA transcripts significantly and differentially expressed in HNSCC. Among the 728 lncRNAs, 55 lncRNAs were significantly associated with poor prognosis, such as overall survival and/or disease-free survival. Next, we found 140 lncRNA transcripts significantly and differentially expressed between Human Papilloma Virus (HPV) positive tumors and HPV negative tumors. Thirty lncRNA transcripts were differentially expressed between TP53 mutated and TP53 wild type tumors. Co-LncRNA analysis suggested that protein-coding genes that are co-expressed with these deregulated lncRNAs might be involved in cancer associated molecular events. With consideration of differential expression of lncRNAs in a HNSCC cell lines panel (n=22), we found several lncRNAs that may represent potential targets for diagnosis, therapy and prevention of HNSCC. Conclusion: LncRNAs profiling could provide novel insights into the potential mechanisms of HNSCC oncogenesis. PMID:27424183

  12. Genome-wide transcriptome and expression profile analysis of Phalaenopsis during explant browning.

    PubMed

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further functional studies to prevent explant browning.

  13. Genome-Wide Transcriptome and Expression Profile Analysis of Phalaenopsis during Explant Browning

    PubMed Central

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Background: Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. Methodology/Principal Findings: We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Conclusions/Significance: Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further functional studies to prevent explant browning. PMID:25874455

  14. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.

  15. Sequence of the bchG gene from Chloroflexus aurantiacus: relationship between chlorophyll synthase and other polyprenyltransferases

    NASA Technical Reports Server (NTRS)

    Lopez, J. C.; Ryan, S.; Blankenship, R. E.

    1996-01-01

    The sequence of the Chloroflexus aurantiacus open reading frame thought to be the C. aurantiacus homolog of the Rhodobacter capsulatus bchG gene is reported. The BchG gene product catalyzes esterification of bacteriochlorophyllide a by geranylgeraniol-PPi during bacteriochlorophyll a biosynthesis. Homologs from Arabidopsis thaliana, Synechocystis sp. strain PCC6803, and C. aurantiacus were identified in database searches. Profile analysis identified three related polyprenyltransferase enzymes which attach an aliphatic alcohol PPi to an aromatic substrate. This suggests a broader relationship between chlorophyll synthases and other polyprenyltransferases.

  16. Comparative expression profiling in grape (Vitis vinifera) berries derived from frequency analysis of ESTs and MPSS signatures.

    PubMed

    Iandolino, Alberto; Nobuta, Kan; da Silva, Francisco Goes; Cook, Douglas R; Meyers, Blake C

    2008-05-12

    Vitis vinifera (V. vinifera) is the primary grape species cultivated for wine production, with an industry valued annually in the billions of dollars worldwide. In order to sustain and increase grape production, it is necessary to understand the genetic makeup of grape species. Here we performed mRNA profiling using Massively Parallel Signature Sequencing (MPSS) and combined it with available Expressed Sequence Tag (EST) data. These tag-based technologies, which do not require a priori knowledge of genomic sequence, are well-suited for transcriptional profiling. The sequence depth of MPSS allowed us to capture and quantify almost all the transcripts at a specific stage in the development of the grape berry. The number and relative abundance of transcripts from stage II grape berries were defined using MPSS. A total of 2,635,293 17-base and 2,259,286 20-base signatures were obtained, representing at least 30,737 and 26,878 distinct sequences, respectively. The average normalized abundance per signature was approximately 49 TPM (Transcripts Per Million). Comparisons of the MPSS signatures with available Vitis species' ESTs and a unigene set demonstrated that 6,430 distinct contigs and 2,190 singletons have a perfect match to at least one MPSS signature. Among the matched sequences, ESTs were identified from tissues other than berries or from berries at different developmental stages. Additional MPSS signatures not matching to known grape ESTs can extend our knowledge of the V. vinifera transcriptome, particularly when these data are used to assist in annotation of whole genome sequences from Vitis vinifera. The MPSS data presented here not only achieved a higher level of saturation than previous EST-based analyses but, in doing so, also expanded the known set of transcripts of grape berries during the unique stage in development that immediately precedes the onset of ripening. The MPSS dataset also revealed evidence of antisense expression not previously reported in grapes but comparable to that reported in other plant species. Finally, we developed a novel web-based, public resource for utilization of the grape MPSS data [1].
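
    The TPM normalization mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition (each signature's raw count divided by the library total, scaled to one million); the function name and the toy signatures are hypothetical and are not taken from the study.

```python
from collections import Counter


def tags_per_million(signature_counts):
    """Scale raw signature counts so that the library sums to one million."""
    total = sum(signature_counts.values())
    if total == 0:
        raise ValueError("no signatures counted")
    return {sig: count * 1_000_000 / total
            for sig, count in signature_counts.items()}


# Hypothetical toy library: three signatures observed 6, 3 and 1 times.
raw = Counter({"GATCAAAA": 6, "GATCCCCC": 3, "GATCGGGG": 1})
tpm = tags_per_million(raw)
```

    Because every library is rescaled to the same total, abundances expressed this way can be compared between libraries of different sequencing depth.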

  17. An automated genotyping tool for enteroviruses and noroviruses.

    PubMed

    Kroneman, A; Vennema, H; Deforche, K; v d Avoort, H; Peñaranda, S; Oberste, M S; Vinjé, J; Koopmans, M

    2011-06-01

    Molecular techniques are established as routine in virological laboratories and virus typing through (partial) sequence analysis is increasingly common. Quality assurance for the use of typing data requires harmonization of genotype nomenclature, and agreement on target genes, depending on the level of resolution required, and robustness of methods. The objective was to develop and validate web-based open-access typing-tools for enteroviruses and noroviruses. An automated web-based typing algorithm was developed, starting with BLAST analysis of the query sequence against a reference set of sequences from viruses in the family Picornaviridae or Caliciviridae. The second step is phylogenetic analysis of the query sequence and a sub-set of the reference sequences, to assign the enterovirus type or norovirus genotype and/or variant, with profile alignment, construction of phylogenetic trees and bootstrap validation. Typing is performed on VP1 sequences of Human enterovirus A to D, and ORF1 and ORF2 sequences of genogroup I and II noroviruses. For validation, we used the tools to automatically type sequences in the RIVM and CDC enterovirus databases and the FBVE norovirus database. Using the typing-tools, 785 (99%) of 795 Enterovirus VP1 sequences, and 8154 (98.5%) of 8342 norovirus sequences were typed in accordance with previously used methods. Subtyping into variants was achieved for 4439 (78.4%) of 5838 NoV GII.4 sequences. The online typing-tools reliably assign genotypes for enteroviruses and noroviruses. The use of phylogenetic methods makes these tools robust to ongoing evolution. This should facilitate standardized genotyping and nomenclature in clinical and public health laboratories, thus supporting inter-laboratory comparisons. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  19. The HMMER Web Server for Protein Sequence Similarity Search.

    PubMed

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.
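
    The idea of scoring a sequence against a family profile can be sketched in a much simpler form than a profile HMM. The example below builds a position-specific log-odds matrix from an ungapped toy alignment and scores candidate sequences against it. It is not HMMER's algorithm (there are no insert or delete states and no acceleration heuristics), and the uniform background, the pseudocount and all names are assumptions made for illustration.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one log2-odds table per alignment column (uniform background)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, sequence):
    """Sum the per-position log-odds; sequence must match the profile length."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


# Hypothetical ungapped family alignment of three four-residue sequences.
family = ["MKVL", "MKIL", "MRVL"]
profile = build_profile(family)
member_score = score(profile, "MKVL")  # resembles the family
other_score = score(profile, "GGGG")   # unrelated residues
```

    A sequence resembling the family scores above zero and an unrelated one scores below zero, which is the basic signal that profile methods refine with gap modeling and statistics.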

  20. Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

    PubMed

    Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

    2017-10-06

    Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.

  1. PHYLOGENETIC AFFILIATION OF WATER DISTRIBUTION SYSTEM BACTERIAL ISOLATES USING 16S RDNA SEQUENCE ANALYSIS

    EPA Science Inventory

    In a previously described study, only 15% of the bacterial strains isolated from a water distribution system (WDS) grown on R2A agar were identifiable using fatty acid methyl esters (FAME) profiling. The lack of success was attributed to the use of fatty acid databases of bacter...

  2. HuH-7 reference genome profile: complex karyotype composed of massive loss of heterozygosity.

    PubMed

    Kasai, Fumio; Hirayama, Noriko; Ozawa, Midori; Satoh, Motonobu; Kohara, Arihiro

    2018-05-17

    Human cell lines represent a valuable resource as in vitro experimental models. A hepatoma cell line, HuH-7 (JCRB0403), has been used extensively in various research fields and a number of studies using this line have been published continuously since it was established in 1982. However, an accurate genome profile, which can serve as a reliable reference, has not been available. In this study, we performed M-FISH, SNP microarray and amplicon sequencing to characterize the cell line. Single cell analysis of metaphases revealed a high level of heterogeneity with a mode of 60 chromosomes. Cytogenetic results demonstrated chromosome abnormalities involving every chromosome in addition to a massive loss of heterozygosity, which accounts for 55.3% of the genome, consistent with the homozygous variants seen in the sequence analysis. We provide empirical data that the HuH-7 cell line is composed of highly heterogeneous cell populations, suggesting that besides cell line authentication, the quality of cell lines needs to be taken into consideration in the future use of tumor cell lines.

  3. Molecular characterization of southern bluefin tuna myoglobin (Thunnus maccoyii).

    PubMed

    Nurilmala, Mala; Ochiai, Yoshihiro

    2016-10-01

    The primary structure of southern bluefin tuna Thunnus maccoyii Mb has been elucidated by molecular cloning techniques. The cDNA of this tuna encoding Mb contained 776 nucleotides, with an open reading frame of 444 nucleotides encoding 147 amino acids. The nucleotide sequence of the coding region was identical to those of other bluefin tunas (T. thynnus and T. orientalis), thus giving the same amino acid sequences. Based on the deduced amino acid sequence, bioinformatic analysis was performed including phylogenetic tree, hydropathy plot and homology modeling. In order to investigate the autoxidation profiles, Mb was isolated from the dark muscle. The water soluble fraction was subjected to ammonium sulfate fractionation (60-90% saturation) followed by preparative gel electrophoresis. Autoxidation profiles of Mb were delineated at pH 5.6, 6.5 and 7.4 at 37 °C. The autoxidation rate of tuna Mb was slightly higher than that of horse Mb at all pH values examined. These results revealed that tuna Mb was less stable than horse Mb, mainly at acidic pH.
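
    The relationship between the reported open reading frame length and the protein length can be checked with a short worked example. It assumes that the 444 nucleotides include the terminal stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def protein_length_from_orf(orf_nucleotides, includes_stop_codon=True):
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_nucleotides // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop_codon else codons


length = protein_length_from_orf(444)  # 148 codons, one of them a stop
```

    Under that assumption the 444-nucleotide frame holds 148 codons, of which 147 encode amino acids, in agreement with the figure given in the abstract.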

  4. Profiling the transcriptome of Gracilaria changii (Rhodophyta) in response to light deprivation.

    PubMed

    Ho, Chai-Ling; Teoh, Seddon; Teo, Swee-Sen; Rahim, Raha Abdul; Phang, Siew-Moi

    2009-01-01

    Light regulates photosynthesis, growth and reproduction, yield and properties of phycocolloids, and starch contents in seaweeds. Despite its importance as an environmental cue that regulates many developmental, physiological, and biochemical processes, the network of genes involved during light deprivation is obscure. In this study, we profiled the transcriptome of Gracilaria changii at two different irradiance levels using a cDNA microarray containing more than 3,000 cDNA probes. Microarray analysis revealed that 93 and 105 genes were up- and down-regulated more than 3-fold under light deprivation, respectively. However, only 50% of the transcripts have significant matches to the nonredundant peptide sequences in the database. The transcripts that accumulated under light deprivation include vanadium chloroperoxidase, thioredoxin, ferredoxin component, and reduced nicotinamide adenine dinucleotide dehydrogenase. Among the genes that were down-regulated under light deprivation were genes encoding light harvesting protein, light harvesting complex I, phycobilisome 7.8 kDa linker polypeptide, low molecular weight early light-inducible protein, and vanadium bromoperoxidase. Our findings also provided important clues to the functions of many unknown sequences that could not be annotated using sequence comparison.
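
    The 3-fold cutoff used to call up- and down-regulated genes can be illustrated with a minimal sketch. It assumes positive, already normalized intensities for the same genes in both conditions; the gene names, the values and the function are hypothetical, and the study's actual normalization and statistics are not reproduced here.

```python
def classify_by_fold_change(control, treated, threshold=3.0):
    """Split genes into up- and down-regulated lists using a fold cutoff."""
    up, down = [], []
    for gene, baseline in control.items():
        ratio = treated[gene] / baseline  # assumes baseline > 0
        if ratio > threshold:
            up.append(gene)
        elif ratio < 1.0 / threshold:
            down.append(gene)
    return up, down


# Hypothetical normalized intensities under light and under light deprivation.
light = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
dark = {"geneA": 400.0, "geneB": 20.0, "geneC": 150.0}
up, down = classify_by_fold_change(light, dark)
```

    Here geneA (4-fold higher) is called up-regulated, geneB (5-fold lower) down-regulated, and geneC (1.5-fold higher) falls inside the cutoff and is left unclassified.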

  5. The technology and biology of single-cell RNA sequencing.

    PubMed

    Kolodziejczyk, Aleksandra A; Kim, Jong Kyoung; Svensson, Valentine; Marioni, John C; Teichmann, Sarah A

    2015-05-21

    The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Influence of long-term repeated prescribed burning on mycelial communities of ectomycorrhizal fungi.

    PubMed

    Bastias, Brigitte A; Xu, Zhihong; Cairney, John W G

    2006-01-01

    To demonstrate the efficacy of direct DNA extraction from hyphal ingrowth bags for community profiling of ectomycorrhizal (ECM) mycelia in soil, we applied the method to investigate the influence of long-term repeated prescribed burning on an ECM fungal community. DNA was extracted from hyphal ingrowth bags buried in forest plots that received different prescribed burning treatments for 30 yr, and denaturing gradient gel electrophoresis (DGGE) profiles of partial fungal rDNA internal transcribed spacer (ITS) regions were compared. Restriction fragment length polymorphism (RFLP) and sequence analyses were also used to compare clone assemblages between the treatments. The majority of sequences derived from the ingrowth bags were apparently those of ECM fungi. DGGE profiles for biennially burned plots were significantly different from those of quadrennially burned and unburned control plots. Analysis of clone assemblages indicated that this reflected altered ECM fungal community composition. The results indicate that hyphal ingrowth bags represent a useful method for investigation of ECM mycelial communities, and that frequent long-term prescribed burning can influence below-ground ECM fungal communities.

  7. HMM-ModE: implementation, benchmarking and validation with HMMER3

    PubMed Central

    2014-01-01

    Background HMM-ModE is a computational method that generates family specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10 fold cross validation and modifies the emission probabilities of profiles to reduce common fold based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts both at the level of parsing the HMM profiles and modifying emission probabilities to upgrade HMM-ModE using HMMER3 that takes advantage of its probabilistic inference with high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation. Results The implementation of this method, which now works with HMMER3, is benchmarked with the earlier version of HMMER, to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families and we have reported a significant reduction in the number of false positive hits over the default HMM profiles. When implemented on GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the family at different levels of their classification hierarchy. Conclusions The present findings show that the new version of HMM-ModE is a highly specific method used to differentiate between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families. PMID:25073805

  8. Dose-Response Analysis of RNA-Seq Profiles in Archival ...

    EPA Pesticide Factsheets

    Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses using RNA-sequencing in paired FFPE and frozen (FROZ) samples from two archival studies in mice, one 20 years old. Experimental treatments included 3 different doses of di(2-ethylhexyl)phthalate or dichloroacetic acid for the recently archived and older studies, respectively. Total RNA was ribo-depleted and sequenced using the Illumina HiSeq platform. In the recently archived study, FFPE samples had 35% lower total counts compared to FROZ samples but high concordance in fold-change values of differentially expressed genes (DEGs) (r2 = 0.99), highly enriched pathways (90% overlap with FROZ), and benchmark dose estimates for preselected target genes (2% difference vs FROZ). In contrast, older FFPE samples had markedly lower total counts (3% of FROZ) and poor concordance in global DEGs and pathways. However, counts from FFPE and FROZ samples still positively correlated (r2 = 0.84 across all transcripts) and showed comparable dose responses for more highly expressed target genes. These findings highlight potential applications and issues in using RNA-sequencing data from FFPE samples. Recently archived FFPE samples were highly similar to FROZ samples in sequencing q

  9. Evaluation of massively parallel sequencing for forensic DNA methylation profiling.

    PubMed

    Richards, Rebecca; Patel, Jayshree; Stevenson, Kate; Harbison, SallyAnn

    2018-05-11

    Epigenetics is an emerging area of interest in forensic science. DNA methylation, a type of epigenetic modification, can be applied to chronological age estimation, identical twin differentiation and body fluid identification. However, there is not yet an agreed, established methodology for targeted detection and analysis of DNA methylation markers in forensic research. Recently a massively parallel sequencing-based approach has been suggested. The use of massively parallel sequencing is well established in clinical epigenetics and is emerging as a new technology in the forensic field. This review investigates the potential benefits, limitations and considerations of this technique for the analysis of DNA methylation in a forensic context. The importance of a robust protocol, regardless of the methodology used, that minimises potential sources of bias is highlighted. This article is protected by copyright. All rights reserved.

  10. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

    PubMed Central

    2012-01-01

    Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
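    The simulation idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: counts are drawn from a negative binomial via a gamma-Poisson mixture, and reduced sequencing depth is mimicked by keeping each read independently with a fixed probability (binomial thinning). The function names, the mean of 50, the dispersion of 0.2 and the thinning approach are all hypothetical choices made here; the exponential component mentioned in the abstract is not modelled.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def negative_binomial(rng, mean, dispersion):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2.

    The parameterisation is an assumption; the abstract gives no values.
    """
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(rng, rate)


def thin(rng, count, fraction):
    """Keep each of `count` reads with probability `fraction`.

    A stand-in (assumed here) for sequencing the same library at lower depth.
    """
    return sum(rng.random() < fraction for _ in range(count))


if __name__ == "__main__":
    rng = random.Random(1)
    full_depth = [negative_binomial(rng, 50.0, 0.2) for _ in range(5)]
    reduced = [thin(rng, c, 0.15) for c in full_depth]
    print(full_depth, reduced)
```

    In a power study of this kind, such simulated counts at full and reduced depth would then be passed to the differential expression tools under comparison.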

  11. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

    PubMed

    Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

    2012-09-17

    RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.

  12. Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

    2016-05-01

    To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, SNP index was calculated between R- and S-bulks with the help of draft genome sequence and reference-guided assembly of ICPL 20096 (resistant parent). Seq-BSA has provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced and their combined analysis with R- and S-bulks has provided a total of 8362 nonsynonymous (ns) SNPs. Of 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant and susceptible to FW and SMD. This analysis revealed association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. Identified candidate genomic regions/SNPs will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
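    The SNP-index comparison between bulks mentioned in the abstract above can be illustrated as follows. The abstract does not state the formula, so this sketch assumes the definition commonly used in bulked segregant analysis: the fraction of reads at a site that carry the non-reference allele, with the difference between the two bulks taken as the signal. The class name, function names, subtraction order and read counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulkCounts:
    """Read counts at one SNP site in one bulk (illustrative values only)."""
    ref_reads: int
    alt_reads: int


def snp_index(counts):
    """Fraction of reads carrying the non-reference allele (assumed definition)."""
    depth = counts.ref_reads + counts.alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return counts.alt_reads / depth


def delta_snp_index(resistant, susceptible):
    """Difference between bulks; the R minus S order is a choice made here."""
    return snp_index(resistant) - snp_index(susceptible)


if __name__ == "__main__":
    # The reference is the resistant parent, so the R-bulk is expected to
    # carry mostly reference alleles at a linked site.
    r_bulk = BulkCounts(ref_reads=40, alt_reads=0)
    s_bulk = BulkCounts(ref_reads=4, alt_reads=36)
    print(delta_snp_index(r_bulk, s_bulk))
```

    Sites where the difference is far from zero are the ones that would be followed up as candidates; sites unlinked to the trait should give values near zero in both bulks.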

  13. Photospheres of hot stars. IV - Spectral type O4

    NASA Technical Reports Server (NTRS)

    Bohannan, Bruce; Abbott, David C.; Voels, Stephen A.; Hummer, David G.

    1990-01-01

    The basic stellar parameters of a supergiant (Zeta Pup) and two main-sequence stars, 9 Sgr and HD 46223, at spectral class O4 are determined using line profile analysis. The stellar parameters are determined by comparing high signal-to-noise hydrogen and helium line profiles with those from stellar atmosphere models which include the effect of radiation scattered back onto the photosphere from an overlying stellar wind, an effect referred to as wind blanketing. At spectral class O4, the inclusion of wind-blanketing in the model atmosphere reduces the effective temperature by an average of 10 percent. This shift in effective temperature is also reflected by shifts in several other stellar parameters relative to previous O4 spectral-type calibrations. It is also shown through the analysis of the two O4 V stars that scatter in spectral type calibrations is introduced by assuming that the observed line profile reflects the photospheric stellar parameters.

  14. Investigation of Sequence Clipping and Structural Heterogeneity of an HIV Broadly Neutralizing Antibody by a Comprehensive LC-MS Analysis

    NASA Astrophysics Data System (ADS)

    Ivleva, Vera B.; Schneck, Nicole A.; Gollapudi, Deepika; Arnold, Frank; Cooper, Jonathan W.; Lei, Q. Paula

    2018-05-01

    CAP256 is one of the highly potent, broadly neutralizing monoclonal antibodies (bNAb) designed for HIV-1 therapy. During the process development of one of the constructs, an unexpected product-related impurity was observed via microfluidics gel electrophoresis. A panel of complementary LC-MS analyses was applied for the comprehensive characterization of CAP256 which included the analysis of the intact and reduced protein, the middle-up approach, and a set of complementary peptide mapping techniques and verification of the disulfide bonds. The designed workflow made it possible to identify a clip within a protruding acidic loop in the CDR-H3 region of the heavy chain, which can lead to the decrease of bNAb potency. This characterization explained the origin of the additional species reflected by the reducing gel profile. An intra-loop disulfide bond linking the two fragments was identified, which explained why the non-reducing capillary electrophoresis (CE) profile was not affected. The extensive characterization of CAP256 post-translational modifications was performed to investigate a possible cause of CE profile complexity and to illustrate other structural details related to this molecule's biological function. Two sites of the engineered Tyr sulfation were verified in the antigen-binding loop, and pyroglutamate formation was used as a tool for monitoring the extent of antibody clipping. Overall, the comprehensive LC-MS study was crucial to (1) identify the impurity as sequence clipping, (2) pinpoint the clipping location and justify its susceptibility relative to the molecular structure, (3) lead to an upstream process optimization to mitigate product quality risk, and (4) ultimately re-engineer the sequence to be clip-resistant.

  15. Comparative analysis of bacteria associated with different mosses by 16S rRNA and 16S rDNA sequencing.

    PubMed

    Tian, Yang; Li, Yan Hong

    2017-01-01

    To understand the differences of the bacteria associated with different mosses, a phylogenetic study of bacterial communities in three mosses was carried out based on 16S rDNA and 16S rRNA sequencing. The mosses used were Hygroamblystegium noterophilum, Entodon compressus and Grimmia montana, representing hygrophyte, shady plant and xerophyte, respectively. In total, the operational taxonomic units (OTUs), richness and diversity were different regardless of the moss species and the library level. All the examined 1183 clones were assigned to 248 OTUs, 56 genera were assigned in rDNA libraries and 23 genera were determined at the rRNA level. Proteobacteria and Bacteroidetes were considered as the most dominant phyla in all the libraries, whereas abundant Actinobacteria and Acidobacteria were detected in the rDNA library of Entodon compressus and approximately 24.7% clones were assigned to Candidate division TM7 in Grimmia montana at rRNA level. The heatmap showed the bacterial profiles derived from rRNA and rDNA were partly overlapping. However, the principal component analysis of all the profiles derived from rDNA showed sharper differences between the different mosses than that of rRNA-based profiles. This suggests that the metabolically active bacterial compositions in different mosses were more phylogenetically similar and the differences of the bacteria associated with different mosses were mainly detected at the rDNA level. The results clearly demonstrate that a combination of 16S rDNA and 16S rRNA sequencing is the preferred approach for understanding the constitution of the microbial communities in mosses. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. High throughput profile-profile based fold recognition for the entire human proteome.

    PubMed

    McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2006-06-07

    In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

  17. An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

    PubMed Central

    Du, Ruofei; Mercante, Donald; Fang, Zhide

    2013-01-01

    In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem. We aim to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families and simultaneously to preserve sufficient information in order to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cutoff filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cutoff methods currently used in the literature. PMID:23516532
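    The score cut-off filtering described in the abstract above amounts to discarding low-scoring read-to-family assignments before counting. The sketch below shows that step only, under assumptions: each hit is a (family, score) pair, the cut-off value of 40 is arbitrary, and the family identifiers and scores are made up for illustration. How the authors choose the cut-off, and their RPS-BLAST follow-up step, are not reproduced.

```python
from collections import Counter


def functional_profile(hits, score_cutoff):
    """Count reads per family, keeping only hits at or above the cut-off.

    `hits` is an iterable of (family_id, similarity_score) pairs, one per
    assigned read; this input shape is an assumption made for the sketch.
    """
    return Counter(family for family, score in hits if score >= score_cutoff)


if __name__ == "__main__":
    # Hypothetical hits: FAM_B receives only weak matches and is treated
    # as an artificial family once the cut-off is applied.
    hits = [
        ("FAM_A", 62.0),
        ("FAM_A", 55.5),
        ("FAM_B", 31.2),
        ("FAM_B", 29.8),
        ("FAM_C", 48.1),
    ]
    print(functional_profile(hits, score_cutoff=40.0))
```

    Raising the cut-off removes more artificial families but also discards genuine reads, which is the trade-off the abstract refers to when it mentions preserving sufficient information.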

  18. Molecular and Biochemical Analysis of the Galactose Phenotype of Dairy Streptococcus thermophilus Strains Reveals Four Different Fermentation Profiles

    PubMed Central

    de Vin, Filip; Rådström, Peter; Herman, Lieve; De Vuyst, Luc

    2005-01-01

    Lactose-limited fermentations of 49 dairy Streptococcus thermophilus strains revealed four distinct fermentation profiles with respect to galactose consumption after lactose depletion. All the strains excreted galactose into the medium during growth on lactose, except for strain IMDOST40, which also displayed extremely high galactokinase (GalK) activity. Among this strain collection eight galactose-positive phenotypes sensu stricto were found and their fermentation characteristics and Leloir enzyme activities were measured. As the gal promoter seems to play an important role in the galactose phenotype, the galR-galK intergenic region was sequenced for all strains yielding eight different nucleotide sequences (NS1 to NS8). The gal promoter played an important role in the Gal-positive phenotype but did not determine it exclusively. Although GalT and GalE activities were detected for all Gal-positive strains, GalK activity could only be detected for two out of eight Gal-positive strains. This finding suggests that the other six S. thermophilus strains metabolize galactose via an alternative route. For each type of fermentation profile obtained, a representative strain was chosen and four complete Leloir gene clusters were sequenced. It turned out that Gal-positive strains contained more amino acid differences within their gal genes than Gal-negative strains. Finally, the biodiversity regarding lactose-galactose utilization among the different S. thermophilus strains used in this study was shown by RAPD-PCR. Five Gal-positive strains that contain nucleotide sequence NS2 in their galR-galK intergenic region were closely related. PMID:16000774

  19. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.).

    PubMed

    Dubey, Anuja; Farmer, Andrew; Schlueter, Jessica; Cannon, Steven B; Abernathy, Brian; Tuteja, Reetu; Woodward, Jimmy; Shah, Trushar; Mulasmanovic, Benjamin; Kudapa, Himabindu; Raju, Nikku L; Gothalwal, Ragini; Pande, Suresh; Xiao, Yongli; Town, Chris D; Singh, Nagendra K; May, Gregory D; Jackson, Scott; Varshney, Rajeev K

    2011-06-01

    This study reports generation of large-scale genomic resources for pigeonpea, a so-called 'orphan crop species' of the semi-arid tropic regions. FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494 353 short transcript reads (STRs). Cluster analysis of these STRs, together with 10 817 Sanger ESTs, resulted in a pigeonpea transcriptome assembly (CcTA) comprising 127 754 tentative unique sequences (TUSs). Functional analysis of these TUSs highlights several active pathways and processes in the sampled tissues. Comparison of the CcTA with the soybean genome showed similarity to 10 857 and 16 367 soybean gene models (depending on alignment methods). Additionally, Illumina 1G sequencing was performed on Fusarium wilt (FW)- and sterility mosaic disease (SMD)-challenged root tissues of 10 resistant and susceptible genotypes. More than 160 million sequence tags were used to identify FW- and SMD-responsive genes. Sequence analysis of CcTA and the Illumina tags identified a large new set of markers for use in genetics and breeding, including 8137 simple sequence repeats, 12 141 single-nucleotide polymorphisms and 5845 intron-spanning regions. Genomic resources developed in this study should be useful for basic and applied research, not only for pigeonpea improvement but also for other related, agronomically important legumes.

  20. Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck

    PubMed Central

    2014-01-01

    Background Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). Results We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. Conclusions This study aimed at the characterization of the GST gene family in C. sinensis. Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar. PMID:24490620

  1. Atypical fibroxanthoma and pleomorphic dermal sarcoma harbor frequent NOTCH1/2 and FAT1 mutations and similar DNA copy number alteration profiles.

    PubMed

    Griewank, Klaus G; Wiesner, Thomas; Murali, Rajmohan; Pischler, Carina; Müller, Hansgeorg; Koelsche, Christian; Möller, Inga; Franklin, Cindy; Cosgarea, Ioana; Sucker, Antje; Schadendorf, Dirk; Schaller, Jörg; Horn, Susanne; Brenn, Thomas; Mentzel, Thomas

    2018-03-01

    Atypical fibroxanthomas and pleomorphic dermal sarcomas are tumors arising in sun-damaged skin of elderly patients. They have differing prognoses and are currently distinguished using histological criteria, such as invasion of deeper tissue structures, necrosis and lymphovascular or perineural invasion. To investigate the as-yet poorly understood genetics of these tumors, 41 atypical fibroxanthomas and 40 pleomorphic dermal sarcomas were subjected to targeted next-generation sequencing approaches as well as DNA copy number analysis by comparative genomic hybridization. In an analysis of the entire coding region of 341 oncogenes and tumor suppressor genes in 13 atypical fibroxanthomas using an established hybridization-based next-generation sequencing approach, we found that these tumors harbor a large number of mutations. Gene alterations were identified in more than half of the analyzed samples in FAT1, NOTCH1/2, CDKN2A, TP53, and the TERT promoter. The presence of these alterations was verified in 26 atypical fibroxanthoma and 35 pleomorphic dermal sarcoma samples by targeted amplicon-based next-generation sequencing. Similar mutation profiles in FAT1, NOTCH1/2, CDKN2A, TP53, and the TERT promoter were identified in both atypical fibroxanthoma and pleomorphic dermal sarcoma. Activating RAS mutations (G12 and G13) identified in 3 pleomorphic dermal sarcoma were not found in atypical fibroxanthoma. Comprehensive DNA copy number analysis demonstrated a wide array of different copy number gains and losses, with similar profiles in atypical fibroxanthoma and pleomorphic dermal sarcoma. In summary, atypical fibroxanthoma and pleomorphic dermal sarcoma are highly mutated tumors with recurrent mutations in FAT1, NOTCH1/2, CDKN2A, TP53, and the TERT promoter, and a range of DNA copy number alterations. These findings suggest that atypical fibroxanthomas and pleomorphic dermal sarcomas are genetically related, potentially representing two ends of a common tumor spectrum and distinguishing these entities is at present still best performed using histological criteria.

  2. CoryneBase: Corynebacterium Genomic Resources and Analysis Tools at Your Fingertips

    PubMed Central

    Tan, Mui Fern; Jakubovics, Nick S.; Wee, Wei Yee; Mutha, Naresh V. R.; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam; Choo, Siew Woh

    2014-01-01

    Corynebacteria are used for a wide variety of industrial purposes but some species are associated with human diseases. With increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discoveries of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers the access of a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/. PMID:24466021

  3. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.

    PubMed

    Simonyan, Vahan; Mazumder, Raja

    2014-09-30

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.

  4. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis

    PubMed Central

    Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953

  5. Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns.

    PubMed

    Barvkar, Vitthal T; Pardeshi, Varsha C; Kale, Sandip M; Kadoo, Narendra Y; Gupta, Vidya S

    2012-05-08

    The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave the way for the development of functional genomics approaches. Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also consistent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Flax has a large number of UGT genes including a few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions.

  6. Phylogenomic analysis of UDP glycosyltransferase 1 multigene family in Linum usitatissimum identified genes with varied expression patterns

    PubMed Central

    2012-01-01

    Background: The glycosylation process, catalyzed by ubiquitous glycosyltransferase (GT) family enzymes, is a prevalent modification of plant secondary metabolites that regulates various functions such as hormone homeostasis, detoxification of xenobiotics and biosynthesis and storage of secondary metabolites. Flax (Linum usitatissimum L.) is a commercially grown oilseed crop, important because of its essential fatty acids and health promoting lignans. Identification and characterization of UDP glycosyltransferase (UGT) genes from flax could provide valuable basic information about this important gene family and help to explain the seed specific glycosylated metabolite accumulation and other processes in plants. Plant genome sequencing projects are useful to discover complexity within this gene family and also pave the way for the development of functional genomics approaches. Results: Taking advantage of the newly assembled draft genome sequence of flax, we identified 137 UDP glycosyltransferase (UGT) genes from flax using a conserved signature motif. Phylogenetic analysis of these protein sequences clustered them into 14 major groups (A-N). Expression patterns of these genes were investigated using publicly available expressed sequence tag (EST), microarray data and reverse transcription quantitative real time PCR (RT-qPCR). Seventy-three per cent of these genes (100 out of 137) showed expression evidence in 15 tissues examined and indicated varied expression profiles. The RT-qPCR results of 10 selected genes were also consistent with the digital expression analysis. Interestingly, five duplicated UGT genes were identified, which showed differential expression in various tissues. Of the seven intron loss/gain positions detected, two intron positions were conserved among most of the UGTs, although a clear relationship about the evolution of these genes could not be established. Comparison of the flax UGTs with orthologs from four other sequenced dicot genomes indicated that seven UGTs were flax diverged. Conclusions: Flax has a large number of UGT genes including a few flax diverged ones. Phylogenetic analysis and expression profiles of these genes identified tissue and condition specific repertoire of UGT genes from this crop. This study would facilitate precise selection of candidate genes and their further characterization of substrate specificities and in planta functions. PMID:22568875

  7. Absolute quantification of microbial taxon abundances.

    PubMed

    Props, Ruben; Kerckhof, Frederiek-Maarten; Rubbens, Peter; De Vrieze, Jo; Hernandez Sanabria, Emma; Waegeman, Willem; Monsieurs, Pieter; Hammes, Frederik; Boon, Nico

    2017-02-01

    High-throughput amplicon sequencing has become a well-established approach for microbial community profiling. Correlating shifts in the relative abundances of bacterial taxa with environmental gradients is the goal of many microbiome surveys. As the abundances generated by this technology are semi-quantitative by definition, the observed dynamics may not accurately reflect those of the actual taxon densities. We combined the sequencing approach (16S rRNA gene) with robust single-cell enumeration technologies (flow cytometry) to quantify the absolute taxon abundances. A detailed longitudinal analysis of the absolute abundances resulted in distinct abundance profiles that were less ambiguous and expressed in units that can be directly compared across studies. We further provide evidence that the enrichment of taxa (increase in relative abundance) does not necessarily relate to the outgrowth of taxa (increase in absolute abundance). Our results highlight that both relative and absolute abundances should be considered for a comprehensive biological interpretation of microbiome surveys.
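
    The distinction between enrichment and outgrowth drawn in this abstract comes down to simple arithmetic: an absolute abundance is a relative abundance scaled by a total cell count. The sketch below is a minimal illustration, not the authors' pipeline; the function name, taxon names and all numbers are hypothetical and chosen only to show a taxon whose relative abundance rises while its absolute density falls.

```python
def absolute_abundances(relative, total_cells_per_ml):
    """Scale relative abundances (fractions) by a total cell count.

    `relative` maps taxon name -> fraction of sequencing reads;
    `total_cells_per_ml` stands in for a flow-cytometry cell count.
    """
    return {taxon: frac * total_cells_per_ml for taxon, frac in relative.items()}


# Hypothetical values for illustration only (not taken from the study).
rel_day0 = {"taxon_A": 0.20, "taxon_B": 0.80}
rel_day1 = {"taxon_A": 0.40, "taxon_B": 0.60}

abs_day0 = absolute_abundances(rel_day0, 1.0e6)  # community of 1e6 cells/mL
abs_day1 = absolute_abundances(rel_day1, 4.0e5)  # community shrank to 4e5 cells/mL

# Enrichment: relative abundance of taxon_A increased.
enriched = rel_day1["taxon_A"] > rel_day0["taxon_A"]
# Outgrowth: absolute abundance of taxon_A increased.
outgrown = abs_day1["taxon_A"] > abs_day0["taxon_A"]

print(f"enriched={enriched}, outgrown={outgrown}")
```

    In this toy case taxon_A doubles its share of reads while its cell density drops, which is the situation the abstract warns about when only relative abundances are interpreted.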

  8. FIST: a sensory domain for diverse signal transduction pathways in prokaryotes and ubiquitin signaling in eukaryotes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borziak, Kirill; Jouline, Igor B

    2007-01-01

    Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of common signals detected by living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initiated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similarity to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.

  9. [Genetic analysis of the putative remains of general Władysław Sikorski].

    PubMed

    Kupiec, Tomasz; Branicki, Wojciech

    2009-01-01

    The paper presents results of genetic identification studies carried out in material collected during exhumation of the putative body of general Władysław Sikorski, buried in a sarcophagus in Saint Leonard's crypt in the Wawel Cathedral. The analysis of STR-type autosomal markers, Y-STR markers and sequences of HVI and HVII regions of mitochondrial DNA carried out in samples collected for genetic analysis (fragments of the thigh bone and a tooth) yielded a full set of results. The same mtDNA profile was also determined in hair found on the underpants and shirt secured from the studied body. The mitochondrial DNA profile determined in the bone material and also in the hair matched the profile characteristic for a female relative through the maternal line of general Władysław Sikorski. The obtained evidence supports the hypothesis that the studied body is that of general Sikorski. An additional analysis of position SNP rs12913832 located on the HERC2 gene revealed the presence of genotype C/C, which suggests that general Władysław Sikorski had light (most probably blue) eyes.

  10. Current genetic methodologies in the identification of disaster victims and in forensic analysis.

    PubMed

    Ziętkiewicz, Ewa; Witt, Magdalena; Daca, Patrycja; Zebracka-Gala, Jadwiga; Goniewicz, Mariusz; Jarząb, Barbara; Witt, Michał

    2012-02-01

    This review presents the basic problems and currently available molecular techniques used for genetic profiling in disaster victim identification (DVI). The environmental conditions of a mass disaster often result in severe fragmentation, decomposition and intermixing of the remains of victims. In such cases, traditional identification based on the anthropological and physical characteristics of the victims is frequently inconclusive. This is the reason why DNA profiling became the gold standard for victim identification in mass-casualty incidents (MCIs) or any forensic cases where human remains are highly fragmented and/or degraded beyond recognition. The review provides general information about the sources of genetic material for DNA profiling, the genetic markers routinely used during genetic profiling (STR markers, mtDNA and single-nucleotide polymorphisms [SNP]) and the basic statistical approaches used in DNA-based disaster victim identification. Automated technological platforms that allow the simultaneous analysis of a multitude of genetic markers used in genetic identification (oligonucleotide microarray techniques and next-generation sequencing) are also presented. Forensic and population databases containing information on human variability, routinely used for statistical analyses, are discussed. The final part of this review is focused on recent developments, which offer particularly promising tools for forensic applications (mRNA analysis, transcriptome variation in individuals/populations and genetic profiling of specific cells separated from mixtures).

  11. Slice profile effects in 2D slice-selective MRI of hyperpolarized nuclei.

    PubMed

    Deppe, Martin H; Teh, Kevin; Parra-Robles, Juan; Lee, Kuan J; Wild, Jim M

    2010-02-01

    This work explores slice profile effects in 2D slice-selective gradient-echo MRI of hyperpolarized nuclei. Two different sequences were investigated: a Spoiled Gradient Echo sequence with variable flip angle (SPGR-VFA) and a balanced Steady-State Free Precession (SSFP) sequence. It is shown that in SPGR-VFA the distribution of flip angles across the slice present in any realistically shaped radiofrequency (RF) pulse leads to large excess signal from the slice edges in later RF views, which results in an undesired non-constant total transverse magnetization, potentially exceeding the initial value by almost 300% for the last RF pulse. A method to reduce this unwanted effect is demonstrated, based on dynamic scaling of the slice selection gradient. SSFP sequences with small to moderate flip angles (<40 degrees) are also shown to preserve the slice profile better than the most commonly used SPGR sequence with constant flip angle (SPGR-CFA). For higher flip angles, the slice profile in SSFP evolves in a manner similar to SPGR-CFA, with depletion of polarization in the center of the slice. Copyright 2009 Elsevier Inc. All rights reserved.

  12. Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells

    PubMed Central

    Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L.; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M.

    2017-01-01

    Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878 and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation difference between cells is detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. PMID:28126923

  13. Asymmetric histone modifications between the original and derived loci of human segmental duplications

    PubMed Central

    Zheng, Deyou

    2008-01-01

    Background: Sequencing and annotation of several mammalian genomes have revealed that segmental duplications are a common architectural feature of primate genomes; in fact, about 5% of the human genome is composed of large blocks of interspersed segmental duplications. These segmental duplications have been implicated in genomic copy-number variation, gene novelty, and various genomic disorders. However, the molecular processes involved in the evolution and regulation of duplicated sequences remain largely unexplored. Results: In this study, the profile of about 20 histone modifications within human segmental duplications was characterized using high-resolution, genome-wide data derived from a ChIP-Seq study. The analysis demonstrates that derivative loci of segmental duplications often differ significantly from the original with respect to many histone methylations. Further investigation showed that genes are present three times more frequently in the original than in the derivative, whereas pseudogenes exhibit the opposite trend. These asymmetries tend to increase with the age of segmental duplications. The uneven distribution of genes and pseudogenes does not, however, fully account for the asymmetry in the profile of histone modifications. Conclusion: The first systematic analysis of histone modifications between segmental duplications demonstrates that two seemingly 'identical' genomic copies are distinct in their epigenomic properties. Results here suggest that local chromatin environments may be implicated in the discrimination of derived copies of segmental duplications from their originals, leading to a biased pseudogenization of the new duplicates. The data also indicate that further exploration of the interactions between histone modification and sequence degeneration is necessary in order to understand the divergence of duplicated sequences. PMID:18598352

  14. Unravelling the Molecular Epidemiology and Genetic Diversity among Burkholderia pseudomallei Isolates from South India Using Multi-Locus Sequence Typing.

    PubMed

    Tellapragada, Chaitanya; Kamthan, Aayushi; Shaw, Tushar; Ke, Vandana; Kumar, Subodh; Bhat, Vinod; Mukhopadhyay, Chiranjay

    2016-01-01

    There has been a slow but steady rise in the case detection rates of melioidosis from various parts of the Indian sub-continent in the past two decades. However, the epidemiology of the disease in India and the surrounding South Asian countries remains far from well elucidated. Multi-locus sequence typing (MLST) is a useful epidemiological tool to study the genetic relatedness of bacterial isolates both within and across countries. With this background, we studied the molecular epidemiology of 32 Burkholderia pseudomallei isolates (31 clinical and 1 soil isolate) obtained during 2006-2015 from various parts of south India using multi-locus sequence typing and analysis. Of the 32 isolates included in the analysis, 30 (93.7%) had novel allelic profiles that were not reported previously. Sequence type (ST) 1368 (n = 15, 46.8%) with allelic profile (1, 4, 6, 4, 1, 1, 3) was the most common genotype observed. We did not observe a genotypic association of STs with geographical location, type of infection and year of isolation in the present study. The measure of genetic differentiation (FST) between Indian and the rest of world isolates was 0.14413. Occurrence of the same ST across three adjacent states of south India suggests the dispersion of B. pseudomallei across the south western coastal part of India with limited geographical clustering. However, the majority of the STs reported from the present study remained as "outliers" on the eBURST "Population snapshot", suggesting the genetic diversity of Indian isolates from the Australasian and Southeast Asian isolates.
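
    In MLST, each isolate is reduced to a tuple of allele numbers (seven in the profile quoted above), and a sequence type is assigned by looking that tuple up in a table of known profiles. The sketch below illustrates the lookup under stated assumptions: only the ST 1368 profile comes from the abstract, the isolate list is hypothetical, and the names `KNOWN_STS` and `assign_st` are illustrative rather than part of any MLST database interface.

```python
from collections import Counter

# Table of known allelic profiles -> sequence type.
# Only ST 1368 is taken from the abstract; a real table would be
# downloaded from an MLST database rather than typed by hand.
KNOWN_STS = {
    (1, 4, 6, 4, 1, 1, 3): 1368,
}


def assign_st(profile, known=KNOWN_STS):
    """Return the ST for an allelic profile, or None if the profile is not in the table."""
    return known.get(tuple(profile))


# Hypothetical isolates: two share the ST 1368 profile, one differs at the last locus.
isolates = [
    (1, 4, 6, 4, 1, 1, 3),
    (1, 4, 6, 4, 1, 1, 3),
    (1, 4, 6, 4, 1, 1, 4),
]

st_counts = Counter(assign_st(p) for p in isolates)
print(st_counts)  # the None key counts profiles absent from the table
```

    A profile that returns None here corresponds to what the abstract calls a novel allelic profile, which in practice would be submitted to the database for a new ST number.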

  15. Isolation and N-terminal sequencing of a novel cadmium-binding protein from Boletus edulis

    NASA Astrophysics Data System (ADS)

    Collin-Hansen, C.; Andersen, R. A.; Steinnes, E.

    2003-05-01

    A Cd-binding protein was isolated from the popular edible mushroom Boletus edulis, which is a hyperaccumulator of both Cd and Hg. Wild-growing samples of B. edulis were collected from soils rich in Cd. Cd radiotracer was added to the crude protein preparation obtained from ethanol precipitation of heat-treated cytosol. Proteins were then further separated in two consecutive steps: gel filtration and anion exchange chromatography. In both steps the Cd radiotracer profile showed only one distinct peak, which corresponded well with the profiles of endogenous Cd obtained by atomic absorption spectrophotometry (AAS). Concentrations of the essential elements Cu and Zn were low in the protein fractions high in Cd. N-terminal sequencing performed on the Cd-binding protein fractions revealed a protein with a novel amino acid sequence, which contained aromatic amino acids as well as proline. Both the N-terminal sequencing and spectrofluorimetric analysis with EDTA and ABD-F (4-aminosulfonyl-7-fluoro-2,1,3-benzoxadiazole) failed to detect cysteine in the Cd-binding fractions. These findings indicate that the novel protein does not belong to the metallothionein family. The results suggest a role for the protein in Cd transport and storage, and they are of importance in view of toxicology and food chemistry, but also for environmental protection.

  16. Profiling the resting venom gland of the scorpion Tityus stigmurus through a transcriptomic survey.

    PubMed

    Almeida, Diego D; Scortecci, Katia C; Kobashi, Leonardo S; Agnez-Lima, Lucymara F; Medeiros, Silvia R B; Silva-Junior, Arnóbio A; Junqueira-de-Azevedo, Inácio de L M; Fernandes-Pedrosa, Matheus de F

    2012-08-01

    The scorpion Tityus stigmurus is widely distributed in Northeastern Brazil and known to cause severe human envenoming, inducing pain, hypoesthesia, edema, erythema, paresthesia, headaches and vomiting. The present study uses a transcriptomic approach to characterize the gene expression profile from the non-stimulated venom gland of the Tityus stigmurus scorpion. A cDNA library was constructed and 540 clones were sequenced and grouped into 153 clusters, with one or more ESTs (expressed sequence tags). Forty-one percent of ESTs belong to recognized toxin-coding sequences, with transcripts encoding antimicrobial toxins (AMP-like) being the most abundant, followed by alpha KTx-like, beta KTx-like, beta NaTx-like and alpha NaTx-like. Our analysis indicated that 34% of the transcripts encode "other possible venom molecules", which correspond to anionic peptides, hypothetical secreted peptides, metalloproteinases, cysteine-rich peptides and lectins. Fifteen percent of ESTs are similar to cellular transcripts. Sequences without good matches corresponded to 11%. This investigation provides the first global view of gene expression of the venom gland from Tityus stigmurus under resting conditions. This approach enables characterization of a large number of venom gland component molecules, which belong either to known or not yet described types of venom peptides and proteins from the Buthidae family.

  17. The molecular genetic makeup of acute lymphoblastic leukemia.

    PubMed

    Mullighan, Charles G

    2012-01-01

    Genomic profiling has transformed our understanding of the genetic basis of acute lymphoblastic leukemia (ALL). Recent years have seen a shift from microarray analysis and candidate gene sequencing to next-generation sequencing. Together, these approaches have shown that many ALL subtypes are characterized by constellations of structural rearrangements, submicroscopic DNA copy number alterations, and sequence mutations, several of which have clear implications for risk stratification and targeted therapeutic intervention. Mutations in genes regulating lymphoid development are a hallmark of ALL, and alterations of the lymphoid transcription factor gene IKZF1 (IKAROS) are associated with a high risk of treatment failure in B-ALL. Approximately 20% of B-ALL cases harbor genetic alterations that activate kinase signaling that may be amenable to treatment with tyrosine kinase inhibitors, including rearrangements of the cytokine receptor gene CRLF2; rearrangements of ABL1, JAK2, and PDGFRB; and mutations of JAK1 and JAK2. Whole-genome sequencing has also identified novel targets of mutation in aggressive T-lineage ALL, including hematopoietic regulators (ETV6 and RUNX1), tyrosine kinases, and epigenetic regulators. Challenges for the future are to comprehensively identify and experimentally validate all genetic alterations driving leukemogenesis and treatment failure in childhood and adult ALL and to implement genomic profiling into the clinical setting to guide risk stratification and targeted therapy.

  18. Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification.

    PubMed

    Ziesemer, Kirsten A; Mann, Allison E; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T; Brandt, Bernd W; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A; MacDonald, Sandy J; Thomas, Gavin H; Collins, Matthew J; Lewis, Cecil M; Hofman, Corinne; Warinner, Christina

    2015-11-13

    To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341-534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions.

  19. Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification

    PubMed Central

    Ziesemer, Kirsten A.; Mann, Allison E.; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T.; Brandt, Bernd W.; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C.; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A.; MacDonald, Sandy J.; Thomas, Gavin H.; Collins, Matthew J.; Lewis, Cecil M.; Hofman, Corinne; Warinner, Christina

    2015-01-01

    To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341–534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions. PMID:26563586

  20. Transcriptome and Small RNA Deep Sequencing Reveals Deregulation of miRNA Biogenesis in Human Glioma

    PubMed Central

    Moore, Lynette M.; Kivinen, Virpi; Liu, Yuexin; Annala, Matti; Cogdell, David; Liu, Xiuping; Liu, Chang-Gong; Sawaya, Raymond; Yli-Harja, Olli; Shmulevich, Ilya; Fuller, Gregory N.; Zhang, Wei; Nykter, Matti

    2013-01-01

    Altered expression of oncogenic and tumor-suppressing microRNAs (miRNAs) is widely associated with tumorigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumors. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and interrogated expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression. PMID:23007860

  1. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing.

    PubMed

    Zackay, Arie; Steinhoff, Christine

    2010-12-15

    Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is a growing need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. MethVisual is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org.

  2. MethVisual - visualization and exploratory statistical analysis of DNA methylation profiles from bisulfite sequencing

    PubMed Central

    2010-01-01

    Background: Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is a growing need for tools to process and analyse the data together with statistical investigation and visualisation. Findings: MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can also be used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets, is presented in this paper. Conclusions: The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. MethVisual is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data, available in the R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174

  3. Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

    PubMed

    Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

    2016-01-01

    WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceous plants.

  4. Uncovering microRNA-mediated response to SO2 stress in Arabidopsis thaliana by deep sequencing.

    PubMed

    Li, Lihong; Xue, Meizhao; Yi, Huilan

    2016-10-05

    Sulfur dioxide (SO2) is a major air pollutant and has significant impacts on plants. MicroRNAs (miRNAs) are a class of gene expression regulators that play important roles in response to environmental stresses. In this study, deep sequencing was used for genome-wide identification of miRNAs and their expression profiles in response to SO2 stress in Arabidopsis thaliana shoots. A total of 27 conserved miRNAs and 5 novel miRNAs were found to be differentially expressed under SO2 stress. qRT-PCR analysis showed mostly negative correlation between miRNA accumulation and target gene mRNA abundance, suggesting regulatory roles of these miRNAs during SO2 exposure. The target genes of SO2-responsive miRNAs encode transcription factors and proteins that regulate auxin signaling and stress response, and the miRNA-mediated suppression of these genes could improve plant resistance to SO2 stress. Promoter sequence analysis of genes encoding SO2-responsive miRNAs showed that stress-responsive and phytohormone-related cis-regulatory elements occurred frequently, providing additional evidence of the involvement of miRNAs in adaptation to SO2 stress. This study represents a comprehensive expression profiling of SO2-responsive miRNAs in Arabidopsis and broadens our perspective on the ubiquitous regulatory roles of miRNAs under stress conditions. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Mycobacterium shottsii sp. nov., a slowly growing species isolated from Chesapeake Bay striped bass (Morone saxatilis)

    USGS Publications Warehouse

    Rhodes, M.W.; Kator, H.; Kotob, S.; van Berkum, P.; Kaattari, I.; Vogelbein, W.; Quinn, F.; Floyd, M.M.; Butler, W.R.; Ottinger, C.A.

    2003-01-01

    Slowly growing, non-pigmented mycobacteria were isolated from striped bass (Morone saxatilis) during an epizootic of mycobacteriosis in the Chesapeake Bay. Growth characteristics, acid-fastness and results of 16S rRNA gene sequencing were consistent with those of the genus Mycobacterium. A unique profile of biochemical reactions was observed among the 21 isolates. A single cluster of eight peaks identified by analysis of mycolic acids (HPLC) resembled those of reference patterns but differed in peak elution times from profiles of reference species of the Mycobacterium tuberculosis complex. One isolate (M175T) was placed within the slowly growing mycobacteria by analysis of aligned 16S rRNA gene sequences and was proximate in phylogeny to Mycobacterium ulcerans and Mycobacterium marinum. However, distinct nucleotide differences were detected in the 16S rRNA gene sequence among M175T, M. ulcerans and M. marinum (99.2% similarity). Isolate M175T could be differentiated from other slowly growing, non-pigmented mycobacteria by its inability to grow at 37 °C, production of niacin and urease, absence of nitrate reductase and resistance to isoniazid (1 μg ml-1), thiacetazone and thiophene-2-carboxylic hydrazide. Based upon these genetic and phenotypic differences, isolate M175T (= ATCC 700981T = NCTC 13215T) is proposed as the type strain of a novel species, Mycobacterium shottsii sp. nov.

  6. Qualitative analysis of the vaginal microbiota of healthy cattle and cattle with genital-tract disease.

    PubMed

    Rodrigues, N F; Kästle, J; Coutinho, T J D; Amorim, A T; Campos, G B; Santos, V M; Marques, L M; Timenetsky, J; de Farias, S T

    2015-06-12

    The microbial community of the reproductive apparatus, when known, can provide information about the health of the host. Metagenomics has been used to characterize and obtain genetic information about microbial communities in various environments and can relate certain diseases with changes in this community composition. In this study, samples of vaginal surface mucosal secretions were collected from five healthy cows and five cows that showed symptoms of reproductive disorders. Following high-throughput sequencing of the isolated microbial DNA, data were processed using the Mothur software to remove low-quality sequences and chimeras, and released to the Ribosomal Database Project for classification of operational taxonomic units (OTUs). Local BLASTn was performed and results were loaded into the MEGAN program for viewing profiles and taxonomic microbial attributes. The control profile comprised a total of 15 taxa, with Bacteroides, Enterobacteriaceae, and Victivallis comprising the highest representation of OTUs; the reproductive disorder-positive profile comprised 68 taxa, with Bacteroides, Enterobacteriaceae, Histophilus, Victivallis, Alistipes, and Coriobacteriaceae being the taxa with the most OTU representation. A change was observed in both the community composition as well as in the microbial attributes of the profiles, suggesting that a relationship might exist between the pathogen and representative taxa, reflecting the production of metabolites to disease progression.

  7. Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development

    PubMed Central

    2013-01-01

    Background: Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. Results: In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembles gene-related information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. 32 genes among three sub-libraries with high differential expression were selected as candidates connected with flower development. Conclusion: RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. This data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid. PMID:23617896

  8. Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development.

    PubMed

    Zhang, Jianxia; Wu, Kunlin; Zeng, Songjun; Teixeira da Silva, Jaime A; Zhao, Xiaolan; Tian, Chang-En; Xia, Haoqiang; Duan, Jun

    2013-04-24

    Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembles gene-related information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. 32 genes among three sub-libraries with high differential expression were selected as candidates connected with flower development. RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. This data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid.

  9. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.

  10. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

    PubMed

    Nakato, Ryuichiro; Itoh, Takehiko; Shirahige, Katsuhiko

    2013-07-01

    Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  11. Mutation detection in the human HSP70B′ gene by denaturing high-performance liquid chromatography

    PubMed Central

    Hecker, Karl H.; Asea, Alexzander; Kobayashi, Kaoru; Green, Stacy; Tang, Dan; Calderwood, Stuart K.

    2000-01-01

    Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B′ gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER™ software. Four overlapping amplicons, which span the complete coding region of the HSP70B′ gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B′ gene on the WAVE® Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed. PMID:11189446

  12. Mutation detection in the human HSP70B' gene by denaturing high-performance liquid chromatography.

    PubMed

    Hecker, K H; Asea, A; Kobayashi, K; Green, S; Tang, D; Calderwood, S K

    2000-11-01

    Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B' gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER software. Four overlapping amplicons, which span the complete coding region of the HSP70B' gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B' gene on the WAVE Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed.

  13. Flexible, fast and accurate sequence alignment profiling on GPGPU with PaSWAS.

    PubMed

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J L; Nap, Jan Peter

    2015-01-01

    To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation.
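    The kind of per-alignment detail this abstract mentions (score, number of gaps and mismatches) can be illustrated with a minimal CPU-only sketch of Smith-Waterman local alignment. This is not PaSWAS code and says nothing about its GPU implementation; the function names, the linear gap penalty and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b under a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions, not parameters taken from PaSWAS.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/col stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def gaps_and_mismatches(aligned_a, aligned_b):
    """Count gap columns and mismatch columns in a pairwise alignment."""
    pairs = list(zip(aligned_a, aligned_b))
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    mismatches = sum(1 for x, y in pairs if x != y and "-" not in (x, y))
    return gaps, mismatches


score, top, bottom = smith_waterman("ACGTT", "ACCTT")
print(score, top, bottom, gaps_and_mismatches(top, bottom))
```

    The sketch reports only the single best local alignment; reporting multiple hits per alignment, as the abstract describes, would need additional bookkeeping that is not shown here.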

  14. [The future of forensic DNA analysis for criminal justice].

    PubMed

    Laurent, François-Xavier; Vibrac, Geoffrey; Rubio, Aurélien; Thévenot, Marie-Thérèse; Pène, Laurent

    2017-11-01

    In the criminal framework, the analysis of approximately 20 DNA microsatellites enables the establishment of a genetic profile with a high statistical power of discrimination. This technique gives us the possibility to establish or exclude a match between a biological trace detected at a crime scene and a suspect whose DNA was collected via an oral swab. However, conventional techniques tend to complicate the interpretation of complex DNA samples, such as degraded or mixed DNA. The aim of this review is to highlight the power of new forensic DNA methods (including high-throughput sequencing or single-cell sequencing) to facilitate expert interpretation in full compliance with existing French legislation. © 2017 médecine/sciences – Inserm.

  15. Benthic bacterial diversity in submerged sinkhole ecosystems.

    PubMed

    Nold, Stephen C; Pangborn, Joseph B; Zajack, Heidi A; Kendall, Scott T; Rediske, Richard R; Biddanda, Bopaiah A

    2010-01-01

    Physicochemical characterization, automated ribosomal intergenic spacer analysis (ARISA) community profiling, and 16S rRNA gene sequencing approaches were used to study bacterial communities inhabiting submerged Lake Huron sinkholes inundated with hypoxic, sulfate-rich groundwater. Photosynthetic cyanobacterial mats on the sediment surface were dominated by Phormidium autumnale, while deeper, organically rich sediments contained diverse and active bacterial communities.

  16. funRNA: a fungi-centered genomics platform for genes encoding key components of RNAi.

    PubMed

    Choi, Jaeyoung; Kim, Ki-Tae; Jeon, Jongbum; Wu, Jiayao; Song, Hyeunjeong; Asiegbu, Fred O; Lee, Yong-Hwan

    2014-01-01

    RNA interference (RNAi) is involved in genome defense as well as diverse cellular, developmental, and physiological processes. Key components of RNAi are Argonaute, Dicer, and RNA-dependent RNA polymerase (RdRP), which have been functionally characterized mainly in model organisms. The key components are believed to exist throughout eukaryotes; however, there is no systematic platform for archiving and dissecting these important gene families. In addition, few fungi have been studied to date, limiting our understanding of RNAi in fungi. Here we present funRNA http://funrna.riceblast.snu.ac.kr/, a fungal kingdom-wide comparative genomics platform for putative genes encoding Argonaute, Dicer, and RdRP. To identify and archive genes encoding the abovementioned key components, protein domain profiles were determined from reference sequences obtained from UniProtKB/SwissProt. The domain profiles were searched using fungal, metazoan, and plant genomes, as well as bacterial and archaeal genomes. 1,163, 442, and 678 genes encoding Argonaute, Dicer, and RdRP, respectively, were predicted. Based on the identification results, active site variation of Argonaute, diversification of Dicer, and sequence analysis of RdRP were discussed in a fungus-oriented manner. funRNA provides results from diverse bioinformatics programs and job submission forms for BLAST, BLASTMatrix, and ClustalW. Furthermore, sequence collections created in funRNA are synced with several gene family analysis portals and databases, offering further analysis opportunities. funRNA provides identification results from a broad taxonomic range and diverse analysis functions, and could be used in diverse comparative and evolutionary studies. It could serve as a versatile genomics workbench for key components of RNAi.

  17. De novo assembly and characterization of Muscovy duck liver transcriptome and analysis of differentially regulated genes in response to heat stress.

    PubMed

    Zeng, Tao; Zhang, Liping; Li, Jinjun; Wang, Deqian; Tian, Yong; Lu, Lizhi

    2015-05-01

    High temperature is a major abiotic stress limiting animal growth and productivity worldwide. The Muscovy duck (Cairina moschata), sometimes called the Barbary drake, is a type of duck with a fairly unusual domestication history. In Southeast Asia, duck meat is one of the top meats consumed, and as such, the production of the meat is an important topic of research. The transcriptomic and genomic data presently available are insufficient for understanding the molecular mechanism underlying the heat tolerance of Muscovy ducks. Thus, transcriptome and expression profiling data for this species are required as an important resource for identifying genes and developing molecular markers. In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. More than 225 million clean reads were generated and assembled into 36,903 unique transcripts with an average length of 1,135 bp. A total of 21,221 (57.50%) unigenes were annotated. Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with transcription, signal transduction, and apoptosis. We also performed gene expression profiling analysis upon heat treatment in Muscovy ducks and identified 470 heat-response unique transcripts. GO term enrichment showed that protein folding and chaperone binding were significantly enriched, whereas KEGG pathway analyses showed that Ras and MAPKs were activated after heat stress in Muscovy ducks. Our research enriches the sequence information available for the Muscovy duck, provides novel insights into responses to heat stress in these ducks, and provides candidate genes or markers that can be used to guide future efforts to breed heat-tolerant duck strains.

  18. Multi-Omics Analysis Reveals a Correlation between the Host Phylogeny, Gut Microbiota and Metabolite Profiles in Cyprinid Fishes

    PubMed Central

    Li, Tongtong; Long, Meng; Li, Huan; Gatesoupe, François-Joël; Zhang, Xujie; Zhang, Qianqian; Feng, Dongyue; Li, Aihua

    2017-01-01

    Gut microbiota play key roles in host nutrition and metabolism. However, little is known about the relationship between host genetics, gut microbiota and metabolic profiles. Here, we used high-throughput sequencing and gas chromatography/mass spectrometry approaches to characterize the microbiota composition and the metabolite profiles in the gut of five cyprinid fish species with three different feeding habits raised under identical husbandry conditions. Our results showed that host species and feeding habits significantly affect not only gut microbiota composition but also metabolite profiles (ANOSIM, p ≤ 0.05). Mantel test demonstrated that host phylogeny, gut microbiota, and metabolite profiles were significantly related to each other (p ≤ 0.05). Additionally, the carps with the same feeding habits had more similarity in gut microbiota composition and metabolite profiles. Various metabolites were correlated positively with bacterial taxa involved in food degradation. Our results shed new light on the microbiome and metabolite profiles in the gut content of cyprinid fishes, and highlighted the correlations between host genotype, fish gut microbiome and putative functions, and gut metabolite profiles. PMID:28367147

  19. RNA-Seq-based transcriptome analysis of dormant flower buds of Chinese cherry (Prunus pseudocerasus).

    PubMed

    Zhu, Youyin; Li, Yongqiang; Xin, Dedong; Chen, Wenrong; Shao, Xu; Wang, Yue; Guo, Weidong

    2015-01-25

    Bud dormancy is a critical biological process allowing Chinese cherry (Prunus pseudocerasus) to survive in winter. Due to the lack of genomic information, molecular mechanisms triggering endodormancy release in flower buds have remained unclear. Hence, we used Illumina RNA-Seq technology to carry out de novo transcriptome assembly and digital gene expression profiling of flower buds. Approximately 47 million clean reads were assembled into 50,604 sequences with an average length of 837 bp. A total of 37,650 unigene sequences were successfully annotated. 128 pathways were annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and metabolic pathways, biosynthesis of secondary metabolites and plant hormone signal transduction accounted for higher percentages in flower buds. In the critical period of endodormancy release, 1,644 significantly differentially expressed genes (DEGs) were identified from the expression profile. DEGs related to oxidoreductase activity were especially abundant in the Gene Ontology (GO) molecular function category. KEGG pathway analysis demonstrated that DEGs were involved in various metabolic processes, including phytohormone metabolism. Quantitative real-time PCR (qRT-PCR) analysis indicated that levels of DEGs for abscisic acid and gibberellin biosynthesis decreased while the abundance of DEGs encoding their degradation enzymes increased and GID1 was down-regulated. Concomitant with endodormancy release, MADS-box transcription factors including P. pseudocerasus dormancy-associated MADS-box (PpcDAM), Agamous-like2, and APETALA3-like genes showed remarkable epigenetic roles. The newly generated transcriptome and gene expression profiling data provide valuable genetic information for revealing transcriptomic variation during bud dormancy in Chinese cherry. The uncovered data should be useful for future studies of bud dormancy in Prunus fruit trees lacking genomic information. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    PubMed

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.
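
    The discrimination capacity metric mentioned above can be illustrated with a small sketch. The abstract does not give the formula; the version below assumes the definition commonly used in forensic mtDNA work (number of distinct profiles divided by number of samples). The function name and the toy profiles are hypothetical and are not data from the study.

```python
def discrimination_capacity(profiles):
    """Distinct profiles divided by total samples (assumed definition)."""
    if not profiles:
        raise ValueError("need at least one sample")
    return len(set(profiles)) / len(profiles)


# Hypothetical toy profiles for four samples typed by two assays.
# A coarser assay merges profiles that a finer assay can tell apart.
fine_assay = ["h1", "h2", "h3", "h4"]
coarse_assay = ["h1", "h2", "h2", "h3"]

dc_fine = discrimination_capacity(fine_assay)      # 4 distinct / 4 samples
dc_coarse = discrimination_capacity(coarse_assay)  # 3 distinct / 4 samples
```

    Under this assumed definition, an assay covering more nucleotides can only split profiles further, which is consistent with the ranking reported in the abstract.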

  1. Mass Spectrometry Based Ultrasensitive DNA Methylation Profiling Using Target Fragmentation Assay.

    PubMed

    Lin, Xiang-Cheng; Zhang, Ting; Liu, Lan; Tang, Hao; Yu, Ru-Qin; Jiang, Jian-Hui

    2016-01-19

    Efficient tools for profiling DNA methylation in specific genes are essential for epigenetics and clinical diagnostics. Current DNA methylation profiling techniques have been limited by inconvenient implementation, requirements of specific reagents, and inferior accuracy in quantifying methylation degree. We develop a novel mass spectrometry method, target fragmentation assay (TFA), which enables profiling of methylation in specific sequences. This method combines selective capture of DNA target from restricted cleavage of genomic DNA using magnetic separation with MS detection of the nonenzymatic hydrolysates of target DNA. This method is shown to be highly sensitive with a detection limit as low as 0.056 amol, allowing direct profiling of methylation using genomic DNA without preamplification. Moreover, this method offers a unique advantage in accurately determining DNA methylation level. The clinical applicability was demonstrated by DNA methylation analysis using prostate tissue samples, implying the potential of this method as a useful tool for DNA methylation profiling in early detection of related diseases.

  2. Analysis of differential secondary effects of novel rexinoids: select rexinoid X receptor ligands demonstrate differentiated side effect profiles

    PubMed Central

    Marshall, Pamela A; Jurutka, Peter W; Wagner, Carl E; van der Vaart, Arjan; Kaneko, Ichiro; Chavez, Pedro I; Ma, Ning; Bhogal, Jaskaran S; Shahani, Pritika; Swierski, Johnathon C; MacNeill, Mairi

    2015-01-01

    To determine the feasibility of utilizing novel rexinoids for chemotherapeutics and as potential treatments for neurological conditions, we undertook an assessment of the side effect profile of select rexinoid X receptor (RXR) analogs that we reported previously. We assessed pharmacokinetic profiles, lipid and thyroid-stimulating hormone (TSH) levels in rats, and cell culture activity of rexinoids in sterol regulatory element-binding protein (SREBP) induction and thyroid hormone inhibition assays. We also performed RNA sequencing of the brain tissues of rats that had been dosed with the compounds. We show here for the first time that potent rexinoid activity can be uncoupled from drastic lipid changes and thyroid axis variations, and we propose that rexinoids can be developed with better side effect profiles than the parent compound, bexarotene (1). PMID:26038698

  3. How Next-Generation Sequencing and Multiscale Data Analysis Will Transform Infectious Disease Management

    PubMed Central

    Pak, Theodore R.; Kasarskis, Andrew

    2015-01-01

    Recent reviews have examined the extent to which routine next-generation sequencing (NGS) on clinical specimens will improve the capabilities of clinical microbiology laboratories in the short term, but do not explore integrating NGS with clinical data from electronic medical records (EMRs), immune profiling data, and other rich datasets to create multiscale predictive models. This review introduces a range of “omics” and patient data sources relevant to managing infections and proposes 3 potentially disruptive applications for these data in the clinical workflow. The combined threats of healthcare-associated infections and multidrug-resistant organisms may be addressed by multiscale analysis of NGS and EMR data that is ideally updated and refined over time within each healthcare organization. Such data and analysis should form the cornerstone of future learning health systems for infectious disease. PMID:26251049

  4. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    PubMed Central

    Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.

    2010-01-01

    Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635
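
    The concordance on binary methylation calls reported above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 0.5 threshold used to binarize methylation levels, the function names, and the toy regions are all assumptions made for the example.

```python
def binary_calls(levels, threshold=0.5):
    """Turn per-region methylation levels (0..1) into methylated/unmethylated calls.

    The threshold is an assumption for illustration only.
    """
    return {region: level >= threshold for region, level in levels.items()}


def concordance(calls_a, calls_b):
    """Fraction of regions assessed by both methods that receive the same call."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two methods share no regions")
    agree = sum(calls_a[r] == calls_b[r] for r in shared)
    return agree / len(shared)


# Hypothetical toy levels from two methods; region r5 is covered by one method only.
method_a = {"r1": 0.9, "r2": 0.1, "r3": 0.7, "r4": 0.2}
method_b = {"r1": 0.8, "r2": 0.3, "r3": 0.4, "r4": 0.1, "r5": 0.9}

toy_concordance = concordance(binary_calls(method_a), binary_calls(method_b))
```

    Only regions covered by both methods enter the denominator, which mirrors the abstract's phrasing "regions assessed by all four methods".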

  5. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  6. Genotypic diversity of oscillatoriacean strains belonging to the genera Geitlerinema and Spirulina determined by 16S rDNA restriction analysis.

    PubMed

    Margheri, Maria C; Piccardi, Raffaella; Ventura, Stefano; Viti, Carlo; Giovannetti, Luciana

    2003-05-01

    Genotypic diversity of several cyanobacterial strains mostly isolated from marine or brackish waters, belonging to the genera Geitlerinema and Spirulina, was investigated by amplified 16S ribosomal DNA restriction analysis and compared with morphological features and response to salinity. Cluster analysis was performed on amplified 16S rDNA restriction profiles of these strains along with profiles obtained from sequence data of five Spirulina-like strains, including three representatives of the new genus Halospirulina. Our strains with tightly coiled trichomes from hypersaline waters could be assigned to the Halospirulina genus. Among the uncoiled strains, the two strains of hypersaline origin clustered together and were found to be distant from their counterparts of marine and freshwater habitat. Moreover, another cluster, formed by alkali-tolerant strains with tightly coiled trichomes, was well delineated.

  7. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    PubMed

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

  8. Transcriptome analysis of Capsicum annuum varieties Mandarin and Blackcluster: assembly, annotation and molecular marker discovery.

    PubMed

    Ahn, Yul-Kyun; Tripathi, Swati; Kim, Jeong-Ho; Cho, Young-Il; Lee, Hye-Eun; Kim, Do-Sun; Woo, Jong-Gyu; Cho, Myeong-Cheoul

    2014-01-10

    Next generation sequencing technologies have proven to be a rapid and cost-effective means to assemble and characterize gene content and identify molecular markers in various organisms. Pepper (Capsicum annuum L., Solanaceae) is a major staple vegetable crop, which is economically important and has worldwide distribution. High-throughput transcriptome profiling of two pepper cultivars, Mandarin and Blackcluster, using 454 GS-FLX pyrosequencing yielded 279,221 and 316,357 sequenced reads with a total of 120.44 and 142.54 Mb of sequence data (average read length of 431 and 450 nucleotides). These reads resulted from 17,525 and 16,341 'isogroups' and were assembled into 19,388 and 18,057 isotigs, and 22,217 and 13,153 singletons for the two cultivars, respectively. Assembled sequences were annotated functionally based on homology to genes in multiple public databases. Detailed sequence variant analysis identified a total of 9701 and 12,741 potential SNPs, which eventually resulted in 1025 and 1059 genotype-specific SNPs for the two varieties, respectively, after examining the SNP frequency distribution for each mapped unigene. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies. © 2013 Elsevier B.V. All rights reserved.

  9. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    PubMed

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Bacillus nealsonii sp. nov., isolated from a spacecraft-assembly facility, whose spores are gamma-radiation resistant

    NASA Technical Reports Server (NTRS)

    Venkateswaran, Kasthuri; Kempf, Michael; Chen, Fei; Satomi, Masataka; Nicholson, Wayne; Kern, Roger

    2003-01-01

    One of the spore-formers isolated from a spacecraft-assembly facility, belonging to the genus Bacillus, is described on the basis of phenotypic characterization, 16S rDNA sequence analysis and DNA-DNA hybridization studies. It is a Gram-positive, facultatively anaerobic, rod-shaped eubacterium that produces endospores. The spores of this novel bacterial species exhibited resistance to UV, gamma-radiation, H2O2 and desiccation. The 16S rDNA sequence analysis revealed a clear affiliation between this strain and members of the low G+C Firmicutes. High 16S rDNA sequence similarity values were found with members of the genus Bacillus and this was supported by fatty acid profiles. The 16S rDNA sequence similarity between strain FO-92T and Bacillus benzoevorans DSM 5391T was very high. Molecular characterizations employing small-subunit 16S rDNA sequences were at the limits of resolution for the differentiation of species in this genus, but DNA-DNA hybridization data support the proposal of FO-92T as Bacillus nealsonii sp. nov. (type strain is FO-92T =ATCC BAA-519T =DSM 15077T).

  11. Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis.

    PubMed

    Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu

    2017-01-01

    Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystem II. However, little is known about the synthesis of optically active phycoerythrin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis: the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant.

  12. Identification and characterization of microRNAs in white and brown alpaca skin

    PubMed Central

    2012-01-01

    Background: MicroRNAs (miRNAs) are small, non-coding 21–25 nt RNA molecules that play an important role in regulating gene expression. Little is known about the expression profiles and functions of miRNAs in skin and their role in pigmentation. Alpacas have more than 22 natural coat colors, more than any other fiber producing species. To better understand the role of miRNAs in control of coat color we performed a comprehensive analysis of miRNA expression profiles in skin of white versus brown alpacas. Results: Two small RNA libraries from white alpaca (WA) and brown alpaca (BA) skin were sequenced with the aid of Illumina sequencing technology. 272 and 267 conserved miRNAs were obtained from the WA and BA skin libraries, respectively. Of these conserved miRNAs, 35 and 13 were more abundant in WA and BA skin, respectively. The targets of these miRNAs were predicted and grouped based on Gene Ontology and KEGG pathway analysis. Many predicted target genes for these miRNAs are involved in the melanogenesis pathway controlling pigmentation. In addition to the conserved miRNAs, we also obtained 22 potentially novel miRNAs from the WA and BA skin libraries. Conclusion: This study represents the first comprehensive survey of miRNAs expressed in skin of animals of different coat colors by deep sequencing analysis. We discovered a collection of miRNAs that are differentially expressed in WA and BA skin. The results suggest important potential functions of miRNAs in coat color regulation. PMID:23067000

  13. Evol and ProDy for bridging protein sequence evolution and structural dynamics.

    PubMed

    Bakan, Ahmet; Dutta, Anindita; Mao, Wenzhi; Liu, Ying; Chennubhotla, Chakra; Lezon, Timothy R; Bahar, Ivet

    2014-09-15

    Correlations between sequence evolution and structural dynamics are of utmost importance in understanding the molecular mechanisms of function and their evolution. We have integrated Evol, a new package for fast and efficient comparative analysis of evolutionary patterns and conformational dynamics, into ProDy, a computational toolbox designed for inferring protein dynamics from experimental and theoretical data. Using information-theoretic approaches, Evol coanalyzes conservation and coevolution profiles extracted from multiple sequence alignments of protein families with their inferred dynamics. ProDy and Evol are open-source and freely available under MIT License from http://prody.csb.pitt.edu/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. 16S rRNA analysis provides evidence of biofilms on all components of three infected periprosthetic knees including permanent braided suture.

    PubMed

    Swearingen, Matthew C; DiBartola, Alex C; Dusane, Devendra; Granger, Jeffrey; Stoodley, Paul

    2016-10-01

    Bacterial biofilms are the main etiological agent of periprosthetic joint infections (PJI); however, it is unclear if biofilms colonize one or multiple components. Because biofilms can colonize a variety of surfaces, we hypothesized that biofilms would be present on all components. 16S ribosomal RNA (rRNA) gene sequencing analysis was used to identify bacteria recovered from individual components and non-absorbable suture material recovered from three PJI total knee revision cases. Bray-Curtis non-metric multidimensional scaling analysis revealed no significant differences in similarity when factoring component, material type, or suture versus non-suture material, but did reveal significant differences in organism profile between patients (P < 0.001) and negative controls (P < 0.001). Confocal microscopy and a novel agar encasement culturing method also confirmed biofilm growth on a subset of components. While 16S sequencing suggested that the microbiology was more complex than revealed by culture, contaminating bacterial DNA generates a risk of false positives. This report highlights that biofilm bacteria may colonize all infected prosthetic components including braided suture material, and provides further evidence that clinical culture can fail to sufficiently identify the full pathogen profile in PJI cases. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
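
    The Bray-Curtis measure underlying the ordination above can be sketched in a few lines. This shows only the standard pairwise dissimilarity (sum of absolute abundance differences divided by total abundance in both samples), not the non-metric multidimensional scaling or the significance tests. The sample names and read counts are hypothetical.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same taxa in the same order.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("vectors must cover the same taxa in the same order")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("at least one sample must contain reads")
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / total


# Hypothetical 16S read counts for three taxa on two components.
tibial_tray = [10, 0, 5]
suture = [5, 5, 5]

toy_dissimilarity = bray_curtis(tibial_tray, suture)  # 10 / 30
```

    A matrix of such pairwise values is what an ordination method would take as input.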

  15. Transcriptome profile and unique genetic evolution of positively selected genes in yak lungs.

    PubMed

    Lan, DaoLiang; Xiong, XianRong; Ji, WenHui; Li, Jian; Mipam, Tserang-Donko; Ai, Yi; Chai, ZhiXin

    2018-04-01

    The yak (Bos grunniens), which is a unique bovine breed that is distributed mainly in the Qinghai-Tibetan Plateau, is considered a good model for studying plateau adaptability in mammals. The lungs are important functional organs that enable animals to adapt to their external environment. However, the genetic mechanism underlying the adaptability of yak lungs to harsh plateau environments remains unknown. To explore the unique evolutionary process and genetic mechanism of yak adaptation to plateau environments, we performed transcriptome sequencing of yak and cattle (Bos taurus) lungs using RNA-Seq technology and a subsequent comparison analysis to identify the positively selected genes in the yak. After deep sequencing, a normal transcriptome profile of yak lung containing a total of 16,815 expressed genes was obtained, and the characteristics of the yak lung transcriptome were described by functional analysis. Furthermore, Ka/Ks comparison statistics showed that 39 strongly positively selected genes were identified in yak lungs. Further GO and KEGG analysis was conducted for the functional annotation of these genes. The results of this study provide valuable data for further explorations of the unique evolutionary process of high-altitude hypoxia adaptation in yaks in the Tibetan Plateau and the genetic mechanism at the molecular level.

  16. dictyExpress: a web-based platform for sequence data management and analytics in Dictyostelium and beyond.

    PubMed

    Stajdohar, Miha; Rosengarten, Rafael D; Kokosar, Janez; Jeran, Luka; Blenkus, Domen; Shaulsky, Gad; Zupan, Blaz

    2017-06-02

    Dictyostelium discoideum, a soil-dwelling social amoeba, is a model for the study of numerous biological processes. Research in the field has benefited mightily from the adoption of next-generation sequencing for genomics and transcriptomics. Dictyostelium biologists now face the widespread challenges of analyzing and exploring high dimensional data sets to generate hypotheses and discover novel insights. We present dictyExpress (2.0), a web application designed for exploratory analysis of gene expression data, as well as data from related experiments such as Chromatin Immunoprecipitation sequencing (ChIP-Seq). The application features visualization modules that include time course expression profiles, clustering, gene ontology enrichment analysis, differential expression analysis and comparison of experiments. All visualizations are interactive and interconnected, such that the selection of genes in one module propagates instantly to visualizations in other modules. dictyExpress currently stores the data from over 800 Dictyostelium experiments and is embedded within a general-purpose software framework for management of next-generation sequencing data. dictyExpress allows users to explore their data in a broader context by reciprocal linking with dictyBase, a repository of Dictyostelium genomic data. In addition, we introduce a companion application called GenBoard, an intuitive graphic user interface for data management and bioinformatics analysis. dictyExpress and GenBoard enable broad adoption of next generation sequencing based inquiries by the Dictyostelium research community. Labs without the means to undertake deep sequencing projects can mine the data available to the public. The entire information flow, from raw sequence data to hypothesis testing, can be accomplished in an efficient workspace. The software framework is generalizable and represents a useful approach for any research community. To encourage wider usage, the backend is open-source, available for extension and further development by bioinformaticians and data scientists.

  17. 16S and 23S plastid rDNA phylogenies of Prototheca species and their auxanographic phenotypes.

    PubMed

    Ewing, Aren; Brubaker, Shane; Somanchi, Aravind; Yu, Esther; Rudenko, George; Reyes, Nina; Espina, Karen; Grossman, Arthur; Franklin, Scott

    2014-08-01

    Because algae have become more accepted as sources of human nutrition, phylogenetic analysis can help resolve the taxonomy of taxa that have not been well studied. This can help establish algal evolutionary relationships. Here, we compare Auxenochlorella protothecoides and 23 strains of Prototheca based on their complete 16S and partial 23S plastid rDNA sequences along with nutrient utilization (auxanographic) profiles. These data demonstrate that some of the species groupings are not in agreement with the molecular phylogenetic analyses and that auxanographic profiles are poor predictors of phylogenetic relationships.

  18. 16S and 23S plastid rDNA phylogenies of Prototheca species and their auxanographic phenotypes

    PubMed Central

    Ewing, Aren; Brubaker, Shane; Somanchi, Aravind; Yu, Esther; Rudenko, George; Reyes, Nina; Espina, Karen; Grossman, Arthur; Franklin, Scott

    2014-01-01

    Because algae have become more accepted as sources of human nutrition, phylogenetic analysis can help resolve the taxonomy of taxa that have not been well studied. This can help establish algal evolutionary relationships. Here, we compare Auxenochlorella protothecoides and 23 strains of Prototheca based on their complete 16S and partial 23S plastid rDNA sequences along with nutrient utilization (auxanographic) profiles. These data demonstrate that some of the species groupings are not in agreement with the molecular phylogenetic analyses and that auxanographic profiles are poor predictors of phylogenetic relationships. PMID:25937672

  19. Analysis of Protein-DNA Interaction by Chromatin Immunoprecipitation and DNA Tiling Microarray (ChIP-on-chip).

    PubMed

    Gao, Hui; Zhao, Chunyan

    2018-01-01

    Chromatin immunoprecipitation (ChIP) has become the most effective and widely used tool to study the interactions between specific proteins or modified forms of proteins and a genomic DNA region. Combined with genome-wide profiling technologies, such as microarray hybridization (ChIP-on-chip) or massively parallel sequencing (ChIP-seq), ChIP could provide a genome-wide mapping of in vivo protein-DNA interactions in various organisms. Here, we describe a protocol of ChIP-on-chip that uses tiling microarray to obtain a genome-wide profiling of ChIPed DNA.

  20. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

    PubMed

    Cowley, Lauren A; Beckett, Stephen J; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J; Gally, David L; Jenkins, Claire

    2015-04-08

    Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations; certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency, and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidean cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group, showing that these groups are specialised to infect a subset of phage types. Sequencing the typing phages has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type.

  1. Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.

    PubMed

    Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin

    2014-05-28

    V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also to data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and into the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.
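
    The idea of an alignment-free seed heuristic followed by clone gathering can be sketched as below. This is a simplified illustration and not Vidjil's actual implementation: it assumes exact k-mer seeds, labels each read position as V or J by lookup in a k-mer index, takes a fixed-size window centred between the last V seed and the first J seed, and counts reads sharing the same window as one clone. The germline sequences, reads, k and window size are all hypothetical toy values.

```python
from collections import Counter


def build_index(germline, k):
    """Map each k-mer to the set of germline labels ("V" or "J") containing it."""
    index = {}
    for label, sequences in germline.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                index.setdefault(seq[i:i + k], set()).add(label)
    return index


def junction_window(read, index, k, width):
    """Return a window around the putative V/J junction, or None if not found."""
    v_positions, j_positions = [], []
    for i in range(len(read) - k + 1):
        labels = index.get(read[i:i + k], set())
        if labels == {"V"}:
            v_positions.append(i)
        elif labels == {"J"}:
            j_positions.append(i)
    if not v_positions or not j_positions:
        return None
    last_v, first_j = max(v_positions), min(j_positions)
    if last_v >= first_j:  # seeds are not in V-then-J order
        return None
    center = (last_v + k + first_j) // 2
    start = max(0, center - width // 2)
    return read[start:start + width]


# Hypothetical toy germline segments and reads (not real TR gamma sequences).
germline = {"V": ["ACGTACGGTCA"], "J": ["TTGGCCAAGGT"]}
index = build_index(germline, k=4)
read_1 = "ACGTACGGTCA" + "CCC" + "TTGGCCAAGGT"
read_2 = "ACGTACGGTCA" + "GAG" + "TTGGCCAAGGT"

windows = [junction_window(r, index, k=4, width=8) for r in (read_1, read_1, read_2)]
clones = Counter(w for w in windows if w is not None)
```

    No read is aligned against the germline at any point in this sketch; only the k-mer lookups decide where the junction lies, which is the property the abstract credits for speed and scalability.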

  2. Transcriptome analysis of eyestalk and hemocytes in the ridgetail white prawn Exopalaemon carinicauda: assembly, annotation and marker discovery.

    PubMed

    Li, Jitao; Li, Jian; Chen, Ping; Liu, Ping; He, Yuying

    2015-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the major economic mariculture species in eastern China. The deficiency of genomic and transcriptomic data is becoming the bottleneck for further research on its good traits. In the present study, 454 pyrosequencing was undertaken to investigate the transcriptome profiles of E. carinicauda. A collection of 1,028,710 sequence reads (459.59 Mb) obtained from cDNA prepared from eyestalk and hemocytes was assembled into 162,056 expressed sequence tags (ESTs). Of these, 29.88 % of 48,428 contigs and 70.12 % of 113,628 singlets possessed high similarities to sequences in the GenBank non-redundant database, with most significant (E value < 1e-10) unigene matches occurring with crustacean and insect sequences. KEGG analysis of unigenes identified putative members of biological pathways related to growth and immunity. In addition, we obtained a total of 125,112 putative SNPs and 13,467 microsatellites. These results will contribute to the understanding of the genome makeup and provide useful information for future functional genomic research in E. carinicauda.

  3. Molecular cloning of a putative gene encoding isopentenyltransferase from pingyitiancha (Malus hupehensis) and characterization of its response to nitrate.

    PubMed

    Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong

    2008-06-01

    A putative isopentenyltransferase (IPT) encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3 and corresponded closely with the accumulation of isopentyladenine and isopentyladenosine in leaves.

  4. DNA methylation assessment from human slow- and fast-twitch skeletal muscle fibers

    PubMed Central

    Begue, Gwénaëlle; Raue, Ulrika; Jemiolo, Bozena

    2017-01-01

    A new application of the reduced representation bisulfite sequencing method was developed using low-DNA input to investigate the epigenetic profile of human slow- and fast-twitch skeletal muscle fibers. Successful library construction was completed with as little as 15 ng of DNA, and high-quality sequencing data were obtained with 32 ng of DNA. Analysis identified 143,160 differentially methylated CpG sites across 14,046 genes. In both fiber types, selected genes predominantly expressed in slow or fast fibers were hypomethylated, which was supported by the RNA-sequencing analysis. These are the first fiber type-specific methylation data from human skeletal muscle and provide a unique platform for future research. NEW & NOTEWORTHY This study validates a low-DNA input reduced representation bisulfite sequencing method for human muscle biopsy samples to investigate the methylation patterns at a fiber type-specific level. These are the first fiber type-specific methylation data reported from human skeletal muscle and thus provide initial insight into basal state differences in myosin heavy chain I and IIa muscle fibers among young, healthy men. PMID:28057818

  5. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    PubMed Central

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts, which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors, 229 protein kinases, and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress, and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  6. Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

    PubMed Central

    Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

    2005-01-01

    Background Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486

  7. Molecular Typing and Virulence Gene Profiles of Enterotoxin Gene Cluster (egc)-Positive Staphylococcus aureus Isolates Obtained from Various Food and Clinical Specimens.

    PubMed

    Song, Minghui; Shi, Chunlei; Xu, Xuebing; Shi, Xianming

    2016-11-01

    The enterotoxin gene cluster (egc) has been proposed to contribute to Staphylococcus aureus colonization, which highlights the need to evaluate genetic diversity and virulence gene profiles of the egc-positive population. Here, a total of 43 egc-positive isolates (16.2%) were identified from 266 S. aureus isolates that were obtained from various food and clinical specimens in Shanghai. Seven different egc profiles were found based on the polymerase chain reaction (PCR) result for egc genes. Then, these 43 egc-positive isolates were further typed by multilocus sequence typing, pulsed-field gel electrophoresis (PFGE), multiple-locus variable-number tandem-repeat analysis (MLVA), and accessory gene regulatory (agr) typing. The 43 egc-positive isolates displayed 17 sequence types, 28 PFGE patterns, 29 MLVA types, and 4 agr types. Among them, the dominant clonal lineage was CC5-agr II (48.84%). Thirty toxin and 20 adhesion-associated genes were detected by PCR in egc-positive isolates. Notably, invasive toxin genes showed a high prevalence, such as 76.7% for Panton-Valentine leukocidin encoding genes, 27.9% for sec, and 23.3% for tsst-1. Most of the examined adhesion-associated genes were found to be conserved (76.7-100%), whereas the fnbB gene was only found in 8 (18.6%) isolates. In addition, 33 toxin gene profiles and 13 adhesion gene profiles were identified. Our results imply that isolates belonging to the same clonal lineage harbored similar adhesion gene profiles but diverse toxin gene profiles. Overall, the high prevalence of invasive virulence genes increases the potential risk of egc-positive isolates in S. aureus infection.

  8. Cell cloning-based transcriptome analysis in Rett patients: relevance to the pathogenesis of Rett syndrome of new human MeCP2 target genes.

    PubMed

    Nectoux, J; Fichou, Y; Rosas-Vargas, H; Cagnard, N; Bahi-Buisson, N; Nusbaum, P; Letourneur, F; Chelly, J; Bienvenu, T

    2010-07-01

    More than 90% of Rett syndrome (RTT) patients have heterozygous mutations in the X-linked methyl-CpG binding protein 2 (MECP2) gene that encodes the methyl-CpG-binding protein 2, a transcriptional modulator. Because MECP2 is subjected to X chromosome inactivation (XCI), girls with RTT either express the wild-type or mutant allele in each individual cell. To test the consequences of MECP2 mutations resulting from a genome-wide transcriptional dysregulation and to identify its target genes in a system that circumvents the functional mosaicism resulting from XCI, we carried out gene expression profiling of clonal populations derived from fibroblast primary cultures expressing exclusively either the wild-type or the mutant MECP2 allele. Clonal cultures were obtained from skin biopsy of three RTT patients carrying either a non-sense or a frameshift MECP2 mutation. For each patient, gene expression profiles of wild-type and mutant clones were compared by oligonucleotide expression microarray analysis. Firstly, clustering analysis classified the RTT patients according to their genetic background and MECP2 mutation. Secondly, expression profiling by microarray analysis and quantitative RT-PCR indicated four up-regulated genes and five down-regulated genes significantly dysregulated in all our statistical analyses, including excellent potential candidate genes for the understanding of the pathophysiology of this neurodevelopmental disease. Thirdly, chromatin immunoprecipitation analysis confirmed MeCP2 binding to respective CpG islands in three out of four up-regulated candidate genes and sequencing of bisulphite-converted DNA indicated that MeCP2 preferentially binds to methylated-DNA sequences. Most importantly, the finding that at least two of these genes (BMCC1 and RNF182) were shown to be involved in cell survival and/or apoptosis may suggest that impaired MeCP2 function could alter the survival of neurons thus compromising brain function without inducing cell death.

  9. Exercise-associated DNA methylation change in skeletal muscle and the importance of imprinted genes: a bioinformatics meta-analysis.

    PubMed

    Brown, William M

    2015-12-01

    Epigenetics is the study of processes--beyond DNA sequence alteration--producing heritable characteristics. For example, DNA methylation modifies gene expression without altering the nucleotide sequence. A well-studied DNA methylation-based phenomenon is genomic imprinting (ie, genotype-independent parent-of-origin effects). We aimed to elucidate: (1) the effect of exercise on DNA methylation and (2) the role of imprinted genes in skeletal muscle gene networks (ie, gene group functional profiling analyses). Gene ontology (ie, gene product elucidation)/meta-analysis. 26 skeletal muscle and 86 imprinted genes were subjected to g:Profiler ontology analysis. Meta-analysis assessed exercise-associated DNA methylation change. g:Profiler found four muscle gene networks with imprinted loci. Meta-analysis identified 16 articles (387 genes/1580 individuals) associated with exercise. Age, method, sample size, sex and tissue variation could elevate effect size bias. Only skeletal muscle gene networks including imprinted genes were reported. Exercise-associated effect sizes were calculated by gene. Age, method, sample size, sex and tissue variation were moderators. Six imprinted loci (RB1, MEG3, UBE3A, PLAGL1, SGCE, INS) were important for muscle gene networks, while meta-analysis uncovered five exercise-associated imprinted loci (KCNQ1, MEG3, GRB10, L3MBTL1, PLAGL1). DNA methylation decreased with exercise (60% of loci). Exercise-associated DNA methylation change was stronger among older people (ie, age accounted for 30% of the variation). Among older people, genes exhibiting DNA methylation decreases were part of a microRNA-regulated gene network functioning to suppress cancer. Imprinted genes were identified in skeletal muscle gene networks and exercise-associated DNA methylation change. Exercise-associated DNA methylation modification could rewind the 'epigenetic clock' as we age. CRD42014009800. Published by the BMJ Publishing Group Limited. 

  10. The long tail of molecular alterations in non-small cell lung cancer: a single-institution experience of next-generation sequencing in clinical molecular diagnostics.

    PubMed

    Fumagalli, Caterina; Vacirca, Davide; Rappa, Alessandra; Passaro, Antonio; Guarize, Juliana; Rafaniello Raviele, Paola; de Marinis, Filippo; Spaggiari, Lorenzo; Casadio, Chiara; Viale, Giuseppe; Barberis, Massimo; Guerini-Rocco, Elena

    2018-03-13

    Molecular profiling of advanced non-small cell lung cancers (NSCLC) is essential to identify patients who may benefit from targeted treatments. In recent years, the number of potentially actionable molecular alterations has rapidly increased. Next-generation sequencing allows for the analysis of multiple genes simultaneously. This study aimed to evaluate the feasibility and the throughput of next-generation sequencing in clinical molecular diagnostics of advanced NSCLC. A single-institution cohort of 535 non-squamous NSCLC was profiled using a next-generation sequencing panel targeting 22 actionable and cancer-related genes. 441 non-squamous NSCLC (82.4%) harboured at least one gene alteration, including 340 cases (63.6%) with clinically relevant molecular aberrations. Mutations were detected in all but one gene (FGFR1) of the panel. Recurrent alterations were observed in KRAS, TP53, EGFR, STK11 and MET genes, whereas the remaining genes were mutated in <5% of the cases. Concurrent mutations were detected in 183 tumours (34.2%), mostly impairing KRAS or EGFR in association with TP53 alterations. The study highlights the feasibility of targeted next-generation sequencing in the clinical setting. The majority of NSCLC harboured mutations in clinically relevant genes, thus identifying patients who might benefit from different targeted therapies.

  11. FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies.

    PubMed

    Kim, Jiwoong; Kim, Min Soo; Koh, Andrew Y; Xie, Yang; Zhan, Xiaowei

    2016-10-10

    Given the lack of a complete and comprehensive library of microbial reference genomes, determining the functional profile of diverse microbial communities is challenging. The available functional analysis pipelines lack several key features: (i) an integrated alignment tool, (ii) operon-level analysis, and (iii) the ability to process large datasets. Here we introduce our open-sourced, stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data, FMAP (Functional Mapping and Analysis Pipeline). FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines. FMAP is a comprehensive tool for providing functional analysis of metagenomic/metatranscriptomic sequencing data. With the added features of integrated alignment, operon-level analysis, and the ability to process large datasets, FMAP will be a valuable addition to the currently available functional analysis toolbox. We believe that this software will be of great value to the wider biology and bioinformatics communities.

  12. Contrasting cDNA-AFLP profiles between crown and leaf tissues of cold-acclimated wheat plants indicate differing regulatory circuitries for low temperature tolerance.

    PubMed

    Ganeshan, Seedhabadee; Sharma, Pallavi; Young, Lester; Kumar, Ashwani; Fowler, D Brian; Chibbar, Ravindra N

    2011-03-01

    Low-temperature (LT) tolerance in winter wheat (Triticum aestivum L.) is an economically important but complex trait. Four selected wheat genotypes, a winter hardy cultivar, Norstar, a tender spring cultivar, Manitou and two near-isogenic lines with Vrn-A1 (spring Norstar) and vrn-A1 (winter Manitou) alleles of Manitou and Norstar were cold-acclimated at 6°C and crown and leaf tissues were collected at 0, 2, 14, 21, 35, 42, 56 and 70 days of cold acclimation. cDNA-AFLP profiling was used to determine temporal expression profiles of transcripts during cold acclimation in crown and leaf tissues separately, to determine whether LT regulatory circuitries in crown and leaf tissues could be delineated using this approach. Screening 64 primer combinations identified 4,074 and 2,757 differentially expressed transcript-derived fragments (TDFs), of which 38 and 16% were up-regulated as compared to 3 and 6% that were down-regulated in crown and leaf tissues, respectively. DNA sequencing of TDFs revealed sequences common to both tissues including genes coding for DEAD-box RNA helicase, choline-phosphate cytidylyltransferase and delta-1-pyrroline carboxylate synthetase. TDFs specific to crown tissues included genes coding for phosphatidylinositol kinase, auxin response factor protein and brassinosteroid insensitive 1-associated receptor kinase. In leaf, genes such as methylene tetrahydrofolate reductase, NADH-cytochrome b5 reductase and malate dehydrogenase were identified. However, 30 and 14% of the DNA sequences from the crown and leaf tissues, respectively, were hypothetical or unknown proteins. Cluster analysis of up-regulated, down-regulated and unique TDFs, together with DNA sequencing and real-time PCR validation, suggests that the mechanisms operating in crown and leaf tissue in response to LT are differently regulated and warrant further study.

  13. Generation of sequence signatures from DNA amplification fingerprints with mini-hairpin and microsatellite primers.

    PubMed

    Caetano-Anollés, G; Gresshoff, P M

    1996-06-01

    DNA amplification fingerprinting (DAF) with mini-hairpins harboring arbitrary "core" sequences at their 3' termini was used to fingerprint a variety of templates, including PCR products and whole genomes, to establish genetic relationships between plant taxa at the interspecific and intraspecific level, and to identify closely related fungal isolates and plant accessions. No correlation was observed between the sequence of the arbitrary core, the stability of the mini-hairpin structure and DAF efficiency. Mini-hairpin primers with short arbitrary cores and primers complementary to simple sequence repeats present in microsatellites were also used to generate arbitrary signatures from amplification profiles (ASAP). The ASAP strategy is a dual-step amplification procedure that uses at least one primer in each fingerprinting stage. ASAP was able to reproducibly amplify DAF products (representing about 10-15 kb of sequence) following careful optimization of amplification parameters such as primer and template concentration. Avoidance of primer sequences partially complementary to DAF product termini was necessary in order to produce distinct fingerprints. This allowed the combinatorial use of oligomers in nucleic acid screening, with numerous ASAP fingerprinting reactions based on a limited number of primer sequences. Mini-hairpin primers and ASAP analysis significantly increased detection of polymorphic DNA, separating closely related bermudagrass (Cynodon) cultivars and detecting putatively linked markers in bulked segregant analysis of the soybean (Glycine max) supernodulation (nitrate-tolerant symbiosis) locus.

  14. RiboGalaxy: A browser based platform for the alignment, analysis and visualization of ribosome profiling data

    PubMed Central

    Michel, Audrey M.; Mullan, James P. A.; Velayudhan, Vimalkumar; O'Connor, Patrick B. F.; Donohue, Claire A.; Baranov, Pavel V.

    2016-01-01

    Ribosome profiling (ribo-seq) is a technique that uses high-throughput sequencing to reveal the exact locations and densities of translating ribosomes at the entire transcriptome level. The technique has become very popular since its inception in 2009. Yet experimentalists who generate ribo-seq data often have to rely on bioinformaticians to process and analyze their data. We present RiboGalaxy (http://ribogalaxy.ucc.ie), a freely available Galaxy-based web server for processing and analyzing ribosome profiling data with the visualization functionality provided by GWIPS-viz (http://gwips.ucc.ie). RiboGalaxy offers researchers a suite of tools specifically tailored for processing ribo-seq and corresponding mRNA-seq data. Researchers can take advantage of the published workflows which reduce the multi-step alignment process to a minimum of inputs from the user. Users can then explore their own aligned data as custom tracks in GWIPS-viz and compare their ribosome profiles to existing ribo-seq tracks from published studies. In addition, users can assess the quality of their ribo-seq data, determine the strength of the triplet periodicity signal, generate meta-gene ribosome profiles as well as analyze the relative impact of mRNA sequence features on local read density. RiboGalaxy is accompanied by extensive documentation and tips for helping users. In addition we provide a forum (http://gwips.ucc.ie/Forum) where we encourage users to post their questions and feedback to improve the overall RiboGalaxy service. PMID:26821742
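
    The triplet periodicity signal mentioned above can be summarized with a short Python sketch. It does not reproduce any RiboGalaxy tool; the function name and the input convention (read positions already expressed relative to the first nucleotide of the annotated start codon, after any offset correction) are assumptions for illustration. The idea is simply that footprints from translating ribosomes should fall predominantly into one of the three reading frames.

```python
from collections import Counter


def frame_fractions(positions):
    """Return the fraction of reads in frame 0, 1 and 2.

    positions: iterable of integer read positions relative to the first
    nucleotide of the start codon (hypothetical input convention).
    """
    counts = Counter(p % 3 for p in positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no read positions supplied")
    return [counts.get(frame, 0) / total for frame in range(3)]
```

    A strongly periodic data set gives one fraction well above one third, whereas fractions close to one third each indicate little or no periodicity.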

  15. Comparing Stellar Populations Across the Hubble Sequence

    NASA Astrophysics Data System (ADS)

    Loeffler, Shane; Kaleida, Catherine C.; Parkash, Vaishali

    2015-01-01

    Previous work (Jansen et al., 2000, Taylor et al., 2005) has revealed trends in the optical wavelength radial profiles of galaxies across the Hubble Sequence. Radial profiles offer insight into stellar populations, metallicity, and dust concentrations, aspects which are deeply tied to the individual evolution of a galaxy. The Nearby Field Galaxy Survey (NFGS) provides a sampling of nearby galaxies that spans the range of morphological types, luminosities, and masses. Currently available NFGS data include optical radial surface profiles and spectra of 196 nearby galaxies. We aim to look for trends in the infrared portion of the spectrum for these galaxies, but find that existing 2MASS data are not sufficiently deep. Herein, we expand the available data for the NFGS galaxy IC1639 deeper into the infrared using new data taken with the Infrared Sideport Imager (ISPI) on the 4-m Blanco Telescope at the Cerro Tololo Inter-American Observatory (CTIO) in Chile. Images taken in J, H, and Ks were reduced using standard IRAF and IDL procedures. Photometric calibrations were completed by using the highest quality (AAA) 2MASS stars in the field. Aperture photometry was then performed on the galaxy and radial profiles of surface brightness, J-H color, and H-Ks color were produced. For IC1639, the new ISPI data reveal flat color gradients and surface brightness gradients that decrease with radius. These trends reveal an archetypal elliptical galaxy, with a relatively homogeneous stellar population, stellar density decreasing with radius, and little-to-no obscuration by dust. We have obtained ISPI images for an additional 8 galaxies, and further reduction and analysis of these data will allow for investigation of radial trends in the infrared for galaxies across the Hubble Sequence.

  16. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    BS-Seq is a recently developed approach for detecting cytosine DNA methylation (mC) and profiling DNA methylation at genome scale; it is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide insight into differences in genome-scale DNA methylation among organisms, but also reveal the conservation of DNA methylation in all contexts and the nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will help to understand the epigenetic impact of cytosine DNA methylation on the regulation of gene expression and on maintaining the silencing of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps for DNA methylation data, in which cytosine (C) and guanine (G) in the reference sequence are converted to thymine (T) and adenine (A), respectively, and cytosine in reads is converted to thymine. We also comprehensively review the main content of DNA methylation analysis on the genomic scale: (1) cytosine methylation in different sequence contexts; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and the preference for the nucleotides; (4) DNA-protein interaction sites of DNA methylation; (5) the degree of cytosine methylation in the different structural elements of genes. DNA methylation analysis provides a powerful tool for studying the epigenome in human and other species and the interaction between genes and environment, and lays the theoretical basis for further development of disease diagnostics and therapeutics in human.
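
    The preprocessing step described above (converting C and G in the reference, and C in reads) can be sketched in a few lines of Python. This is an illustrative sketch rather than the pipeline reviewed in the paper: the function names are hypothetical, and the methylation call at the end is the standard comparison of the original read with the original reference, added here as an assumption because the abstract does not spell it out.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq):
    """Conversion seen from the reverse strand: every G becomes A."""
    return seq.upper().replace("G", "A")


def convert_reference(reference):
    """Build the two converted references that converted reads are aligned against."""
    return {"C2T": c_to_t(reference), "G2A": g_to_a(reference)}


def call_methylation(reference, read, offset=0):
    """Compare an original forward-strand read with the original reference.

    Returns {reference_position: True if methylated (C kept), False if
    unmethylated (C read as T)} for every reference C covered by the read.
    """
    calls = {}
    for i, base in enumerate(read.upper()):
        ref_base = reference[offset + i].upper()
        if ref_base == "C" and base in ("C", "T"):
            calls[offset + i] = base == "C"
    return calls
```

    Converting both the read and the reference before alignment removes the C/T mismatches that bisulfite treatment introduces, so the read can be placed with an ordinary aligner; the methylation state is then recovered from the unconverted sequences.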

  17. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    PubMed Central

    2014-01-01

    Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  18. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    PubMed

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  19. Complementary DNA cloning, sequence analysis, and tissue transcription profile of a novel U2AF2 gene from the Chinese Banna mini-pig inbred line.

    PubMed

    Wang, S Y; Huo, J L; Miao, Y W; Cheng, W M; Zeng, Y Z

    2013-04-02

    U2 small nuclear RNA auxiliary factor 2 (U2AF2) is an important gene for pre-messenger RNA splicing in higher eukaryotes. In this study, the Banna mini-pig inbred line (BMI) U2AF2 coding sequence (CDS) was cloned, sequenced, and characterized. The U2AF2 complete CDS was amplified using the reverse transcription-polymerase chain reaction (RT-PCR) technique based on the conserved sequence information of cattle and known highly homologous swine expressed sequence tags. This novel gene was deposited into the National Center for Biotechnology Information database (Accession No. JQ839267). Sequence analysis revealed that the BMI U2AF2 coding sequence consisted of 1416 bp and encoded 471 amino acids with a molecular weight of 53.12 kDa. The protein sequence has high sequence homology with U2AF65 of 6 species - Homo sapiens (100%), Equus caballus (100%), Canis lupus (100%), Macaca mulatta (99.8%), Bos taurus (74.4%), and Mus musculus (74.4%). The phylogenetic tree analysis revealed that BMI U2AF65 has a closer genetic relationship with B. taurus U2AF65 than with U2AF65 of E. caballus, C. lupus, M. mulatta, H. sapiens, and M. musculus. RT-PCR analysis showed that BMI U2AF2 was most highly expressed in the brain; moderately expressed in the spleen, lung, muscle, and skin; and weakly expressed in the liver, kidney, and ovary. Its expression was nearly silent in the spinal cord, nerve fiber, heart, stomach, pancreas, and intestine. Three microRNA target sites were predicted in the CDS of BMI U2AF2 messenger RNA. Our results establish a foundation for further insight into this swine gene.

  20. Accelerating Information Retrieval from Profile Hidden Markov Model Databases.

    PubMed

    Tamimi, Ahmad; Ashhab, Yaqoub; Tamimi, Hashem

    2016-01-01

    Profile Hidden Markov Model (Profile-HMM) is an efficient statistical approach to represent protein families. Currently, several databases maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching Profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance searching efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases, as well as the increasing demand for batch query searching, are strong motivations that call for further enhancement of information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate the current profile-HMM homology searching approaches. The method works by cluster-based remodeling of the database to reduce the search space, rather than focusing on the alignment algorithms. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities. A representative for each cluster was assigned. To enhance sensitivity, we proposed an extended step that allows overlapping among clusters. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall values were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41%, and an average recall of 96%. Our results demonstrate that representation of profile-HMMs using a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
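
    The two-stage idea in this abstract (score a query against one representative per cluster, then search only inside the best-matching clusters) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: profiles are stand-in 2-D points compared by Euclidean distance rather than real profile-HMMs scored by an alignment algorithm, and the names clustered_search, top_k, famA to famF and c1 to c3 are hypothetical, not taken from the paper.

```python
# Toy sketch of cluster-based search-space reduction.
# ASSUMPTION: profiles are plain numeric vectors, NOT real profile-HMMs,
# and "similarity" is Euclidean distance, not an HMM alignment score.
from math import dist


def clustered_search(query, representatives, clusters, profiles, top_k=1):
    """Rank clusters by distance to their representative, then score the
    query only against members of the top_k closest clusters."""
    ranked = sorted(representatives, key=lambda c: dist(query, representatives[c]))
    candidates = {}
    for cluster_id in ranked[:top_k]:
        for name in clusters[cluster_id]:
            # A dict de-duplicates members shared by overlapping clusters.
            candidates[name] = profiles[name]
    best = min(candidates, key=lambda name: dist(query, candidates[name]))
    return best, len(candidates)


# Hypothetical database of six "profiles" grouped into three clusters.
profiles = {
    "famA": (0.0, 0.0), "famB": (0.0, 1.0),
    "famC": (5.0, 5.0), "famD": (5.0, 6.0),
    "famE": (10.0, 0.0), "famF": (10.0, 1.0),
}
clusters = {"c1": ["famA", "famB"], "c2": ["famC", "famD"], "c3": ["famE", "famF"]}
representatives = {"c1": (0.0, 0.5), "c2": (5.0, 5.5), "c3": (10.0, 0.5)}

best, n_scored = clustered_search((4.8, 5.1), representatives, clusters, profiles)
print(best, n_scored)  # famC 2
```

    In this toy run the query is compared with three representatives and two cluster members instead of all six profiles; raising top_k trades speed for sensitivity, which is loosely analogous to the trade-off the abstract reports between search time and recall.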

  1. Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis

    PubMed Central

    Moore, Michael; Zhang, Chaolin; Gantman, Emily Conn; Mele, Aldo; Darnell, Jennifer C.; Darnell, Robert B.

    2014-01-01

    Identifying sites where RNA binding proteins (RNABPs) interact with target RNAs opens the door to understanding the vast complexity of RNA regulation. UV-crosslinking and immunoprecipitation (CLIP) is a transformative technology in which RNAs purified from in vivo cross-linked RNA-protein complexes are sequenced to reveal footprints of RNABP:RNA contacts. CLIP combined with high throughput sequencing (HITS-CLIP) is a generalizable strategy to produce transcriptome-wide RNA binding maps with higher accuracy and resolution than standard RNA immunoprecipitation (RIP) profiling or purely computational approaches. Applying CLIP to Argonaute proteins has expanded the utility of this approach to mapping binding sites for microRNAs and other small regulatory RNAs. Finally, recent advances in data analysis take advantage of crosslinked-induced mutation sites (CIMS) to refine RNA-binding maps to single-nucleotide resolution. Once IP conditions are established, HITS-CLIP takes approximately eight days to prepare RNA for sequencing. Established pipelines for data analysis, including for CIMS, take 3-4 days. PMID:24407355

  2. Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection.

    PubMed

    Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz

    2016-02-24

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.

  3. Multilocus Sequence Typing Analysis of Staphylococcus lugdunensis Implies a Clonal Population Structure

    PubMed Central

    Chassain, Benoît; Lemée, Ludovic; Didi, Jennifer; Thiberge, Jean-Michel; Brisse, Sylvain; Pons, Jean-Louis

    2012-01-01

    Staphylococcus lugdunensis is recognized as one of the major pathogenic species within the genus Staphylococcus, even though it belongs to the coagulase-negative group. A multilocus sequence typing (MLST) scheme was developed to study the genetic relationships and population structure of 87 S. lugdunensis isolates from various clinical and geographic sources by DNA sequence analysis of seven housekeeping genes (aroE, dat, ddl, gmk, ldh, recA, and yqiL). The number of alleles ranged from four (gmk and ldh) to nine (yqiL). Allelic profiles allowed the definition of 20 different sequence types (STs) and five clonal complexes. The 20 STs lacked correlation with geographic source. Isolates recovered from hematogenic infections (blood or osteoarticular isolates) or from skin and soft tissue infections did not cluster in separate lineages. Penicillin-resistant isolates clustered mainly in one clonal complex, unlike glycopeptide-tolerant isolates, which did not constitute a distinct subpopulation within S. lugdunensis. Phylogenies from the sequences of the seven individual housekeeping genes were congruent, indicating a predominantly mutational evolution of these genes. Quantitative analysis of the linkages between alleles from the seven loci revealed a significant linkage disequilibrium, thus confirming a clonal population structure for S. lugdunensis. This first MLST scheme for S. lugdunensis provides a new tool for investigating the macroepidemiology and phylogeny of this unusually virulent coagulase-negative Staphylococcus. PMID:22785196
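
    The step from allelic profiles to sequence types described in this abstract can be sketched as follows. This is a minimal illustration, assuming that an ST is simply a label for a distinct combination of allele numbers at the seven loci; the isolate names, allele numbers, and ST numbering below are hypothetical, and the function assign_sequence_types is not from the paper (in practice ST numbers are assigned by a curated MLST database).

```python
# Sketch: group isolates into sequence types (STs) by allelic profile.
# ASSUMPTION: allele numbers and ST labels are made up for illustration.
LOCI = ("aroE", "dat", "ddl", "gmk", "ldh", "recA", "yqiL")


def assign_sequence_types(isolates):
    """Give each distinct 7-locus allelic profile its own ST number, in
    order of first appearance; identical profiles share an ST."""
    st_of_profile = {}
    result = {}
    for name, alleles in isolates.items():
        profile = tuple(alleles[locus] for locus in LOCI)
        if profile not in st_of_profile:
            st_of_profile[profile] = len(st_of_profile) + 1
        result[name] = st_of_profile[profile]
    return result


isolates = {
    "iso1": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),
    "iso2": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 2))),  # differs only at yqiL
    "iso3": dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1))),  # same profile as iso1
}
sts = assign_sequence_types(isolates)
print(sts)  # {'iso1': 1, 'iso2': 2, 'iso3': 1}
```

    Grouping STs into clonal complexes, as the study does, would need an additional rule about how many loci two profiles must share; that rule is not stated in the abstract and is left out here.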

  4. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    PubMed

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. 
Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.

  5. Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.

    PubMed

    Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G

    2010-06-01

    The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.

  6. Identification and characterization of novel serum microRNA candidates from deep sequencing in cervical cancer patients.

    PubMed

    Juan, Li; Tong, Hong-li; Zhang, Pengjun; Guo, Guanghong; Wang, Zi; Wen, Xinyu; Dong, Zhennan; Tian, Ya-ping

    2014-09-03

    Small non-coding microRNAs (miRNAs) are involved in cancer development and progression, and serum profiles of cervical cancer patients may be useful for identifying novel miRNAs. We performed deep sequencing on serum pools of cervical cancer patients and healthy controls with 3 replicates and constructed a small RNA library. We used MIREAP to predict novel miRNAs and identified 2 putative novel miRNAs between serum pools of cervical cancer patients and healthy controls after filtering out pseudo-pre-miRNAs using Triplet-SVM analysis. The 2 putative novel miRNAs were validated by real time PCR and were significantly decreased in cervical cancer patients compared with healthy controls. One novel miRNA had an area under curve (AUC) of 0.921 (95% CI: 0.883, 0.959) with a sensitivity of 85.7% and a specificity of 88.2% when discriminating between cervical cancer patients and healthy controls. Our results suggest that characterizing serum profiles of cervical cancers by Solexa sequencing may be a good method for identifying novel miRNAs and that the validated novel miRNAs described here may be cervical cancer-associated biomarkers.

  7. RNA sequencing-based longitudinal transcriptomic profiling gives novel insights into the disease mechanism of generalized pustular psoriasis.

    PubMed

    Wang, Lingyan; Yu, Xiaoling; Wu, Chao; Zhu, Teng; Wang, Wenming; Zheng, Xiaofeng; Jin, Hongzhong

    2018-06-05

    Generalized pustular psoriasis (GPP) is a rare, episodic, potentially life-threatening inflammatory disease. However, the pathogenesis of GPP, and universally accepted therapies for treating it, remain undefined. To better understand the disease mechanism of GPP, we performed a transcriptome analysis to profile the gene expression of peripheral blood mononuclear cells (PBMCs) from patients enrolled at the time of diagnosis and receiving follow-up treatment for up to 6 months. RNA sequencing data revealed that gene expression in five GPP patients' PBMCs was profoundly altered following acitretin treatment. Differentially expressed gene (DEG) analysis suggested that genes related to psoriatic inflammation, including CXCL1, CXCL8 (IL-8), S100A8, S100A9, S100A12 and LCN2, were significantly downregulated in patients in remission from GPP. Functional enrichment and annotation analysis unveiled a cluster of DEGs significantly associated with the function of leukocytes, particularly neutrophils. Pathway analysis suggested that a variety of pro-inflammatory pathways were inhibited in patients in remission. This analysis not only reaffirmed known signaling pathways in GPP pathogenesis, but also implicated novel factors and pathways, such as cell cycle regulation pathways. Furthermore, regulator network analysis provided bioinformatics-based support for upstream molecules as potential therapeutic targets such as oncostatin M. This longitudinal analysis of blood transcriptomes provides the first evidence that dysregulated gene expression in peripheral blood may significantly contribute to psoriatic inflammation in GPP patients. Novel canonical pathways and biomarkers identified in the current research may provide insights to help understand GPP pathobiology and advance novel therapeutics.

  8. Transcript Profiling of Common Bean (Phaseolus vulgaris L.) Using the GeneChip(R) Soybean Genome Array: Optimizing Analysis by Masking Biased Probes

    USDA-ARS's Scientific Manuscript database

    Common bean (Phaseolus vulgaris) and soybean (Glycine max) both belong to the Phaseoleae tribe and share significant coding sequence homology. This suggests that the GeneChip(R) Soybean Genome Array (soybean GeneChip) may be used for gene expression studies using common bean. To evaluate the utility...

  9. Identification and expression analysis of duck interleukin-17D in Riemerella anatipestifer infection

    USDA-ARS's Scientific Manuscript database

    Interleukin (IL)-17D is a proinflammatory cytokine with limited information on its biological functions. Here we provide the description of the sequence, bioactivity, and mRNA expression profile of duck IL-17D homologue. A full-length duck IL-17D (duIL-17D) cDNA with a 624-bp coding region was ident...

  10. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.)

    USDA-ARS's Scientific Manuscript database

    This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. Roche FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494,353 short transcript reads (STRs). Cluster analysi...

  11. Benthic Bacterial Diversity in Submerged Sinkhole Ecosystems

    PubMed Central

    Nold, Stephen C.; Pangborn, Joseph B.; Zajack, Heidi A.; Kendall, Scott T.; Rediske, Richard R.; Biddanda, Bopaiah A.

    2010-01-01

    Physicochemical characterization, automated ribosomal intergenic spacer analysis (ARISA) community profiling, and 16S rRNA gene sequencing approaches were used to study bacterial communities inhabiting submerged Lake Huron sinkholes inundated with hypoxic, sulfate-rich groundwater. Photosynthetic cyanobacterial mats on the sediment surface were dominated by Phormidium autumnale, while deeper, organically rich sediments contained diverse and active bacterial communities. PMID:19880643

  12. Iridium profile for 10 million years across the Cretaceous-Tertiary boundary at Gubbio (Italy)

    NASA Technical Reports Server (NTRS)

    Alvarez, Walter; Asaro, Frank; Montanari, Alessandro

    1990-01-01

    The iridium anomaly at the Cretaceous-Tertiary (KT) boundary was discovered in the pelagic limestone sequence at Gubbio on the basis of 12 samples analyzed by neutron activation analysis (NAA) and was interpreted as indicating impact of a large extraterrestrial object at exactly the time of the KT mass extinction. Continuing controversy over the shape of the Ir profile at the Gubbio KT boundary and its interpretation called for a more detailed follow-up study. Analysis of a 57-meter-thick, 10-million-year-old part of the Gubbio sequence using improved NAA techniques revealed that there is only one Ir anomaly at the KT boundary, but this anomaly shows an intricate fine structure, the origin of which cannot yet be entirely explained. The KT Ir anomaly peaks in a 1-centimeter-thick clay layer, where the average Ir concentration is 3000 parts per trillion (ppt); this peak is flanked by tails with Ir concentrations of 20 to 80 ppt that rise above a background of 12 to 13 ppt. The fine structure of the tails is probably due in part to lateral reworking, diffusion, burrowing, and perhaps Milankovitch cyclicity.

  13. The study of transcriptome profiles in Holstein cows with miscarriage during peri-implantation.

    PubMed

    Zhao, Guoli; Li, Yanyan; Kang, Xiaolong; Huang, Liang; Li, Peng; Zhou, Jinghang; Shi, Yuangang

    2018-05-31

    In this study, the transcriptome profile of cows who experienced miscarriage during peri-implantation was investigated. The transcriptome was examined by RNA sequencing and then analyzed by bioinformatics methods. The results suggested that serum progesterone levels were significantly decreased in the cows who miscarried compared with the pregnant cows at 18 d, 21 d, 33 d, 39 d and 51 d after artificial insemination. The RNA sequencing results suggested that 32, 176, 5, 10 and 2 differentially expressed genes (DEGs) were identified between the pregnant cows and the cows who miscarried at 18, 21, 33, 39 and 51 d after artificial insemination, respectively. Furthermore, the DEGs were analysed with hierarchical clustering and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and 15, 101, 1, 2 and 2 DEGs were upregulated, and 17, 74, 4, 8 and 0 DEGs were downregulated in the cows in the pregnant and miscarriage groups, respectively, at 18, 21, 33, 39 and 51 d after artificial insemination. These DEGs were distributed to 13, 20, 3, 6 and 20 pathways. This analysis has identified genes and pathways crucial for pregnancy and miscarriage in cows.

  14. On the potential of using peculiarities of the protein intrinsic disorder distribution in mitochondrial cytochrome b to identify the source of animal meats

    PubMed Central

    Yacoub, Haitham A.; Sadek, Mahmoud A.; Uversky, Vladimir N.

    2017-01-01

    This study was conducted to identify the source of animal meat based on the peculiarities of protein intrinsic disorder distribution in mitochondrial cytochrome b (mtCyt-b). The analysis revealed that animal and avian species can be discriminated based on the proportions of the two groups of residues, Leu+Ile, and Ser+Pro+Ala, in the amino acid sequences of their mtCyt-b. Although levels of the overall intrinsic disorder in mtCyt-b are not very high, the peculiarities of disorder distribution within the sequences of mtCyt-b from different species vary in a rather specific way. In fact, positions and intensities of disorder/flexibility “signals” in the corresponding disorder profiles are relatively unique for avian and animal species. Therefore, it is possible to devise a set of simple rules based on the peculiarities of disorder profiles of their mtCyt-b proteins to discriminate among species. This intrinsic disorder-based analysis represents a new technique that could be used to provide a promising solution for identification of the source of meats. PMID:28331777
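
    The residue-group proportions mentioned in this abstract (Leu+Ile and Ser+Pro+Ala) are straightforward to compute from a one-letter amino acid sequence. The sketch below only shows that arithmetic; the sequence is a made-up ten-residue string, not a real mtCyt-b sequence, the function name group_fraction is hypothetical, and no discrimination thresholds are given because the abstract does not state any.

```python
# Sketch: fraction of a protein sequence made up of a given residue group.
# ASSUMPTION: `toy` is an invented peptide, not real cytochrome b data.
def group_fraction(sequence, residues):
    """Return the fraction of positions in `sequence` (one-letter amino
    acid codes) occupied by any residue listed in `residues`."""
    seq = sequence.upper()
    return sum(seq.count(r) for r in residues) / len(seq)


toy = "MLLIISPAAK"
leu_ile = group_fraction(toy, "LI")       # Leu + Ile
ser_pro_ala = group_fraction(toy, "SPA")  # Ser + Pro + Ala
print(leu_ile, ser_pro_ala)  # 0.4 0.4
```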

  15. High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing and Ribosome Profiling

    PubMed Central

    Jones, Joshua D.; Chung, Betty Y.-W.; Siddell, Stuart G.; Brierley, Ian

    2016-01-01

    Members of the family Coronaviridae have the largest genomes of all RNA viruses, typically in the region of 30 kilobases. Several coronaviruses, such as Severe acute respiratory syndrome-related coronavirus (SARS-CoV) and Middle East respiratory syndrome-related coronavirus (MERS-CoV), are of medical importance, with high mortality rates and, in the case of SARS-CoV, significant pandemic potential. Other coronaviruses, such as Porcine epidemic diarrhea virus and Avian coronavirus, are important livestock pathogens. Ribosome profiling is a technique which exploits the capacity of the translating ribosome to protect around 30 nucleotides of mRNA from ribonuclease digestion. Ribosome-protected mRNA fragments are purified, subjected to deep sequencing and mapped back to the transcriptome to give a global “snap-shot” of translation. Parallel RNA sequencing allows normalization by transcript abundance. Here we apply ribosome profiling to cells infected with Murine coronavirus, mouse hepatitis virus, strain A59 (MHV-A59), a model coronavirus in the same genus as SARS-CoV and MERS-CoV. The data obtained allowed us to study the kinetics of virus transcription and translation with exquisite precision. We studied the timecourse of positive and negative-sense genomic and subgenomic viral RNA production and the relative translation efficiencies of the different virus ORFs. Virus mRNAs were not found to be translated more efficiently than host mRNAs; rather, virus translation dominates host translation at later time points due to high levels of virus transcripts. Triplet phasing of the profiling data allowed precise determination of translated reading frames and revealed several translated short open reading frames upstream of, or embedded within, known virus protein-coding regions. Ribosome pause sites were identified in the virus replicase polyprotein pp1a ORF and investigated experimentally. 
Contrary to expectations, ribosomes were not found to pause at the ribosomal frameshift site. To our knowledge this is the first application of ribosome profiling to an RNA virus. PMID:26919232
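
    The triplet phasing used in this abstract to determine translated reading frames can be illustrated with a short sketch: each ribosome-protected fragment is assigned to frame 0, 1, or 2 relative to the start of an ORF, and a strong bias toward one frame indicates translation in that frame. The read positions below are invented, the function frame_fractions and its offset parameter are hypothetical, and real analyses typically calibrate an offset from the read 5' end to the ribosomal P-site, which is left at 0 here for simplicity.

```python
# Sketch: triplet phasing of ribosome-profiling reads against one ORF.
# ASSUMPTION: read positions are made up; `offset` stands in for a
# calibrated 5'-end-to-P-site shift and is left at 0 for simplicity.
from collections import Counter


def frame_fractions(read_starts, orf_start, offset=0):
    """Return the fraction of reads falling in frames 0, 1 and 2
    relative to `orf_start` (all coordinates on the same scale)."""
    counts = Counter((pos + offset - orf_start) % 3 for pos in read_starts)
    total = sum(counts.values())
    return [counts.get(frame, 0) / total for frame in range(3)]


starts = [100, 103, 106, 109, 101, 112, 115, 104, 118, 121]
fractions = frame_fractions(starts, orf_start=100)
print(fractions)  # [0.8, 0.2, 0.0]
```

    In this toy data set 80% of reads fall in frame 0, the kind of periodic signal that would support translation of an ORF starting at position 100.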

  16. Massively parallel sequencing of forensic STRs and SNPs using the Illumina® ForenSeq™ DNA Signature Prep Kit on the MiSeq FGx™ Forensic Genomics System.

    PubMed

    Guo, Fei; Yu, Jiao; Zhang, Lu; Li, Jun

    2017-11-01

    The ForenSeq™ DNA Signature Prep Kit (ForenSeq Kit) is designed to detect more than 200 forensically relevant markers in a single reaction on the MiSeq FGx™ Forensic Genomics System (MiSeq FGx System), including Amelogenin, 27 autosomal short tandem repeats (A-STRs), 7 X chromosomal STRs (X-STRs), 24 Y chromosomal STRs (Y-STRs) and 94 identity-informative single nucleotide polymorphisms (iSNPs) with the option to contain 22 phenotypic-informative SNPs (pSNPs) and 56 ancestry-informative SNPs (aSNPs). In this study, we evaluated the MiSeq FGx System on three major parts: methodological optimization (DNA extraction, sample quantification, library normalization, diluted libraries concentration, and sample-to-cell arrangement), massively parallel sequencing (MPS) performance (depth of coverage, sequence coverage ratio, and allele coverage ratio), and ForenSeq Kit characteristics (repeatability and concordance, sensitivity, mixture, stability and case-type samples). Results showed that quantitative polymerase chain reaction (qPCR)-based sample quantification and library normalization and the appropriate number of pooled libraries and concentration of diluted libraries provided a greater level of MPS performance and repeatability. Repeatable and concordant genotypes were obtained by the ForenSeq Kit. Full profiles were obtained from ≥100pg input DNA for STRs and ≥200pg for SNPs. A sample with ≥5% minor contributors was considered as a mixture by imbalanced allele coverage ratio distribution, and full profiles from minor contributors were easily detected between 9:1 and 1:9 mixtures with known reference profiles. The ForenSeq Kit tolerated considerable concentrations of inhibitors like ≤200μM hematin and ≤50μg/ml humic acid, and >56% STR profiles and >88% SNP profiles were obtained from ≥200-bp degraded samples. Also, it was adapted to case-type samples. 
As a whole, the ForenSeq Kit is a well-performing, robust, reliable, reproducible and highly informative assay, and it can fully meet requirements for human identification. Further, the sensitive QC indicator and automated sample comparison function in the ForenSeq™ Universal Analysis Software are quite helpful, so that we can concentrate on questionable genotypes and avoid tedious and time-consuming labor to maximize the time spent in data analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells.

    PubMed

    Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M; Michor, Franziska; Fan, Rong; Pan, Xinghua

    2017-06-02

    Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878, and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation difference between cells is detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Global analysis of gene expression profiles in developing physic nut (Jatropha curcas L.) seeds.

    PubMed

    Jiang, Huawu; Wu, Pingzhi; Zhang, Sheng; Song, Chi; Chen, Yaping; Li, Meiru; Jia, Yongxia; Fang, Xiaohua; Chen, Fan; Wu, Guojiang

    2012-01-01

    Physic nut (Jatropha curcas L.) is an oilseed plant species with high potential utility as a biofuel. Furthermore, following recent sequencing of its genome and the availability of expressed sequence tag (EST) libraries, it is a valuable model plant for studying carbon assimilation in endosperms of oilseed plants. There have been several transcriptomic analyses of developing physic nut seeds using ESTs, but they have provided limited information on the accumulation of stored resources in the seeds. We applied next-generation Illumina sequencing technology to analyze global gene expression profiles of developing physic nut seeds 14, 19, 25, 29, 35, 41, and 45 days after pollination (DAP). The acquired profiles reveal the key genes, and their expression timeframes, involved in major metabolic processes including: carbon flow, starch metabolism, and synthesis of storage lipids and proteins in the developing seeds. The main period of storage reserves synthesis in the seeds appears to be 29-41 DAP, and the fatty acid composition of the developing seeds is consistent with relative expression levels of different isoforms of acyl-ACP thioesterase and fatty acid desaturase genes. Several transcription factor genes whose expression coincides with storage reserve deposition correspond to those known to regulate the process in Arabidopsis. The results will facilitate searches for genes that influence de novo lipid synthesis, accumulation and their regulatory networks in developing physic nut seeds, and other oil seeds. Thus, they will be helpful in attempts to modify these plants for efficient biofuel production.

  19. RNA-Sequencing of Primary Retinoblastoma Tumors Provides New Insights and Challenges Into Tumor Development.

    PubMed

    Elchuri, Sailaja V; Rajasekaran, Swetha; Miles, Wayne O

    2018-01-01

    Retinoblastoma is a rare tumor of the retina caused by the homozygous loss of the Retinoblastoma 1 tumor suppressor gene (RB1). Loss of the RB1 protein, pRB, results in de-regulated activity of the E2F transcription factors, chromatin changes and developmental defects leading to tumor development. Extensive microarray profiles of these tumors have enabled the identification of genes sensitive to pRB disruption; however, this technology has a number of limitations in the RNA profiles that it generates. The advent of RNA-sequencing has enabled the global profiling of all of the RNA within the cell including both coding and non-coding features and the detection of aberrant RNA processing events. In this perspective, we focus on discussing how RNA-sequencing of rare Retinoblastoma tumors will build on existing data and open up new areas to improve our understanding of the biology of these tumors. In particular, we discuss how the RB-research field may be able to use these data to determine how RB1 loss results in the expression of non-coding RNAs and causes aberrant RNA processing events, and how a deeper analysis of metabolic RNA changes can be utilized to model tumor-specific shifts in metabolism. Each section discusses new opportunities and challenges associated with these types of analyses and aims to provide an honest assessment of how understanding these different processes may contribute to the treatment of Retinoblastoma.

  20. ReprDB and panDB: minimalist databases with maximal microbial representation.

    PubMed

    Zhou, Wei; Gay, Nicole; Oh, Julia

    2018-01-18

    Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible with various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.

  1. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.

    PubMed

    Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C

    2009-11-24

    Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often have similar gene context and relies on the identification of such events across a phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.

  2. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

    PubMed

    Singh, Vinod Kumar; Krishnamachari, Annangarachari

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that the autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted as ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within their nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS, which are found far away from the adjacent gene start position compared to nrACS, thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  3. DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

    PubMed

    Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano

    2018-01-01

    Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .

  4. Mining genes involved in insecticide resistance of Liposcelis bostrychophila Badonnel by transcriptome and expression profile analysis.

    PubMed

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocids species, Liposcelis bostrychophila Badonnel has gained recognition in importance because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila is largely limited to genes identified through homology. Also, no transcriptome data relevant to psocids infection is available. In this study, we generated de novo assembly of L. bostrychophila transcriptome performed through the short read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila including egg, nymph and adult were annotated with non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. Besides, 16 transcripts were identified to contain target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. The L. bostrychophila transcriptome and DGE data provide gene expression data that would further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and designing of new compounds for control of psocids.

  5. Mining Genes Involved in Insecticide Resistance of Liposcelis bostrychophila Badonnel by Transcriptome and Expression Profile Analysis

    PubMed Central

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Background Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocids species, Liposcelis bostrychophila Badonnel has gained recognition in importance because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila is largely limited to genes identified through homology. Also, no transcriptome data relevant to psocids infection is available. Methodology and Principal Findings In this study, we generated de novo assembly of L. bostrychophila transcriptome performed through the short read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila including egg, nymph and adult were annotated with non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. Besides, 16 transcripts were identified to contain target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. Conclusion The L. bostrychophila transcriptome and DGE data provide gene expression data that would further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and designing of new compounds for control of psocids. PMID:24278202

  6. Molecular cloning, sequence identification and tissue expression profile of three novel sheep (Ovis aries) genes - BCKDHA, NAGA and HEXA.

    PubMed

    Liu, G Y; Gao, S Z

    2009-01-01

    The complete coding sequences of three sheep genes (BCKDHA, NAGA and HEXA) were amplified using the reverse transcriptase polymerase chain reaction (RT-PCR), based on the conserved sequence information of the mouse or other mammals. The nucleotide sequences of these three genes revealed that the sheep BCKDHA gene encodes a protein of 313 amino acids which has high homology with the BCKDHA gene that encodes a protein of 447 amino acids that has high homology with the Branched chain keto acid dehydrogenase E1, alpha polypeptide (BCKDHA) of five species: chimpanzee (93%), human (96%), crab-eating macaque (93%), bovine (98%) and mouse (91%). The sheep NAGA gene encodes a protein of 411 amino acids that has high homology with the alpha-N-acetylgalactosaminidase (NAGA) of five species: human (85%), bovine (94%), mouse (91%), rat (83%) and chicken (74%). The sheep HEXA gene encodes a protein of 529 amino acids that has high homology with the hexosaminidase A (HEXA) of five species: bovine (98%), human (84%), Bornean orangutan (84%), rat (80%) and mouse (81%). Finally these three novel sheep genes were assigned to GeneIDs: 100145857, 100145858 and 100145856. The phylogenetic tree analysis revealed that the sheep BCKDHA, NAGA, and HEXA all have closer genetic relationships to the BCKDHA, NAGA, and HEXA of bovine. Tissue expression profile analysis was also carried out and results revealed that sheep BCKDHA, NAGA and HEXA genes were differentially expressed in tissues including muscle, heart, liver, fat, kidney, lung, small and large intestine. Our experiment is the first to establish the primary foundation for further research on these three sheep genes.

  7. Droplet barcoding for single cell transcriptomics applied to embryonic stem cells

    PubMed Central

    Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W

    2015-01-01

    Summary It has long been the dream of biologists to map gene expression at the single cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after LIF withdrawal. The reproducibility of these high-throughput single cell data allowed us to deconstruct cell populations and infer gene expression relationships. PMID:26000487

  8. Salivary bacterial fingerprints of established oral disease revealed by the Human Oral Microbe Identification using Next Generation Sequencing (HOMINGS) technique

    PubMed Central

    Belstrøm, Daniel; Paster, Bruce J.; Fiehn, Nils-Erik; Bardow, Allan; Holmstrup, Palle

    2016-01-01

    Background and objective The composition of the salivary microbiota, as determined using various molecular methods, has been reported to differentiate oral health from diseases. Thus, the purpose of this study was to utilize the newly developed molecular technique HOMINGS (Human Oral Microbe Identification using Next Generation Sequencing) for comparison of the salivary microbiota in patients with periodontitis, patients with dental caries, and orally healthy individuals. The hypothesis was that this method could add on to the existing knowledge on salivary bacterial profiles in oral health and disease. Design Stimulated saliva samples (n=30) were collected from 10 patients with untreated periodontitis, 10 patients with untreated dental caries, and 10 orally healthy individuals. Salivary microbiota was analyzed using HOMINGS and statistical analysis was performed using Kruskal–Wallis test with Benjamini–Hochberg's correction. Results From a total of 30 saliva samples, a mean number of probe targets of 205 (range 120–353) were identified, and a statistically significant higher mean number of targets was registered in samples from patients with periodontitis (mean 220, range 143–306) and dental caries (mean 221, range 165–353) as compared to orally healthy individuals (mean 174, range 120–260) (p=0.04 and p=0.04). Nine probe targets were identified with a different relative abundance between groups (p<0.05). Conclusions Cross-sectional comparison of salivary bacterial profiles by means of HOMINGS analysis showed that different salivary bacterial profiles were associated with oral health and disease. Future large-scale prospective studies are needed to evaluate if saliva-based screening for disease-associated oral bacterial profiles may be used for identification of patients at risk of acquiring periodontitis and dental caries. PMID:26782357
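
    The Benjamini-Hochberg correction named in the record above can be illustrated with a short sketch. This is a minimal, generic implementation of the standard step-up adjustment, not the authors' analysis code; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four independent group comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

    Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, which is why two neighbouring tests can end up with the same adjusted value.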

  9. Time-Resolved Transposon Insertion Sequencing Reveals Genome-Wide Fitness Dynamics during Infection.

    PubMed

    Yang, Guanhua; Billings, Gabriel; Hubbard, Troy P; Park, Joseph S; Yin Leung, Ka; Liu, Qin; Davis, Brigid M; Zhang, Yuanxing; Wang, Qiyao; Waldor, Matthew K

    2017-10-03

    Transposon insertion sequencing (TIS) is a powerful high-throughput genetic technique that is transforming functional genomics in prokaryotes, because it enables genome-wide mapping of the determinants of fitness. However, current approaches for analyzing TIS data assume that selective pressures are constant over time and thus do not yield information regarding changes in the genetic requirements for growth in dynamic environments (e.g., during infection). Here, we describe structured analysis of TIS data collected as a time series, termed pattern analysis of conditional essentiality (PACE). From a temporal series of TIS data, PACE derives a quantitative assessment of each mutant's fitness over the course of an experiment and identifies mutants with related fitness profiles. In so doing, PACE circumvents major limitations of existing methodologies, specifically the need for artificial effect size thresholds and enumeration of bacterial population expansion. We used PACE to analyze TIS samples of Edwardsiella piscicida (a fish pathogen) collected over a 2-week infection period from a natural host (the flatfish turbot). PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a cutoff at a terminal sampling point, and it identified subpopulations of mutants with distinct fitness profiles, one of which informed the design of new live vaccine candidates. Overall, PACE enables efficient mining of time series TIS data and enhances the power and sensitivity of TIS-based analyses. IMPORTANCE Transposon insertion sequencing (TIS) enables genome-wide mapping of the genetic determinants of fitness, typically based on observations at a single sampling point. Here, we move beyond analysis of endpoint TIS data to create a framework for analysis of time series TIS data, termed pattern analysis of conditional essentiality (PACE). We applied PACE to identify genes that contribute to colonization of a natural host by the fish pathogen Edwardsiella piscicida. PACE uncovered more genes that affect E. piscicida's fitness in vivo than were detected using a terminal sampling point, and its clustering of mutants with related fitness profiles informed design of new live vaccine candidates. PACE yields insights into patterns of fitness dynamics and circumvents major limitations of existing methodologies. Finally, the PACE method should be applicable to additional "omic" time series data, including screens based on clustered regularly interspaced short palindromic repeats with Cas9 (CRISPR/Cas9). Copyright © 2017 Yang et al.

  10. Integration of Bioinformatics and Synthetic Promoters Leads to the Discovery of Novel Elicitor-Responsive cis-Regulatory Sequences in Arabidopsis

    PubMed Central

    Koschmann, Jeannette; Machens, Fabian; Becker, Marlies; Niemeyer, Julia; Schulze, Jutta; Bülow, Lorenz; Stahl, Dietmar J.; Hehl, Reinhard

    2012-01-01

    A combination of bioinformatic tools, high-throughput gene expression profiles, and the use of synthetic promoters is a powerful approach to discover and evaluate novel cis-sequences in response to specific stimuli. With Arabidopsis (Arabidopsis thaliana) microarray data annotated to the PathoPlant database, 732 different queries with a focus on fungal and oomycete pathogens were performed, leading to 510 up-regulated gene groups. Using the binding site estimation suite of tools, BEST, 407 conserved sequence motifs were identified in promoter regions of these coregulated gene sets. Motif similarities were determined with STAMP, classifying the 407 sequence motifs into 37 families. A comparative analysis of these 37 families with the AthaMap, PLACE, and AGRIS databases revealed similarities to known cis-elements but also led to the discovery of cis-sequences not yet implicated in pathogen response. Using a parsley (Petroselinum crispum) protoplast system and a modified reporter gene vector with an internal transformation control, 25 elicitor-responsive cis-sequences from 10 different motif families were identified. Many of the elicitor-responsive cis-sequences also drive reporter gene expression in an Agrobacterium tumefaciens infection assay in Nicotiana benthamiana. This work significantly increases the number of known elicitor-responsive cis-sequences and demonstrates the successful integration of a diverse set of bioinformatic resources combined with synthetic promoter analysis for data mining and functional screening in plant-pathogen interaction. PMID:22744985

  11. m6aViewer: software for the detection, analysis, and visualization of N6-methyladenosine peaks from m6A-seq/ME-RIP sequencing data.

    PubMed

    Antanaviciute, Agne; Baquero-Perez, Belinda; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Whitehouse, Adrian; Carr, Ian M

    2017-10-01

    Recent methods for transcriptome-wide N6-methyladenosine (m6A) profiling have facilitated investigations into the RNA methylome and established m6A as a dynamic modification that has critical regulatory roles in gene expression and may play a role in human disease. However, bioinformatics resources available for the analysis of m6A sequencing data are still limited. Here, we describe m6aViewer, a cross-platform application for analysis and visualization of m6A peaks from sequencing data. m6aViewer implements a novel m6A peak-calling algorithm that identifies high-confidence methylated residues with more precision than previously described approaches. The application enables data analysis through a graphical user interface, and thus, in contrast to other currently available tools, does not require the user to be skilled in computer programming. m6aViewer and test data can be downloaded here: http://dna2.leeds.ac.uk/m6a. © 2017 Antanaviciute et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  12. The Essential Genome of Escherichia coli K-12

    PubMed Central

    2018-01-01

    ABSTRACT Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. PMID:29463657

  13. Epigenetics of prostate cancer.

    PubMed

    McKee, Tawnya C; Tricoli, James V

    2015-01-01

    The introduction of novel technologies that can be applied to the investigation of the molecular underpinnings of human cancer has allowed for new insights into the mechanisms associated with tumor development and progression. They have also advanced the diagnosis, prognosis and treatment of cancer. These technologies include microarray and other analysis methods for the generation of large-scale gene expression data on both mRNA and miRNA, next-generation DNA sequencing technologies utilizing a number of platforms to perform whole genome, whole exome, or targeted DNA sequencing to determine somatic mutational differences and gene rearrangements, and a variety of proteomic analysis platforms including liquid chromatography/mass spectrometry (LC/MS) analysis to survey alterations in protein profiles in tumors. One other important advancement has been our current ability to survey the methylome of human tumors in a comprehensive fashion through the use of sequence-based and array-based methylation analysis (Bock et al., Nat Biotechnol 28:1106-1114, 2010; Harris et al., Nat Biotechnol 28:1097-1105, 2010). The focus of this chapter is to present and discuss the evidence for key genes involved in prostate tumor development, progression, or resistance to therapy that are regulated by methylation-induced silencing.

  14. Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis.

    PubMed

    Ai, Dongmei; Huang, Ruocheng; Wen, Jin; Li, Chao; Zhu, Jiangping; Xia, Li Charlie

    2017-01-25

    Periodontitis is an inflammatory disease affecting the tissues supporting teeth (periodontium). Integrative analysis of metagenomic samples from multiple periodontitis studies is a powerful way to examine microbiota diversity and interactions within host oral cavity. A total of 43 subjects were recruited to participate in two previous studies profiling the microbial community of human subgingival plaque samples using shotgun metagenomic sequencing. We integrated metagenomic sequence data from those two studies, including six healthy controls, 14 sites representative of stable periodontitis, 16 sites representative of progressing periodontitis, and seven periodontal sites of unknown status. We applied phylogenetic diversity, differential abundance, and network analyses, as well as clustering, to the integrated dataset to compare microbiological community profiles among the different disease states. We found alpha-diversity, i.e., mean species diversity in sites or habitats at a local scale, to be the single strongest predictor of subjects' periodontitis status (P < 0.011). More specifically, healthy subjects had the highest alpha-diversity, while subjects with stable sites had the lowest alpha-diversity. From these results, we developed an alpha-diversity logistic model-based naive classifier able to perfectly predict the disease status of the seven subjects with unknown periodontal status (not used in training). Phylogenetic profiling resulted in the discovery of nine marker microbes, and these species are able to differentiate between stable and progressing periodontitis, achieving an accuracy of 94.4%. Finally, we found that the reduction of negatively correlated species is a notable signature of disease progression. 
    Our results consistently show a strong association between the loss of oral microbiota diversity and the progression of periodontitis, suggesting that metagenomics sequencing and phylogenetic profiling are predictive of early periodontitis, leading to potential therapeutic intervention. Our results also support a keystone pathogen-mediated polymicrobial synergy and dysbiosis (PSD) model to explain the etiology of periodontitis. Apart from P. gingivalis, we identified three additional keystone species potentially mediating the progression of periodontitis based on pathogenic characteristics similar to those of known keystone pathogens.
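
    The alpha-diversity measure central to the record above can be sketched in a few lines. The abstract does not say which index was used, so the Shannon index here is an assumption chosen for illustration, and the function name and abundance counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical per-taxon read counts for two samples (not study data).
even_sample = [25, 25, 25, 25]   # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]    # one taxon dominates

print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

    An evenly distributed community scores higher than one dominated by a single taxon, which is the kind of per-sample summary that could then be fed into a classifier such as the logistic model the abstract mentions.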

  15. Genome-wide assessment of differential translations with ribosome profiling data.

    PubMed

    Xiao, Zhengtao; Zou, Qin; Liu, Yu; Yang, Xuerui

    2016-04-04

    The closely regulated process of mRNA translation is crucial for precise control of protein abundance and quality. Ribosome profiling, a combination of ribosome foot-printing and RNA deep sequencing, has been used in a large variety of studies to quantify genome-wide mRNA translation. Here, we developed Xtail, an analysis pipeline tailored for ribosome profiling data that comprehensively and accurately identifies differentially translated genes in pairwise comparisons. Applied on simulated and real datasets, Xtail exhibits high sensitivity with minimal false-positive rates, outperforming existing methods in the accuracy of quantifying differential translations. With published ribosome profiling datasets, Xtail does not only reveal differentially translated genes that make biological sense, but also uncovers new events of differential translation in human cancer cells on mTOR signalling perturbation and in human primary macrophages on interferon gamma (IFN-γ) treatment. This demonstrates the value of Xtail in providing novel insights into the molecular mechanisms that involve translational dysregulations.
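
    As background to the record above, differential translation is commonly framed as a change in translational efficiency, the ratio of ribosome footprint abundance to mRNA abundance. The sketch below shows only that basic ratio and its log2 fold change; it is not Xtail's statistical model, and the function name, the pseudocount and the counts are hypothetical.

```python
import math


def log2_te_change(rpf_ctrl, mrna_ctrl, rpf_treat, mrna_treat, pseudocount=1.0):
    """Log2 fold change in translational efficiency between two conditions.

    Translational efficiency (TE) is taken as ribosome-protected fragment
    (RPF) abundance divided by mRNA abundance. The pseudocount is an
    illustrative guard against division by zero, not a recommended value.
    """
    te_ctrl = (rpf_ctrl + pseudocount) / (mrna_ctrl + pseudocount)
    te_treat = (rpf_treat + pseudocount) / (mrna_treat + pseudocount)
    return math.log2(te_treat / te_ctrl)


# Hypothetical normalized counts for one gene.
# Footprints rise while mRNA stays flat: translation itself has changed.
translational = log2_te_change(99, 199, 399, 199)
# Footprints and mRNA rise together: the change is transcriptional only.
transcriptional = log2_te_change(99, 199, 199, 399)
print(translational, transcriptional)
```

    Separating these two cases is the reason footprint counts are compared with matched mRNA counts and not interpreted on their own.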

  16. Personalized comprehensive molecular profiling of high risk osteosarcoma: Implications and limitations for precision medicine.

    PubMed

    Subbiah, Vivek; Wagner, Michael J; McGuire, Mary F; Sarwari, Nawid M; Devarajan, Eswaran; Lewis, Valerae O; Westin, Shanon; Kato, Shumei; Brown, Robert E; Anderson, Pete

    2015-12-01

    Despite advances in molecular medicine over recent decades, there has been little advancement in the treatment of osteosarcoma. We performed comprehensive molecular profiling in two cases of metastatic and chemotherapy-refractory osteosarcoma to guide molecularly targeted therapy. Hybridization capture of >300 cancer-related genes plus introns from 28 genes often rearranged or altered in cancer was applied to >50 ng of DNA extracted from tumor samples from two patients with recurrent, metastatic osteosarcoma. The DNA from each sample was sequenced to high, uniform coverage. Immunohistochemical probes and morphoproteomics analysis were performed, in addition to fluorescence in situ hybridization. All analyses were performed in CLIA-certified laboratories. Molecularly targeted therapy based on the resulting profiles was offered to the patients. Biomedical analytics were performed using QIAGEN's Ingenuity® Pathway Analysis. In Patient #1, comprehensive next-generation exome sequencing showed MET amplification, PIK3CA mutation, CCNE1 amplification, and PTPRD mutation. Immunohistochemistry-based morphoproteomic analysis revealed c-Met expression [(p)-c-Met (Tyr1234/1235)] and activation of mTOR/AKT pathway [IGF-1R (Tyr1165/1166), p-mTOR [Ser2448], p-Akt (Ser473)] and expression of SPARC and COX2. Targeted therapy was administered to match the PIK3CA, c-MET, and SPARC and COX2 aberrations with sirolimus + crizotinib and abraxane + celecoxib. In Patient #2, aberrations included NF2 loss in exons 2-16, PDGFRα amplification, and TP53 mutation. This patient was enrolled on a clinical trial combining targeted agents temsirolimus, sorafenib and bevacizumab, to match NF2, PDGFRα and TP53 aberrations. Neither patient benefited from matched therapy. Relapsed osteosarcoma is characterized by complex signaling and drug resistance pathways. Comprehensive molecular profiling holds great promise for tailoring personalized therapies for cancer. Methods for such profiling are evolving and need to be refined to better assist clinicians in making treatment decisions based on the large amount of data that results from this type of testing. Further research in this area is warranted.

  17. Honey bee (Apis mellifera) transferrin-gene structure and the role of ecdysteroids in the developmental regulation of its expression.

    PubMed

    do Nascimento, Adriana Mendes; Cuvillier-Hot, Virginie; Barchuk, Angel Roberto; Simões, Zilá Luz Paulino; Hartfelder, Klaus

    2004-05-01

    Social life is prone to invasion by microorganisms, and binding of ferric ions by transferrin is an efficient strategy to restrict their access to iron. In this study, we isolated cDNA and genomic clones encoding an Apis mellifera transferrin (AmTRF) gene. It has an open reading frame (ORF) of 2136 bp spread over nine exons. The deduced protein sequence comprises 686 amino acid residues plus a 26-residue signal sequence, giving a predicted molecular mass of 76 kDa. Comparison of the deduced AmTRF amino acid sequence with known insect transferrins revealed significant similarity extending over the entire sequence. It clusters with monoferric transferrins, with which it shares putative iron-binding residues in the N-terminal lobe. In a functional analysis of AmTRF expression in honey bee development, we monitored its expression profile in the larval and pupal stages. The negative regulation of AmTRF by ecdysteroids deduced from the developmental expression profile was confirmed by experimental treatment of spinning-stage honey bee larvae with 20-hydroxyecdysone, and of fourth-instar larvae with juvenile hormone. A juvenile hormone application to spinning-stage larvae, in contrast, had only a minor effect on AmTRF transcript levels. This is the first study implicating ecdysteroids in the developmental regulation of transferrin expression in an insect species.
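
    The sequence figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the value of roughly 110 Da per residue is a common rule-of-thumb average, not a figure from the record, and whether the 2136 bp count includes the stop codon is not stated in the abstract.

```python
# Figures taken from the abstract.
ORF_BP = 2136            # open reading frame length in base pairs
MATURE_RESIDUES = 686    # deduced mature protein
SIGNAL_RESIDUES = 26     # signal sequence

# Assumption (not from the abstract): ~110 Da average mass per residue.
AVG_RESIDUE_DA = 110

codons = ORF_BP // 3
total_residues = MATURE_RESIDUES + SIGNAL_RESIDUES
approx_mass_kda = MATURE_RESIDUES * AVG_RESIDUE_DA / 1000

print(f"ORF is a whole number of codons: {ORF_BP % 3 == 0}")
print(f"codons in ORF: {codons}, residues (mature + signal): {total_residues}")
print(f"rough mass of mature protein: {approx_mass_kda:.1f} kDa")
```

    Under these assumptions the ORF length matches the residue count (712 codons for 712 residues), and the rough mass estimate for the mature protein lands close to the reported 76 kDa.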

  18. Dfam: a database of repetitive DNA based on profile hidden Markov models.

    PubMed

    Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

    2013-01-01

    We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.

  19. How next-generation sequencing and multiscale data analysis will transform infectious disease management.

    PubMed

    Pak, Theodore R; Kasarskis, Andrew

    2015-12-01

    Recent reviews have examined the extent to which routine next-generation sequencing (NGS) on clinical specimens will improve the capabilities of clinical microbiology laboratories in the short term, but do not explore integrating NGS with clinical data from electronic medical records (EMRs), immune profiling data, and other rich datasets to create multiscale predictive models. This review introduces a range of "omics" and patient data sources relevant to managing infections and proposes 3 potentially disruptive applications for these data in the clinical workflow. The combined threats of healthcare-associated infections and multidrug-resistant organisms may be addressed by multiscale analysis of NGS and EMR data that is ideally updated and refined over time within each healthcare organization. Such data and analysis should form the cornerstone of future learning health systems for infectious disease. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America.

  20. Variation in Soil Microbial Community Structure Associated with Different Legume Species Is Greater than that Associated with Different Grass Species

    PubMed Central

    Zhou, Yang; Zhu, Honghui; Fu, Shenglei; Yao, Qing

    2017-01-01

    Plants are the essential factors shaping soil microbial community (SMC) structure. While most studies focus on differences in the SMC structure associated with different plant species, the variation in the SMC structure associated with phylogenetically close species is less investigated. Legume (Fabaceae) and grass (Poaceae) are functionally important plant groups; however, their influences on the SMC structure are seldom compared, and the variation in the SMC structure among legume or grass species is largely unknown. In this study, we grew three legume species and three grass species in mesocosms, monitored soil chemical properties, and quantified the abundance of bacteria and fungi. The SMC structure was also characterized using PCR-DGGE and MiSeq sequencing. Results showed that legume and grass differentially affected soil pH, dissolved organic C, total N content, and available P content, and that legume enriched fungi more greatly than grass. Both DGGE profiling and MiSeq sequencing indicated that the bacterial diversity associated with legume was higher than that associated with grass. Whereas legume increased the abundance of Verrucomicrobia, grass decreased it, and furthermore, linear discriminant analysis identified some group-specific microbial taxa as potential biomarkers of legume or grass. These data suggest that legume and grass differentially select for the SMC. More importantly, clustering analysis based on both DGGE profiling and MiSeq sequencing demonstrated that the variation in the SMC structure associated with three legume species was greater than that associated with three grass species. PMID:28620371

  1. Population genetic analysis of Enterocytozoon bieneusi in humans.

    PubMed

    Li, Wei; Cama, Vitaliano; Feng, Yaoyu; Gilman, Robert H; Bern, Caryn; Zhang, Xichen; Xiao, Lihua

    2012-01-01

    Genotyping based on sequence analysis of the ribosomal internal transcribed spacer has revealed significant genetic diversity in Enterocytozoon bieneusi. Thus far, the population genetics of E. bieneusi and its significance in the epidemiology of microsporidiosis have not been examined. In this study, a multilocus sequence typing of E. bieneusi in AIDS patients in Lima, Peru was conducted, using 72 specimens previously genotyped as A, D, IV, EbpC, WL11, Peru7, Peru8, Peru10 and Peru11 at the internal transcribed spacer locus. Altogether, 39 multilocus genotypes were identified among the 72 specimens. The observation of strong intragenic linkage disequilibria and limited genetic recombination among markers was indicative of an overall clonal population structure of E. bieneusi. Measures of pair-wise intergenic linkage disequilibria and a standardised index of association (IAS) based on allelic profile data further supported this conclusion. Both sequence-based and allelic profile-based phylogenetic analyses showed the presence of two genetically isolated groups in the study population, one (group 1) containing isolates of the anthroponotic internal transcribed spacer genotype A, and the other (group 2) containing isolates of multiple internal transcribed spacer genotypes (mainly genotypes D and IV) with zoonotic potential. The measurement of linkage disequilibria and recombination indicated group 2 had a clonal population structure, whereas group 1 had an epidemic population structure. The formation of the two sub-populations was confirmed by STRUCTURE and Wright's fixation index (FST) analyses. The data highlight the power of MLST in understanding the epidemiology of E. bieneusi. Published by Elsevier Ltd.

  2. Streptococcus iniae SF1: Complete Genome Sequence, Proteomic Profile, and Immunoprotective Antigens

    PubMed Central

    Zhang, Bao-cun; Zhang, Jian; Sun, Li

    2014-01-01

    Streptococcus iniae is a Gram-positive bacterium that is reckoned one of the most severe aquaculture pathogens. It has a broad host range among farmed marine and freshwater fish and can also cause zoonotic infection in humans. Here we report for the first time the complete genome sequence as well as the host factor-induced proteomic profile of a pathogenic S. iniae strain, SF1, a serotype I isolate from diseased fish. SF1 possesses a single chromosome of 2,149,844 base pairs, which contains 2,125 predicted protein coding sequences (CDS), 12 rRNA genes, and 45 tRNA genes. Among the protein-encoding CDS are genes involved in resource acquisition and utilization, signal sensing and transduction, carbohydrate metabolism, and defense against host immune response. Potential virulence genes include those encoding adhesins, autolysins, toxins, exoenzymes, and proteases. In addition, two putative prophages and a CRISPR-Cas system were found in the genome, the latter containing a CRISPR locus and four cas genes. Proteomic analysis detected 21 secreted proteins whose expressions were induced by host serum. Five of the serum-responsive proteins were subjected to immunoprotective analysis, which revealed that two of the proteins were highly protective against lethal S. iniae challenge when used as purified recombinant subunit vaccines. Taken together, these results provide an important molecular basis for future study of S. iniae in various aspects, in particular those related to pathogenesis and disease control. PMID:24621602

  3. Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    PubMed Central

    Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

    2006-01-01

    Background: Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results: We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion: S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344

  4. Identification and Expression Analysis of microRNAs at the Grain Filling Stage in Rice (Oryza sativa L.) via Deep Sequencing

    PubMed Central

    Yi, Rong; Zhu, Zhixuan; Hu, Jihong; Qian, Qian; Dai, Jincheng; Ding, Yi

    2013-01-01

    MicroRNAs (miRNAs) have been shown to play crucial roles in the regulation of plant development. In this study, high-throughput RNA-sequencing technology was used to identify novel miRNAs, and to reveal miRNA expression patterns at different developmental stages during rice (Oryza sativa L.) grain filling. A total of 434 known miRNAs (380, 402, 390 and 392 at 5, 7, 12 and 17 days after fertilization, respectively) were obtained from rice grain. The expression profiles of these identified miRNAs were analyzed and the results showed that 161 known miRNAs were differentially expressed during grain development, a high proportion of which were up-regulated from 5 to 7 days after fertilization. In addition, sixty novel miRNAs were identified, and five of these were further validated experimentally. Additional analysis showed that the predicted targets of the differentially expressed miRNAs may participate in signal transduction, carbohydrate and nitrogen metabolism, the response to stimuli and epigenetic regulation. In this study, differences were revealed in the composition and expression profiles of miRNAs among individual developmental stages during the rice grain filling process, and miRNA editing events were also observed, analyzed and validated during this process. The results provide novel insight into the dynamic profiles of miRNAs in developing rice grain and contribute to the understanding of the regulatory roles of miRNAs in grain filling. PMID:23469249

  5. Profile of microRNA in Giant Panda Blood: A Resource for Immune-Related and Novel microRNAs

    PubMed Central

    Yang, Mingyu; Du, Lianming; Li, Wujiao; Shen, Fujun; Fan, Zhenxin; Jian, Zuoyi; Hou, Rong; Shen, Yongmei; Yue, Bisong; Zhang, Xiuyue

    2015-01-01

    The giant panda (Ailuropoda melanoleuca) is one of the world’s most beloved endangered mammals. Although the draft genome of this species had been assembled, little was known about the composition of its microRNAs (miRNAs) or their functional profiles. Recent studies demonstrated that changes in the expression of miRNAs are associated with immunity. In this study, miRNAs were extracted from the blood of four healthy giant pandas and sequenced by Illumina next generation sequencing technology. As determined by miRNA screening, a total of 276 conserved miRNAs and 51 novel putative miRNA candidates were detected. After differential expression analysis, we noticed that the expression of 7 miRNAs was significantly up-regulated in young giant pandas compared with that in adults. Moreover, 2 miRNAs were up-regulated in female giant pandas and 1 in the male individuals. Target gene prediction suggested that the miRNAs of giant panda might be relevant to the expression of 4,602 downstream genes. Subsequently, the predicted target genes were subjected to KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis and we found that these genes were mainly involved in host immunity, including the Ras signaling pathway, the PI3K-Akt signaling pathway, and the MAPK signaling pathway. In conclusion, our results provide the first miRNA profiles of giant panda blood, and the predicted functional analyses may open an avenue for further study of giant panda immunity. PMID:26599861

  6. Assessing the accuracy of blood RNA profiles to identify patients with post-concussion syndrome: A pilot study in a military patient population.

    PubMed

    Hardy, Jimmaline J; Mooney, Scott R; Pearson, Andrea N; McGuire, Dawn; Correa, Daniel J; Simon, Roger P; Meller, Robert

    2017-01-01

    Mild traumatic brain injury (mTBI) is a complex, neurophysiological condition that can have detrimental outcomes. Yet, to date, no objective method of diagnosis exists. Physical damage to the blood-brain barrier and normal waste clearance via the lymphatic system may enable the detection of biomarkers of mTBI in peripheral circulation. Here we evaluate the accuracy of whole transcriptome analysis of blood to predict the clinical diagnosis of post-concussion syndrome (PCS) in a military cohort. Sixty patients with clinically diagnosed chronic concussion and controls (no history of concussion) were recruited (retrospective study design). Male patients (46) were split into a training set comprising 20 long-term concussed (> 6 months and symptomatic) and 12 controls (no documented history of concussion). Models were validated in a testing set (control = 9, concussed = 5). RNA-Seq libraries were prepared from whole blood samples for sequencing using a SOLiD5500XL sequencer and aligned to hg19 reference genome. Patterns of differential exon expression were used for diagnostic modeling using support vector machine classification, and then validated in a second patient cohort. The accuracy of RNA profiles in distinguishing post-concussion syndrome patients from controls was 86% (sensitivity 80%; specificity 89%). In addition, RNA profiles reveal duration of concussion. This pilot study shows the potential utility of whole transcriptome analysis to establish the clinical diagnosis of chronic concussion syndrome.
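
    The reported accuracy, sensitivity and specificity can be reproduced from the stated testing-set sizes (5 concussed, 9 controls). The sketch below is illustrative only: the abstract does not give the confusion-matrix counts, so the values used here (4 of 5 concussed and 8 of 9 controls classified correctly) are an assumption chosen because they are consistent with the rounded percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true cases detected
    specificity = tn / (tn + fp)   # fraction of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (not stated in the abstract): 4/5 concussed and 8/9 controls
# classified correctly in the testing set.
sens, spec, acc = diagnostic_metrics(tp=4, fn=1, tn=8, fp=1)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```

    With a testing set of only 14 samples, a single reclassified sample shifts accuracy by about 7 percentage points, which is worth keeping in mind when reading the headline figures of a pilot study.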

  7. Effect of DNA extraction and sample preservation method on rumen bacterial population.

    PubMed

    Fliegerova, Katerina; Tapio, Ilma; Bonin, Aurelie; Mrazek, Jakub; Callegari, Maria Luisa; Bani, Paolo; Bayat, Alireza; Vilkki, Johanna; Kopečný, Jan; Shingfield, Kevin J; Boyer, Frederic; Coissac, Eric; Taberlet, Pierre; Wallace, R John

    2014-10-01

    The comparison of the bacterial profile of intracellular (iDNA) and extracellular DNA (eDNA) isolated from cow rumen content stored under different conditions was conducted. The influence of rumen fluid treatment (cheesecloth squeezed, centrifuged, filtered), storage temperature (RT, -80 °C) and cryoprotectants (PBS-glycerol, ethanol) on quality and quantity parameters of extracted DNA was evaluated by bacterial DGGE analysis, real-time PCR quantification and metabarcoding approach using high-throughput sequencing. Samples clustered according to the type of extracted DNA due to considerable differences between iDNA and eDNA bacterial profiles, while storage temperature and cryoprotectants additives had little effect on sample clustering. The numbers of Firmicutes and Bacteroidetes were lower (P < 0.01) in eDNA samples. The qPCR indicated significantly higher amount of Firmicutes in iDNA sample frozen with glycerol (P < 0.01). Deep sequencing analysis of iDNA samples revealed the prevalence of Bacteroidetes and similarity of samples frozen with and without cryoprotectants, which differed from sample stored with ethanol at room temperature. Centrifugation and consequent filtration of rumen fluid subjected to the eDNA isolation procedure considerably changed the ratio of molecular operational taxonomic units (MOTUs) of Bacteroidetes and Firmicutes. Intracellular DNA extraction using bead-beating method from cheesecloth sieved rumen content mixed with PBS-glycerol and stored at -80 °C was found as the optimal method to study ruminal bacterial profile. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Profile of microRNA in Giant Panda Blood: A Resource for Immune-Related and Novel microRNAs.

    PubMed

    Yang, Mingyu; Du, Lianming; Li, Wujiao; Shen, Fujun; Fan, Zhenxin; Jian, Zuoyi; Hou, Rong; Shen, Yongmei; Yue, Bisong; Zhang, Xiuyue

    2015-01-01

    The giant panda (Ailuropoda melanoleuca) is one of the world's most beloved endangered mammals. Although the draft genome of this species had been assembled, little was known about the composition of its microRNAs (miRNAs) or their functional profiles. Recent studies demonstrated that changes in the expression of miRNAs are associated with immunity. In this study, miRNAs were extracted from the blood of four healthy giant pandas and sequenced by Illumina next generation sequencing technology. As determined by miRNA screening, a total of 276 conserved miRNAs and 51 novel putative miRNA candidates were detected. After differential expression analysis, we noticed that the expression of 7 miRNAs was significantly up-regulated in young giant pandas compared with that in adults. Moreover, 2 miRNAs were up-regulated in female giant pandas and 1 in the male individuals. Target gene prediction suggested that the miRNAs of giant panda might be relevant to the expression of 4,602 downstream genes. Subsequently, the predicted target genes were subjected to KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis and we found that these genes were mainly involved in host immunity, including the Ras signaling pathway, the PI3K-Akt signaling pathway, and the MAPK signaling pathway. In conclusion, our results provide the first miRNA profiles of giant panda blood, and the predicted functional analyses may open an avenue for further study of giant panda immunity.

  9. Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma.

    PubMed

    Ceccarelli, Michele; Barthel, Floris P; Malta, Tathiane M; Sabedot, Thais S; Salama, Sofie R; Murray, Bradley A; Morozova, Olena; Newton, Yulia; Radenbaugh, Amie; Pagnotta, Stefano M; Anjum, Samreen; Wang, Jiguang; Manyam, Ganiraju; Zoppoli, Pietro; Ling, Shiyun; Rao, Arjun A; Grifford, Mia; Cherniack, Andrew D; Zhang, Hailei; Poisson, Laila; Carlotti, Carlos Gilberto; Tirapelli, Daniela Pretti da Cunha; Rao, Arvind; Mikkelsen, Tom; Lau, Ching C; Yung, W K Alfred; Rabadan, Raul; Huse, Jason; Brat, Daniel J; Lehman, Norman L; Barnholtz-Sloan, Jill S; Zheng, Siyuan; Hess, Kenneth; Rao, Ganesh; Meyerson, Matthew; Beroukhim, Rameen; Cooper, Lee; Akbani, Rehan; Wrensch, Margaret; Haussler, David; Aldape, Kenneth D; Laird, Peter W; Gutmann, David H; Noushmehr, Houtan; Iavarone, Antonio; Verhaak, Roel G W

    2016-01-28

    Therapy development for adult diffuse glioma is hindered by incomplete knowledge of somatic glioma driving alterations and suboptimal disease classification. We defined the complete set of genes associated with 1,122 diffuse grade II-III-IV gliomas from The Cancer Genome Atlas and used molecular profiles to improve disease classification, identify molecular correlations, and provide insights into the progression from low- to high-grade disease. Whole-genome sequencing data analysis determined that ATRX but not TERT promoter mutations are associated with increased telomere length. Recent advances in glioma classification based on IDH mutation and 1p/19q co-deletion status were recapitulated through analysis of DNA methylation profiles, which identified clinically relevant molecular subsets. A subtype of IDH mutant glioma was associated with DNA demethylation and poor outcome; a group of IDH-wild-type diffuse glioma showed molecular similarity to pilocytic astrocytoma and relatively favorable survival. Understanding of cohesive disease groups may aid improved clinical outcomes. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.

    PubMed

    Tang, Binhua; Wang, Xin

    2015-01-01

    DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important within the human cancer and disease context. For the first time, we select all the currently available cell lines (≥20) and transcription factors (≥80) profiling information from the ENCODE Consortium portal. Through the integration of those genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest, and indicates that DNA methylation is cell- and region-specific, due to the interplay mechanisms with transcription regulatory activities. We validate our analysis results with the corresponding RNA-sequencing technique for those detected genomic loci. Our results provide novel and meaningful insights for the interplay mechanisms of transcriptional regulation and gene expression for the human cancer and disease studies.

  11. YM500: a small RNA sequencing (smRNA-seq) database for microRNA research

    PubMed Central

    Cheng, Wei-Chung; Chung, I-Fang; Huang, Tse-Shun; Chang, Shih-Ting; Sun, Hsing-Jen; Tsai, Cheng-Fong; Liang, Muh-Lii; Wong, Tai-Tong; Wang, Hsei-Wei

    2013-01-01

    MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, the YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mice smRNA-seq results, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, for isomiR identification (incl. RNA editing), for arm switching discovery, and, more importantly, for novel miRNA predictions. Wetlab validation on >100 miRNAs confirmed high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria of isomiRs, and also integrates the information of dbSNP to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold the potential for both basic research and biotech applications. PMID:23203880

  12. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data

    PubMed Central

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie

    2018-01-01

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. PMID:29106630

  13. Rapid Molecular Identification of Pathogenic Yeasts by Pyrosequencing Analysis of 35 Nucleotides of Internal Transcribed Spacer 2

    PubMed Central

    Borman, Andrew M.; Linton, Christopher J.; Oliver, Debra; Palmer, Michael D.; Szekely, Adrien; Johnson, Elizabeth M.

    2010-01-01

    Rapid identification of yeast species isolates from clinical samples is particularly important given their innately variable antifungal susceptibility profiles. Here, we have evaluated the utility of pyrosequencing analysis of a portion of the internal transcribed spacer 2 region (ITS2) for identification of pathogenic yeasts. A total of 477 clinical isolates encompassing 43 different fungal species were subjected to pyrosequencing analysis in a strictly blinded study. The molecular identifications produced by pyrosequencing were compared with those obtained using conventional biochemical tests (AUXACOLOR2) and following PCR amplification and sequencing of the D1-D2 portion of the nuclear 28S large rRNA gene. More than 98% (469/477) of isolates encompassing 40 of the 43 fungal species tested were correctly identified by pyrosequencing of only 35 bp of ITS2. Moreover, BLAST searches of the public synchronized databases with the ITS2 pyrosequencing signature sequences revealed that there was only minimal sequence redundancy in the ITS2 under analysis. In all cases, the pyrosequencing signature sequences were unique to the yeast species (or species complex) under investigation. Finally, when pyrosequencing was combined with the Whatman FTA paper technology for the rapid extraction of fungal genomic DNA, molecular identification could be accomplished within 6 h from the time of starting from pure cultures. PMID:20702674

  14. novPTMenzy: a database for enzymes involved in novel post-translational modifications

    PubMed Central

    Khater, Shradha; Mohanty, Debasisa

    2015-01-01

    With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459

  15. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    PubMed

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.

  16. Effect of the cytochrome P450 2C19 inhibitor omeprazole on the pharmacokinetics and safety profile of bortezomib in patients with advanced solid tumours, non-Hodgkin's lymphoma or multiple myeloma.

    PubMed

    Quinn, David I; Nemunaitis, John; Fuloria, Jyotsna; Britten, Carolyn D; Gabrail, Nashat; Yee, Lorrin; Acharya, Milin; Chan, Kai; Cohen, Nadine; Dudov, Assen

    2009-01-01

    Bortezomib, an antineoplastic for the treatment of relapsed multiple myeloma and mantle cell lymphoma, undergoes metabolism through oxidative deboronation by cytochrome P450 (CYP) enzymes, primarily CYP3A4 and CYP2C19. Omeprazole, a proton-pump inhibitor, is primarily metabolized by and demonstrates high affinity for CYP2C19. This study investigated whether coadministration of omeprazole affected the pharmacokinetics, pharmacodynamics and safety profile of bortezomib in patients with advanced cancer. The variability of bortezomib pharmacokinetics with CYP enzyme polymorphism was also investigated. This open-label, crossover, pharmacokinetic drug-drug interaction study was conducted at seven institutions in the US and Europe between January 2005 and August 2006. Patients who had advanced solid tumours, non-Hodgkin's lymphoma or multiple myeloma, were aged ≥18 years, weighed ≥50 kg and had a life expectancy of ≥3 months were eligible. Patients received bortezomib 1.3 mg/m² on days 1, 4, 8 and 11 for two 21-day cycles, plus omeprazole 40 mg in the morning of days 6-10 and in the evening of day 8 in either cycle 1 (sequence 1) or cycle 2 (sequence 2). On day 21 of cycle 2, patients benefiting from therapy could continue to receive bortezomib for six additional cycles. Blood samples for pharmacokinetic/pharmacodynamic evaluation were collected prior to and at various timepoints after bortezomib administration on day 8 of cycles 1 and 2. Blood samples for pharmacogenomics were also collected. Pharmacokinetic parameters were calculated by noncompartmental analysis of plasma concentration-time data for bortezomib administration on day 8 of cycles 1 and 2, using WinNonlin version 4.0.1.a software. The pharmacodynamic profile was assessed using a whole-blood 20S proteasome inhibition assay. Twenty-seven patients (median age 64 years) were enrolled, 12 in sequence 1 and 15 in sequence 2, including eight and nine pharmacokinetic-evaluable patients, respectively. Bortezomib pharmacokinetic parameters were similar when bortezomib was administered alone or with omeprazole (maximum plasma concentration 120 vs 123 ng/mL; area under the plasma concentration-time curve from 0 to 72 hours 129 vs 135 ng·h/mL). The pharmacodynamic parameters were also similar (maximum effect 85.8% vs 93.7%; area under the percent inhibition-time curve over 72 hours 4052 vs 3910 %·h); the differences were not statistically significant. Pharmacogenomic analysis revealed no meaningful relationships between CYP enzyme polymorphisms and pharmacokinetic/pharmacodynamic parameters. Toxicities were generally similar between patients in sequence 1 and sequence 2, and between cycle 1 and cycle 2 in both treatment sequences. Among 26 evaluable patients, 13 (50%) were assessed as benefiting from bortezomib at the end of cycle 2 and continued to receive treatment. No impact on the pharmacokinetics, pharmacodynamics and safety profile of bortezomib was seen with coadministration of omeprazole. Concomitant administration of bortezomib and omeprazole is unlikely to cause clinically significant drug-drug interactions and is unlikely to have an impact on the efficacy or safety of bortezomib.
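
    The abstract reports maximum plasma concentration and area under the concentration-time curve from a noncompartmental analysis. As a rough illustration of how such parameters are commonly derived, the sketch below applies the linear trapezoidal rule to a short series of time/concentration pairs. The function names and the sample values are hypothetical and are not taken from the study; the abstract does not say which integration rule WinNonlin was configured to use.

```python
from typing import Sequence, Tuple


def auc_linear_trapezoid(times_h: Sequence[float], conc: Sequence[float]) -> float:
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times_h) != len(conc) or len(times_h) < 2:
        raise ValueError("need at least two paired time/concentration points")
    area = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Each interval contributes the area of one trapezoid.
        area += dt * (conc[i] + conc[i - 1]) / 2.0
    return area


def cmax_tmax(times_h: Sequence[float], conc: Sequence[float]) -> Tuple[float, float]:
    """Highest observed concentration and the time at which it was observed."""
    i = max(range(len(conc)), key=lambda k: conc[k])
    return conc[i], times_h[i]


# Illustrative values only (hypothetical, not data from the study).
times = [0.0, 1.0, 2.0, 4.0]   # hours after dosing
concs = [0.0, 10.0, 6.0, 2.0]  # ng/mL

print(auc_linear_trapezoid(times, concs))  # 21.0 (ng·h/mL)
print(cmax_tmax(times, concs))             # (10.0, 1.0)
```

    Comparing such values between the bortezomib-alone and bortezomib-plus-omeprazole cycles is the kind of comparison the abstract summarises.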

  17. High-throughput sequencing of small RNAs and analysis of differentially expressed microRNAs associated with pistil development in Japanese apricot

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are a class of endogenous, small, non-coding RNAs that regulate gene expression by mediating gene silencing at transcriptional and post-transcriptional levels in higher plants. However, the diversity of miRNAs and their roles in floral development in Japanese apricot (Prunus mume Sieb. et Zucc) remain largely unexplored. Imperfect flowers with pistil abortion seriously decrease production yields. To understand the role of miRNAs in pistil development, pistil development-related miRNAs were identified by Solexa sequencing in Japanese apricot. Results Solexa sequencing was used to identify and quantitatively profile small RNAs from perfect and imperfect flower buds of Japanese apricot. A total of 22,561,972 and 24,952,690 reads were sequenced from two small RNA libraries constructed from perfect and imperfect flower buds, respectively. Sixty-one known miRNAs, belonging to 24 families, were identified. Comparative profiling revealed that seven known miRNAs exhibited significant differential expression between perfect and imperfect flower buds. A total of 61 potentially novel miRNAs/new members of known miRNA families were also identified by the presence of mature miRNAs and corresponding miRNA*s in the sRNA libraries. Comparative analysis showed that six potentially novel miRNAs were differentially expressed between perfect and imperfect flower buds. Target predictions of the 13 differentially expressed miRNAs resulted in 212 target genes. Gene ontology (GO) annotation revealed that high-ranking miRNA target genes are those implicated in the developmental process, the regulation of transcription and response to stress. Conclusions This study represents the first comparative identification of miRNAomes between perfect and imperfect Japanese apricot flowers. Seven known miRNAs and six potentially novel miRNAs associated with pistil development were identified, using high-throughput sequencing of small RNAs. The findings, both computational and experimental, provide valuable information for further functional characterisation of miRNAs associated with pistil development in plants. PMID:22863067

  18. PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

    PubMed

    Ganesan, K; Parthasarathy, S

    2011-12-01

    Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences which have only 15-25% identity, the pair-wise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility to that known 3D structure. We have improved the above method by using the predicted secondary structure information and employ it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence to the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in predicting correctly the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .

  19. Profile analysis and prediction of tissue-specific CpG island methylation classes

    PubMed Central

    2009-01-01

    Background The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation, that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern. Results We defined CGI methylation profiles that not only separate constitutively methylated and unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell-types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein coding genes and their possible use in the prediction of DNA methylation. Conclusion Our approach provides new insights into the biological features that determine if a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two more tissue-specific methylation classes while conserving the accuracy provided by leading binary methylation classification methods. PMID:19383127

  20. Transcriptome Profiling of Bovine Milk Oligosaccharide Metabolism Genes Using RNA-Sequencing

    PubMed Central

    Wickramasinghe, Saumya; Hua, Serenus; Rincon, Gonzalo; Islas-Trejo, Alma; German, J. Bruce; Lebrilla, Carlito B.; Medrano, Juan F.

    2011-01-01

    This study examines the genes coding for enzymes involved in bovine milk oligosaccharide metabolism by comparing the oligosaccharide profiles with the expressions of glycosylation-related genes. Fresh milk samples (n = 32) were collected from four Holstein and Jersey cows at days 1, 15, 90 and 250 of lactation and free milk oligosaccharide profiles were analyzed. RNA was extracted from milk somatic cells at days 15 and 250 of lactation (n = 12) and gene expression analysis was conducted by RNA-Sequencing. A list was created of 121 glycosylation-related genes involved in oligosaccharide metabolism pathways in bovine by analyzing the oligosaccharide profiles and performing an extensive literature search. No significant differences were observed in either oligosaccharide profiles or expressions of glycosylation-related genes between Holstein and Jersey cows. The highest concentrations of free oligosaccharides were observed in the colostrum samples and a sharp decrease was observed in the concentration of free oligosaccharides on day 15, followed by progressive decrease on days 90 and 250. Ninety-two glycosylation-related genes were expressed in milk somatic cells. Most of these genes exhibited higher expression in day 250 samples indicating increases in net glycosylation-related metabolism in spite of decreases in free milk oligosaccharides in late lactation milk. Even though fucosylated free oligosaccharides were not identified, gene expression indicated the likely presence of fucosylated oligosaccharides in bovine milk. Fucosidase genes were expressed in milk and a possible explanation for not detecting fucosylated free oligosaccharides is the degradation of large fucosylated free oligosaccharides by the fucosidases. Detailed characterization of enzymes encoded by the 92 glycosylation-related genes identified in this study will provide the basic knowledge for metabolic network analysis of oligosaccharides in mammalian milk. 
These candidate genes will guide the design of a targeted breeding strategy to optimize the content of beneficial oligosaccharides in bovine milk. PMID:21541029

  1. Genetic diversity and virulence profiles of Listeria monocytogenes recovered from bulk tank milk, milk filters, and milking equipment from dairies in the United States (2002 to 2014).

    PubMed

    Kim, Seon Woo; Haendiges, Julie; Keller, Eric N; Myers, Robert; Kim, Alexander; Lombard, Jason E; Karns, Jeffrey S; Van Kessel, Jo Ann S; Haley, Bradd J

    2018-01-01

    Unpasteurized dairy products are known to occasionally harbor Listeria monocytogenes and have been implicated in recent listeriosis outbreaks and numerous sporadic cases of listeriosis. However, the diversity and virulence profiles of L. monocytogenes isolates recovered from these products have not been fully described. Here we report a genomic analysis of 121 L. monocytogenes isolates recovered from milk, milk filters, and milking equipment collected from bovine dairy farms in 19 states over a 12-year period. In a multi-virulence-locus sequence typing (MVLST) analysis, 59 Virulence Types (VT) were identified, of which 25% were Epidemic Clones I, II, V, VI, VII, VIII, IX, or X, and 31 were novel VT. In a multi-locus sequence typing (MLST) analysis, 60 Sequence Types (ST) of 56 Clonal Complexes (CC) were identified. Within lineage I, CC5 and CC1 were among the most abundant, and within lineage II, CC7 and CC37 were the most abundant. Multiple CCs previously associated with central nervous system and maternal-neonatal infections were identified. A genomic analysis identified variable distribution of virulence markers, Listeria pathogenicity islands (LIPI) -1, -3, and -4, and stress survival island-1 (SSI-1). Of these, 14 virulence markers, including LIPI-3 and -4, were more frequently detected in one lineage (I or II) than the other. LIPI-3 and LIPI-4 were identified in 68% and 28% of lineage I CCs, respectively. Results of this analysis indicate that there is a high level of genetic diversity among the L. monocytogenes present in bulk tank milk in the United States, with some strains being more frequently detected than others, and some being similar to those that have been isolated from previous non-dairy related outbreaks. Results of this study also demonstrate that a significant number of strains isolated from dairy farms encode virulence markers associated with severe human disease.

  2. Integrated genomic classification of melanocytic tumors of the central nervous system using mutation analysis, copy number alterations and DNA methylation profiling.

    PubMed

    Griewank, Klaus; Koelsche, Christian; van de Nes, Johannes A P; Schrimpf, Daniel; Gessi, Marco; Möller, Inga; Sucker, Antje; Scolyer, Richard A; Buckland, Michael E; Murali, Rajmohan; Pietsch, Torsten; von Deimling, Andreas; Schadendorf, Dirk

    2018-06-11

    In the central nervous system, distinguishing primary leptomeningeal melanocytic tumors from melanoma metastases and predicting their biological behavior solely using histopathologic criteria can be challenging. We aimed to assess the diagnostic and prognostic value of integrated molecular analysis. Targeted next-generation sequencing, array-based genome-wide methylation analysis and BAP1 immunohistochemistry were performed on the largest cohort of central nervous system melanocytic tumors analyzed to date, including 47 primary tumors of the central nervous system, 16 uveal melanomas, 13 cutaneous melanoma metastases and 2 blue nevus-like melanomas. Gene mutation, DNA-methylation and copy-number profiles were correlated with clinicopathological features. Combining mutation, copy-number and DNA-methylation profiles clearly distinguished cutaneous melanoma metastases from other melanocytic tumors. Primary leptomeningeal melanocytic tumors, uveal melanomas and blue nevus-like melanoma showed common DNA-methylation, copy-number alteration and gene mutation signatures. Notably, tumors demonstrating chromosome 3 monosomy and BAP1 alterations formed a homogeneous subset within this group. Integrated molecular profiling aids in distinguishing primary from metastatic melanocytic tumors of the central nervous system. Primary leptomeningeal melanocytic tumors, uveal melanoma and blue nevus-like melanoma share molecular similarity, with chromosome 3 and BAP1 alterations being markers of poor prognosis. Copyright ©2018, American Association for Cancer Research.

  3. Mutational analysis of multiple lung cancers: Discrimination between primary and metastatic lung cancers by genomic profile.

    PubMed

    Goto, Taichiro; Hirotsu, Yosuke; Mochizuki, Hitoshi; Nakagomi, Takahiro; Shikata, Daichi; Yokoyama, Yujiro; Oyama, Toshio; Amemiya, Kenji; Okimoto, Kenichiro; Omata, Masao

    2017-05-09

    In cases of multiple lung cancers, individual tumors may represent either a primary lung cancer or both primary and metastatic lung cancers. Treatment selection varies depending on such features, and this discrimination is critically important in predicting prognosis. The present study was undertaken to determine the efficacy and validity of mutation analysis as a means of determining whether multiple lung cancers are primary or metastatic in nature. The study involved 12 patients who underwent surgery in our department for multiple lung cancers between July 2014 and March 2016. Tumor cells were collected from formalin-fixed paraffin-embedded tissues of the primary lesions by using laser capture microdissection, and targeted sequencing of 53 lung cancer-related genes was performed. In surgically treated patients with multiple lung cancers, the driver mutation profile differed among the individual tumors. Meanwhile, in a case of a solitary lung tumor that appeared after surgery for double primary lung cancers, gene mutation analysis using a bronchoscopic biopsy sample revealed a gene mutation profile consistent with the surgically resected specimen, thus demonstrating that the tumor in this case was metastatic. In cases of multiple lung cancers, the comparison of driver mutation profiles clarifies the clonal origin of the tumors and enables discrimination between primary and metastatic tumors.

  4. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
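
    The phylogenetic profile idea described above can be sketched in a few lines: each protein gets a presence/absence vector over a fixed list of genomes, and proteins that share a vector become candidates for a functional link. The protein and genome names, the input format and the exact-match grouping rule below are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]


def build_profiles(presence: Dict[str, Set[str]], genomes: List[str]) -> Dict[str, Profile]:
    """Map each protein to a 0/1 vector: 1 if a homolog was found in that genome."""
    return {
        protein: tuple(1 if g in found else 0 for g in genomes)
        for protein, found in presence.items()
    }


def group_by_profile(profiles: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Group proteins that share an identical profile (groups of two or more only)."""
    groups: Dict[Profile, List[str]] = defaultdict(list)
    for protein, vec in profiles.items():
        groups[vec].append(protein)
    return {vec: sorted(names) for vec, names in groups.items() if len(names) > 1}


# Hypothetical input: genomes in which a homolog of each protein was detected.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}

profiles = build_profiles(presence, genomes)
linked = group_by_profile(profiles)
print(linked)  # {(1, 0, 1, 0): ['P1', 'P2']}
```

    Here P1 and P2 co-occur in the same genomes and so share a profile; P4 is present everywhere and carries no discriminating signal on its own.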

  5. Evolutionary characterization and transcript profiling of β-tubulin genes in flax (Linum usitatissimum L.) during plant development.

    PubMed

    Gavazzi, Floriana; Pigna, Gaia; Braglia, Luca; Gianì, Silvia; Breviario, Diego; Morello, Laura

    2017-12-08

    Microtubules, polymerized from alpha and beta-tubulin monomers, play a fundamental role in plant morphogenesis, determining the cell division plane, the direction of cell expansion and the deposition of cell wall material. During polarized pollen tube elongation, microtubules serve as tracks for vesicular transport and deposition of proteins/lipids at the tip membrane. Such functions are controlled by cortical microtubule arrays. The aim of this study was to first characterize the flax β-tubulin family by sequence and phylogenetic analysis and to investigate differential expression of β-tubulin genes possibly related to fibre elongation and to flower development. We report the cloning and characterization of the complete flax β-tubulin gene family: exon-intron organization, duplicated gene comparison, phylogenetic analysis and expression pattern during stem and hypocotyl elongation and during flower development. Sequence analysis of the fourteen expressed β-tubulin genes revealed that the recent whole genome duplication of the flax genome was followed by massive retention of duplicated tubulin genes. Expression analysis showed that β-tubulin mRNA profiles gradually changed along with phloem fibre development in both the stem and hypocotyl. In flowers, changes in relative tubulin transcript levels took place at anthesis in anthers, but not in carpels. Phylogenetic analysis supports the origin of extant plant β-tubulin genes from four ancestral genes pre-dating angiosperm separation. Expression analysis suggests that particular tubulin subpopulations are more suitable to sustain different microtubule functions such as cell elongation, cell wall thickening or pollen tube growth. Tubulin genes possibly related to different microtubule functions were identified as candidates for more detailed studies.

  6. The DNA methylation profile of oocytes in mice with hyperinsulinaemia and hyperandrogenism as detected by single-cell level whole genome bisulphite sequencing (SC-WGBS) technology.

    PubMed

    Li, Qian-Nan; Guo, Lei; Hou, Yi; Ou, Xiang-Hong; Liu, Zhonghua; Sun, Qing-Yuan

    2018-06-22

    Polycystic ovary syndrome (PCOS), a familial aggregation disease that causes anovulation in women, has well-recognised characteristics, two of which are hyperinsulinaemia and hyperandrogenaemia. To determine whether the DNA methylation status is altered in oocytes by high insulin and androgen levels, we generated a mouse model with hyperinsulinaemia and hyperandrogenaemia by injection of insulin and human chorionic gonadotrophin and investigated DNA methylation changes through single-cell level whole genome bisulphite sequencing. Our results showed that hyperinsulinaemia and hyperandrogenaemia had no significant effects on the global DNA methylation profile and different functional regions of genes, but did alter methylation status of some genes, which were significantly enriched in 17 gene ontology (GO) terms (P<0.05) by GO analysis. Among differently methylated genes, some were related to the occurrence of PCOS. Based on our results, we suggest that hyperinsulinaemia and hyperandrogenaemia may cause changes in some DNA methylation loci in oocytes.

  7. Profiling human breast epithelial cells using single cell RNA sequencing identifies cell diversity.

    PubMed

    Nguyen, Quy H; Pervolarakis, Nicholas; Blake, Kerrigan; Ma, Dennis; Davis, Ryan Tevia; James, Nathan; Phung, Anh T; Willey, Elizabeth; Kumar, Raj; Jabart, Eric; Driver, Ian; Rock, Jason; Goga, Andrei; Khan, Seema A; Lawson, Devon A; Werb, Zena; Kessenbrock, Kai

    2018-05-23

    Breast cancer arises from breast epithelial cells that acquire genetic alterations leading to subsequent loss of tissue homeostasis. Several distinct epithelial subpopulations have been proposed, but complete understanding of the spectrum of heterogeneity and differentiation hierarchy in the human breast remains elusive. Here, we use single-cell mRNA sequencing (scRNAseq) to profile the transcriptomes of 25,790 primary human breast epithelial cells isolated from reduction mammoplasties of seven individuals. Unbiased clustering analysis reveals the existence of three distinct epithelial cell populations, one basal and two luminal cell types, which we identify as secretory L1- and hormone-responsive L2-type cells. Pseudotemporal reconstruction of differentiation trajectories produces one continuous lineage hierarchy that closely connects the basal lineage to the two differentiated luminal branches. Our comprehensive cell atlas provides insights into the cellular blueprint of the human breast epithelium and will form the foundation to understand how the system goes awry during breast cancer.

  8. Managing the genomic revolution in cancer diagnostics.

    PubMed

    Nguyen, Doreen; Gocke, Christopher D

    2017-08-01

    Molecular tumor profiling is now a routine part of patient care, revealing targetable genomic alterations and molecularly distinct tumor subtypes with therapeutic and prognostic implications. The widespread adoption of next-generation sequencing technologies has greatly facilitated clinical implementation of genomic data and opened the door for high-throughput multigene-targeted sequencing. Herein, we discuss the variability of cancer genetic profiling currently offered by clinical laboratories, the challenges of applying rapidly evolving medical knowledge to individual patients, and the need for more standardized population-based molecular profiling.

  9. Transcriptomic analysis of grain amaranth (Amaranthus hypochondriacus) using 454 pyrosequencing: comparison with A. tuberculatus, expression profiling in stems and in response to biotic and abiotic stress

    PubMed Central

    2011-01-01

    Background Amaranthus hypochondriacus, a grain amaranth, is a C4 plant noted for its ability to tolerate stressful conditions and produce highly nutritious seeds. These possess an optimal amino acid balance and constitute a rich source of health-promoting peptides. Although several recent studies, mostly involving subtractive hybridization strategies, have contributed to increase the relatively low number of grain amaranth expressed sequence tags (ESTs), transcriptomic information of this species remains limited, particularly regarding tissue-specific and biotic stress-related genes. Thus, a large scale transcriptome analysis was performed to generate stem- and (a)biotic stress-responsive gene expression profiles in grain amaranth. Results A total of 2,700,168 raw reads were obtained from six 454 pyrosequencing runs, which were assembled into 21,207 high quality sequences (20,408 isotigs + 799 contigs). The average sequence length was 1,064 bp and 930 bp for isotigs and contigs, respectively. Only 5,113 singletons were recovered after quality control. Contigs/isotigs were further incorporated into 15,667 isogroups. All unique sequences were queried against the nr, TAIR, UniRef100, UniRef50 and Amaranthaceae EST databases for annotation. Functional GO annotation was performed with all contigs/isotigs that produced significant hits with the TAIR database. Only 8,260 sequences were found to be homologous when the transcriptomes of A. tuberculatus and A. hypochondriacus were compared, most of which were associated with basic house-keeping processes. Digital expression analysis identified 1,971 differentially expressed genes in response to at least one of four stress treatments tested. These included several multiple-stress-inducible genes that could represent potential candidates for use in the engineering of stress-resistant plants. The transcriptomic data generated from pigmented stems shared similarity with findings reported in developing stems of Arabidopsis and black cottonwood (Populus trichocarpa). Conclusions This study represents the first large-scale transcriptomic analysis of A. hypochondriacus, considered to be a highly nutritious and stress-tolerant crop. Numerous genes were found to be induced in response to (a)biotic stress, many of which could further the understanding of the mechanisms that contribute to multiple stress-resistance in plants, a trait that has potential biotechnological applications in agriculture. PMID:21752295

  10. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

    PubMed Central

    Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
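
    The counting step described above, in which abundance is the number of distinct barcode pairs rather than the number of reads, can be illustrated with a short sketch. The read tuples, barcode strings and function names are hypothetical, and the sketch assumes barcodes have already been identified without error; the error-tolerant barcode decoding mentioned in the abstract is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cDNA identifier, barcode at one end, barcode at the other end)


def digital_counts(reads: Iterable[Read]) -> Dict[str, int]:
    """Count distinct barcode pairs per cDNA, so PCR duplicates collapse to one molecule."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for cdna, bc_a, bc_b in reads:
        seen[cdna].add((bc_a, bc_b))
    return {cdna: len(pairs) for cdna, pairs in seen.items()}


def read_counts(reads: Iterable[Read]) -> Dict[str, int]:
    """Conventional read counting, shown for comparison."""
    return dict(Counter(cdna for cdna, _, _ in reads))


# Hypothetical reads: geneA was amplified unevenly, geneB came from a single molecule.
reads = [
    ("geneA", "AAC", "GTT"),
    ("geneA", "AAC", "GTT"),
    ("geneA", "AAC", "GTT"),
    ("geneA", "CCA", "TGG"),
    ("geneB", "AAC", "TGG"),
    ("geneB", "AAC", "TGG"),
]

print(read_counts(reads))     # {'geneA': 4, 'geneB': 2}
print(digital_counts(reads))  # {'geneA': 2, 'geneB': 1}
```

    In this toy input the read counts reflect amplification, while the barcode-pair counts recover two original molecules for geneA and one for geneB.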

  11. Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study

    PubMed Central

    Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger

    2017-01-01

    Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300

  12. Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study

    PubMed Central

    Weißenborn, Sandra; Walther, Dirk

    2017-01-01

    Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
    This feasibility study highlights the potential and challenges of phylogenetic profiling methods for detecting functional relationships between genes, the need to enlarge the set of plant genes with proven secondary metabolism involvement, and the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
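
    The core idea of phylogenetic profiling described in this record (comparing presence/absence patterns of gene families across genomes) can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors used; Jaccard similarity is assumed here purely for illustration, and the family names and profiles are invented toy data.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (equal-length 0/1 sequences)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two families absent from every genome share no evidence; score them 0.
    return both / either if either else 0.0


def pairwise_similarity(profiles):
    """Return {(family_1, family_2): similarity} for every pair of families."""
    return {
        (f1, f2): jaccard(profiles[f1], profiles[f2])
        for f1, f2 in combinations(sorted(profiles), 2)
    }


# Hypothetical toy data: one column per genome, 1 = family present, 0 = absent.
toy_profiles = {
    "famA": (1, 1, 0, 1, 0, 1),
    "famB": (1, 1, 0, 1, 0, 0),
    "famC": (0, 0, 1, 0, 1, 0),
}

similarities = pairwise_similarity(toy_profiles)
for pair, score in similarities.items():
    print(pair, round(score, 2))
```

    In this toy set, famA and famB co-occur in most genomes and score high, which is the kind of signal the approach treats as a hint of functional linkage; famC never co-occurs with either and scores zero.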

  13. An ordered EST catalogue and gene expression profiles of cassava (Manihot esculenta) at key growth stages.

    PubMed

    Li, You-Zhi; Pan, Ying-Hua; Sun, Chang-Bin; Dong, Hai-Tao; Luo, Xing-Lu; Wang, Zhi-Qiang; Tang, Ji-Liang; Chen, Baoshan

    2010-12-01

    A cDNA library was constructed from the root tissues of cassava variety Huanan 124 at the root bulking stage. A total of 9,600 cDNA clones from the library were sequenced with single-pass from the 5'-terminus to establish a catalogue of expressed sequence tags (ESTs). Assembly of the resulting EST sequences resulted in 2,878 putative unigenes. Blastn analysis showed that 62.6% of the unigenes matched with known cassava ESTs and the rest had no 'hits' against the cassava database in the integrative PlantGDB database. Blastx analysis showed that 1,715 (59.59%) of the unigenes matched with one or more GenBank protein entries and 1,163 (40.41%) had no 'hits'. A cDNA microarray with 2,878 unigenes was developed and used to analyze gene expression profiling of Huanan 124 at key growth stages including seedling, formation of root system, root bulking, and starch maturity. Array data analysis revealed that (1) the higher ratio of up-regulated ribosome-related genes was accompanied by a high ratio of up-regulated ubiquitin, proteasome-related and protease genes in cassava roots; (2) starch formation and degradation simultaneously occur at the early stages of root development but starch degradation is declined partially due to decrease in UDP-glucose dehydrogenase activity with root maturity; (3) starch may also be synthesized in situ in roots; (4) starch synthesis, translocation, and accumulation are also associated probably with signaling pathways that parallel Wnt, LAM, TCS and ErbB signaling pathways in animals; (5) constitutive expression of stress-responsive genes may be due to the adaptation of cassava to harsh environments during long-term evolution.

  14. Phylogenetic and comparative gene expression analysis of barley (Hordeum vulgare) WRKY transcription factor family reveals putatively retained functions between monocots and dicots

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.

    2008-02-01

    WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify a minimum number of 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 to 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group specific expression patterns. While putative orthologs of the HvWRKY transcription factors have been inferred from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlative expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities between orthologs with Arabidopsis, but also relatedness in their expression patterns. This correlative expression is indicative for a putative conserved function of related WRKY proteins in mono- and dicot species.

  15. Comparative Genomics Reveal That Host-Innate Immune Responses Influence the Clinical Prevalence of Legionella pneumophila Serogroups

    PubMed Central

    Khan, Mohammad Adil; Knox, Natalie; Prashar, Akriti; Alexander, David; Abdel-Nour, Mena; Duncan, Carla; Tang, Patrick; Amatullah, Hajera; Dos Santos, Claudia C.; Tijet, Nathalie; Low, Donald E.; Pourcel, Christine; Van Domselaar, Gary; Terebiznik, Mauricio; Ensminger, Alexander W.; Guyard, Cyril

    2013-01-01

    Legionella pneumophila is the primary etiologic agent of legionellosis, a potentially fatal respiratory illness. Amongst the sixteen described L. pneumophila serogroups, a majority of the clinical infections diagnosed using standard methods are serogroup 1 (Sg1). This high clinical prevalence of Sg1 is hypothesized to be linked to environmental specific advantages and/or to increased virulence of strains belonging to Sg1. The genetic determinants for this prevalence remain unknown primarily due to the limited genomic information available for non-Sg1 clinical strains. Through a systematic attempt to culture Legionella from patient respiratory samples, we have previously reported that 34% of all culture confirmed legionellosis cases in Ontario (n = 351) are caused by non-Sg1 Legionella. Phylogenetic analysis combining multiple-locus variable number tandem repeat analysis and sequence based typing profiles of all non-Sg1 identified that L. pneumophila clinical strains (n = 73) belonging to the two most prevalent molecular types were Sg6. We conducted whole genome sequencing of two strains representative of these sequence types and one distant neighbour. Comparative genomics of the three L. pneumophila Sg6 genomes reported here with published L. pneumophila serogroup 1 genomes identified genetic differences in the O-antigen biosynthetic cluster. Comparative optical mapping analysis between Sg6 and Sg1 further corroborated this finding. We confirmed an altered O-antigen profile of Sg6, and tested its possible effects on growth and replication in in vitro biological models and experimental murine infections. Our data indicates that while clinical Sg1 might not be better suited than Sg6 in colonizing environmental niches, increased bloodstream dissemination through resistance to the alternative pathway of complement mediated killing in the human host may explain its higher prevalence. PMID:23826259

  16. Lignocellulose-converting enzyme activity profiles correlate with molecular systematics and phylogeny grouping in the incoherent genus Phlebia (Polyporales, Basidiomycota).

    PubMed

    Kuuskeri, Jaana; Mäkelä, Miia R; Isotalo, Jarkko; Oksanen, Ilona; Lundell, Taina

    2015-10-19

    The fungal genus Phlebia consists of a number of species that are significant in wood decay. Biotechnological potential of a few species for enzyme production and degradation of lignin and pollutants has been previously studied, when most of the species of this genus are unknown. Therefore, we carried out a wider study on biochemistry and systematics of Phlebia species. Isolates belonging to the genus Phlebia were subjected to four-gene sequence analysis in order to clarify their phylogenetic placement at species level and evolutionary relationships of the genus among phlebioid Polyporales. rRNA-encoding (5.8S, partial LSU) and two protein-encoding gene (gapdh, rpb2) sequences were adopted for the evolutionary analysis, and ITS sequences (ITS1+5.8S+ITS2) were aligned for in-depth species-level phylogeny. The 49 fungal isolates were cultivated on semi-solid milled spruce wood medium for 21 days in order to follow their production of extracellular lignocellulose-converting oxidoreductases and carbohydrate active enzymes. Four-gene phylogenetic analysis confirmed the polyphyletic nature of the genus Phlebia. Ten species-level subgroups were formed, and their lignocellulose-converting enzyme activity profiles coincided with the phylogenetic grouping. The highest enzyme activities for lignin modification (manganese peroxidase activity) were obtained for Phlebia radiata group, which supports our previous studies on the enzymology and gene expression of this species on lignocellulosic substrates. Our study implies that there is a species-level connection of molecular systematics (genotype) to the efficiency in production of both lignocellulose-converting carbohydrate active enzymes and oxidoreductases (enzyme phenotype) on spruce wood. Thus, we may propose a similar phylogrouping approach for prediction of lignocellulose-converting enzyme phenotypes in new fungal species or genetically and biochemically less-studied isolates of the wood-decay Polyporales.

  17. Comparative and Evolutionary Analysis of Grass Pollen Allergens Using Brachypodium distachyon as a Model System.

    PubMed

    Sharma, Akanksha; Sharma, Niharika; Bhalla, Prem; Singh, Mohan

    2017-01-01

    Comparative genomics have facilitated the mining of biological information from a genome sequence, through the detection of similarities and differences with genomes of closely or more distantly related species. By using such comparative approaches, knowledge can be transferred from the model to non-model organisms and insights can be gained in the structural and evolutionary patterns of specific genes. In the absence of sequenced genomes for allergenic grasses, this study was aimed at understanding the structure, organisation and expression profiles of grass pollen allergens using the genomic data from Brachypodium distachyon as it is phylogenetically related to the allergenic grasses. Combining genomic data with the anther RNA-Seq dataset revealed 24 pollen allergen genes belonging to eight allergen groups mapping on the five chromosomes in B. distachyon. High levels of anther-specific expression profiles were observed for the 24 identified putative allergen-encoding genes in Brachypodium. The genomic evidence suggests that the gene encoding the group 5 allergen, the most potent trigger of hay fever and allergic asthma, originated as a pollen-specific orphan gene in a common grass ancestor of the Brachypodium and Triticeae clades. Gene structure analysis showed that the putative allergen-encoding genes in Brachypodium either lack introns or contain a reduced number of them. Promoter analysis of the identified Brachypodium genes revealed the presence of specific cis-regulatory sequences likely responsible for high anther/pollen-specific expression. With the identification of putative allergen-encoding genes in Brachypodium, this study has also described some important plant gene families (e.g. expansin superfamily, EF-Hand family, profilins etc.) for the first time in the model plant Brachypodium. Altogether, the present study provides new insights into structural characterization and evolution of pollen allergens and will further serve as a base for their functional characterization in related grass species.

  18. Sequencing and de novo analysis of the hemocytes transcriptome in Litopenaeus vannamei response to white spot syndrome virus infection.

    PubMed

    Xue, Shuxia; Liu, Yichen; Zhang, Yichen; Sun, Yan; Geng, Xuyun; Sun, Jinsheng

    2013-01-01

    White spot syndrome virus (WSSV) is a causative pathogen found in most shrimp farming areas of the world and causes large economic losses to the shrimp aquaculture. The mechanism underlying the molecular pathogenesis of the highly virulent WSSV remains unknown. To better understand the virus-host interactions at the molecular level, the transcriptome profiles in hemocytes of unchallenged and WSSV-challenged shrimp (Litopenaeus vannamei) were compared using a short-read deep sequencing method (Illumina). RNA-seq analysis generated more than 25.81 million clean paired-end (PE) reads, which were assembled into 52,073 unigenes (mean size = 520 bp). Based on sequence similarity searches, 23,568 (45.3%) genes were identified, among which 6,562 and 7,822 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COG), respectively. Searches in the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) mapped 14,941 (63.4%) unigenes to 240 KEGG pathways. Among all the annotated unigenes, 1,179 were associated with immune-related genes. Digital gene expression (DGE) analysis revealed that the host transcriptome profile was slightly changed in the early infection (5 hours post injection) of the virus, while large transcriptional differences were identified in the late infection (48 hpi) of WSSV. The differentially expressed genes were mainly pattern recognition genes and some immune response factors. The results indicated that antiviral immune mechanisms were probably involved in the recognition of pathogen-associated molecular patterns. This study provided a global survey of host gene activities against virus infection in a non-model organism, Pacific white shrimp. Results can contribute to the in-depth study of candidate genes in white shrimp, and help to improve the current understanding of host-pathogen interactions.

  19. De novo transcriptome sequencing and digital gene expression analysis predict biosynthetic pathway of rhynchophylline and isorhynchophylline from Uncaria rhynchophylla, a non-model plant with potent anti-alzheimer's properties.

    PubMed

    Guo, Qianqian; Ma, Xiaojun; Wei, Shugen; Qiu, Deyou; Wilson, Iain W; Wu, Peng; Tang, Qi; Liu, Lijun; Dong, Shoukun; Zu, Wei

    2014-08-12

    The major medicinal alkaloids isolated from Uncaria rhynchophylla (gouteng in Chinese) capsules are rhynchophylline (RIN) and isorhynchophylline (IRN). Extracts containing these terpene indole alkaloids (TIAs) can inhibit the formation and destabilize preformed fibrils of amyloid β protein (a pathological marker of Alzheimer's disease), and have been shown to improve the cognitive function of mice with Alzheimer-like symptoms. The biosynthetic pathways of RIN and IRN are largely unknown. In this study, RNA-sequencing of pooled Uncaria capsule RNA samples taken at three developmental stages that accumulate different amounts of RIN and IRN was performed. More than 50 million high-quality reads from a cDNA library were generated and de novo assembled. Sequences for all of the known enzymes involved in TIA synthesis were identified. Additionally, 193 cytochrome P450 (CYP450), 280 methyltransferase and 144 isomerase genes were identified that are potential candidates for enzymes involved in RIN and IRN synthesis. Digital gene expression profile (DGE) analysis was performed on the three capsule developmental stages, and based on genes possessing expression profiles consistent with RIN and IRN levels, four CYP450s, three methyltransferases and three isomerases were identified as the candidates most likely to be involved in the later steps of RIN and IRN biosynthesis. A combination of de novo transcriptome assembly and DGE analysis was shown to be a powerful method for identifying genes encoding enzymes potentially involved in the biosynthesis of important secondary metabolites in a non-model plant. The transcriptome data from this study provide an important resource for understanding the formation of major bioactive constituents in the capsule extract from Uncaria, and provide information that may aid in metabolic engineering to increase yields of these important alkaloids.

  20. Skin Microbiome Surveys Are Strongly Influenced by Experimental Design.

    PubMed

    Meisel, Jacquelyn S; Hannigan, Geoffrey D; Tyldsley, Amanda S; SanMiguel, Adam J; Hodkinson, Brendan P; Zheng, Qi; Grice, Elizabeth A

    2016-05-01

    Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provides more precise microbial community characterizations. Most widely used protocols were developed to characterize microbiota of other habitats (i.e., gastrointestinal) and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource and cost intensive, provides evidence of a community's functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This study highlights the importance of experimental design for downstream results in skin microbiome surveys. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  1. Skin microbiome surveys are strongly influenced by experimental design

    PubMed Central

    Meisel, Jacquelyn S.; Hannigan, Geoffrey D.; Tyldsley, Amanda S.; SanMiguel, Adam J.; Hodkinson, Brendan P.; Zheng, Qi; Grice, Elizabeth A.

    2016-01-01

    Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provide more precise microbial community characterizations. Most widely used protocols were developed to characterize microbiota of other habitats (i.e. gastrointestinal), and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource- and cost-intensive, provides evidence of a community’s functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This work highlights the importance of experimental design for downstream results in skin microbiome surveys. PMID:26829039

  2. The impact of freeze-drying infant fecal samples on measures of their bacterial community profiles and milk-derived oligosaccharide content.

    PubMed

    Lewis, Zachery T; Davis, Jasmine C C; Smilowitz, Jennifer T; German, J Bruce; Lebrilla, Carlito B; Mills, David A

    2016-01-01

    Infant fecal samples are commonly studied to investigate the impacts of breastfeeding on the development of the microbiota and subsequent health effects. Comparisons of infants living in different geographic regions and environmental contexts are needed to aid our understanding of evolutionarily-selected milk adaptations. However, the preservation of fecal samples from individuals in remote locales until they can be processed can be a challenge. Freeze-drying (lyophilization) offers a cost-effective way to preserve some biological samples for transport and analysis at a later date. Currently, it is unknown what, if any, biases are introduced into various analyses by the freeze-drying process. Here, we investigated how freeze-drying affected analysis of two relevant and intertwined aspects of infant fecal samples, marker gene amplicon sequencing of the bacterial community and the fecal oligosaccharide profile (undigested human milk oligosaccharides). No differences were discovered between the fecal oligosaccharide profiles of wet and freeze-dried samples. The marker gene sequencing data showed an increase in proportional representation of Bacteroides and a decrease in detection of bifidobacteria and members of class Bacilli after freeze-drying. This sample treatment bias may possibly be related to the cell morphology of these different taxa (Gram status). However, these effects did not overwhelm the natural variation among individuals, as the community data still strongly grouped by subject and not by freeze-drying status. We also found that compensating for sample concentration during freeze-drying, while not necessary, was also not detrimental. Freeze-drying may therefore be an acceptable method of sample preservation and mass reduction for some studies of microbial ecology and milk glycan analysis.

  3. Identification of hidden relationships from the coupling of hydrophobic cluster analysis and domain architecture information.

    PubMed

    Faure, Guilhem; Callebaut, Isabelle

    2013-07-15

    Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are thereby difficult to analyze. We present here a novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences, inspired by our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from the PSI-BLAST results, using information on domain architecture, HCA plots and the conservation degree of amino acids that may participate in the protein core. This can reveal remote relationships with known families of domains, as illustrated here with the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in such a way are consistent with those provided by HHPRED, based on pairwise comparisons of HMMs. Our approach can, however, be applied even in absence of domain profiles or known 3D structures for the identification of novel families of domains. It can also be used in a reverse way for refining domain profiles, by starting from known protein domain families and identifying highly divergent members, hitherto considered as orphan. We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in Python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/∼callebau/tremolohca.html. Contact: isabelle.callebaut@impmc.upmc.fr. Supplementary Data are available at Bioinformatics online.

  4. An integrated systems genetics screen reveals the transcriptional structure of inherited predisposition to metastatic disease

    PubMed Central

    Faraji, Farhoud; Hu, Ying; Wu, Gang; Goldberger, Natalie E.; Walker, Renard C.; Zhang, Jinghui; Hunter, Kent W.

    2014-01-01

    Metastasis is the result of stochastic genomic and epigenetic events leading to gene expression profiles that drive tumor dissemination. Here we exploit the principle that metastatic propensity is modified by the genetic background to generate prognostic gene expression signatures that illuminate regulators of metastasis. We also identify multiple microRNAs whose germline variation is causally linked to tumor progression and metastasis. We employ network analysis of global gene expression profiles in tumors derived from a panel of recombinant inbred mice to identify a network of co-expressed genes centered on Cnot2 that predicts metastasis-free survival. Modulating Cnot2 expression changes tumor cell metastatic potential in vivo, supporting a functional role for Cnot2 in metastasis. Small RNA sequencing of the same tumor set revealed a negative correlation between expression of the Mir216/217 cluster and tumor progression. Expression quantitative trait locus analysis (eQTL) identified cis-eQTLs at the Mir216/217 locus, indicating that differences in expression may be inherited. Ectopic expression of Mir216/217 in tumor cells suppressed metastasis in vivo. Finally, small RNA sequencing and mRNA expression profiling data were integrated to reveal that miR-3470a/b target a high proportion of network transcripts. In vivo analysis of Mir3470a/b demonstrated that both promote metastasis. Moreover, Mir3470b is a likely regulator of the Cnot2 network as its overexpression down-regulated expression of network hub genes and enhanced metastasis in vivo, phenocopying Cnot2 knockdown. The resulting data from this strategy identify Cnot2 as a novel regulator of metastasis and demonstrate the power of our systems-level approach in identifying modifiers of metastasis. PMID:24322557

  5. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
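
    The quantity at the heart of this record, mutual information between MSA columns, can be sketched as follows. This is not the SAMMI implementation: the abstract does not say how column-pair values are aggregated or how gaps are treated, so the mean over all column pairs used below, the function names, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def column(msa, j):
    """Extract column j of an alignment given as equal-length strings."""
    return [seq[j] for seq in msa]


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def mean_pairwise_mi(msa):
    """Assumed aggregate score: average MI over all distinct column pairs."""
    ncols = len(msa[0])
    pairs = [(i, j) for i in range(ncols) for j in range(i + 1, ncols)]
    total = sum(mutual_information(column(msa, i), column(msa, j)) for i, j in pairs)
    return total / len(pairs)


# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 is invariant.
toy_msa = ["AKL", "AKL", "GRL", "GRL"]

print(mutual_information(column(toy_msa, 0), column(toy_msa, 1)))
print(mean_pairwise_mi(toy_msa))
```

    Under this assumed scoring, candidate alignments could be ranked by `mean_pairwise_mi` and the highest-scoring one kept; a fully conserved column contributes nothing, because it carries no information about any other column.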

  6. Clustering of Genetically Defined Allele Classes in the Caenorhabditis elegans DAF-2 Insulin/IGF-1 Receptor

    PubMed Central

    Patel, Dhaval S.; Garza-Garcia, Acely; Nanji, Manoj; McElwee, Joshua J.; Ackerman, Daniel; Driscoll, Paul C.; Gems, David

    2008-01-01

    The DAF-2 insulin/IGF-1 receptor regulates development, metabolism, and aging in the nematode Caenorhabditis elegans. However, complex differences among daf-2 alleles complicate analysis of this gene. We have employed epistasis analysis, transcript profile analysis, mutant sequence analysis, and homology modeling of mutant receptors to understand this complexity. We define an allelic series of nonconditional daf-2 mutants, including nonsense and deletion alleles, and a putative null allele, m65. The most severe daf-2 alleles show incomplete suppression by daf-18(0) and daf-16(0) and have a range of effects on early development. Among weaker daf-2 alleles there exist distinct mutant classes that differ in epistatic interactions with mutations in other genes. Mutant sequence analysis (including 11 newly sequenced alleles) reveals that class 1 mutant lesions lie only in certain extracellular regions of the receptor, while class 2 (pleiotropic) and nonconditional missense mutants have lesions only in the ligand-binding pocket of the receptor ectodomain or the tyrosine kinase domain. Effects of equivalent mutations on the human insulin receptor suggest an altered balance of intracellular signaling in class 2 alleles. These studies consolidate and extend our understanding of the complex genetics of daf-2 and its underlying molecular biology. PMID:18245374

  7. Genome wide transcriptional profiling of Herbaspirillum seropedicae SmR1 grown in the presence of naringenin.

    PubMed

    Tadra-Sfeir, Michelle Z; Faoro, Helisson; Camilios-Neto, Doumit; Brusamarello-Santos, Liziane; Balsanelli, Eduardo; Weiss, Vinicius; Baura, Valter A; Wassem, Roseli; Cruz, Leonardo M; De Oliveira Pedrosa, Fábio; Souza, Emanuel M; Monteiro, Rose A

    2015-01-01

    Herbaspirillum seropedicae is a diazotrophic bacterium which associates endophytically with economically important gramineae. Flavonoids such as naringenin have been shown to have an effect on the interaction between H. seropedicae and its host plants. We used a high-throughput sequencing based method (RNA-Seq) to assess the influence of naringenin on the whole transcriptome profile of H. seropedicae. Three hundred and four genes were downregulated and seventy-seven were upregulated by naringenin. Data analysis revealed that genes related to bacterial flagella biosynthesis, chemotaxis and biosynthesis of peptidoglycan were repressed by naringenin. Moreover, genes involved in aromatic metabolism and multidrug transport efflux were activated.

  8. Multiplexed ChIP-Seq Using Direct Nucleosome Barcoding: A Tool for High-Throughput Chromatin Analysis.

    PubMed

    Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent

    2018-01-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time. In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.

  9. Bacterial community structure in the hyperarid core of the Atacama Desert, Chile

    USGS Publications Warehouse

    Drees, Kevin P.; Neilson, Julia W.; Betancourt, Julio L.; Quade, Jay; Henderson, David A.; Pryor, Barry M.; Maier, Raina M.

    2006-01-01

    Soils from the hyperarid Atacama Desert of northern Chile were sampled along an east-west elevational transect (23.75 to 24.70 degrees S) through the driest sector to compare the relative structure of bacterial communities. Analysis of denaturing gradient gel electrophoresis (DGGE) profiles from each of the samples revealed that microbial communities from the extreme hyperarid core of the desert clustered separately from all of the remaining communities. Bands sequenced from DGGE profiles of two samples taken at a 22-month interval from this core region revealed the presence of similar populations dominated by bacteria from the Gemmatimonadetes and Planctomycetes phyla.

  10. [Vertical distribution of soil active carbon and soil organic carbon storage under different forest types in the Qinling Mountains].

    PubMed

    Wang, Di; Geng, Zeng-Chao; She, Diao; He, Wen-Xiang; Hou, Lin

    2014-06-01

    Using field investigation and laboratory analysis, the distribution patterns of soil active carbon and soil carbon storage in the soil profiles of Quercus aliena var. acuteserrata (Matoutan Forest, I), Pinus tabuliformis (II), Pinus armandii (III), pine-oak mixed forest (IV), Picea asperata (V), and Quercus aliena var. acuteserrata (Xinjiashan Forest, VI) in the Qinling Mountains were studied in August 2013. The results showed that soil organic carbon (SOC), microbial biomass carbon (MBC), dissolved organic carbon (DOC), and easily oxidizable carbon (EOC) decreased with increasing soil depth in all of the forest soil profiles. The SOC and DOC contents at different depths in the soil profiles of P. asperata and pine-oak mixed forest were higher than in the other studied forest soils, and the order of the mean SOC and DOC across the soil profiles was V > IV > I > II > III > VI. The soil MBC contents of the forest soil profiles were 71.25-710.05 mg·kg⁻¹, in the order I > V > IV > III > II > VI. The EOC content of the pine-oak mixed forest showed the largest decline along the whole soil profile, and the order of the mean EOC was IV > V > I > II > III > VI. The order of soil organic carbon storage in the 0-60 cm soil layer was V > I > IV > III > VI > II. The MBC, DOC and EOC contents of the different forest soils were significantly correlated with each other. There were significant positive correlations between soil active carbon and TOC and TN, whereas there was no significant correlation between soil active carbon and the other basic soil physicochemical properties.

  11. Implications of diadochokinesia in children with speech sound disorder.

    PubMed

    Wertzner, Haydée Fiszbein; Pagan-Neves, Luciana de Oliveira; Alves, Renata Ramos; Barrozo, Tatiane Faria

    2013-01-01

    To verify the performance of children with and without speech sound disorder in oral motor skills measured by oral diadochokinesia according to age and gender and to compare the results by two different methods of analysis. Participants were 72 subjects aged from 5 years to 7 years and 11 months divided into four subgroups according to the presence of speech sound disorder (Study Group and Control Group) and age (<6 years and 5 months and >6 years and 5 months). Diadochokinesia skills were assessed by the repetition of the sequences 'pa', 'ta', 'ka' and 'pataka' measured both manually and by the software Motor Speech Profile® (MSP). Gender was statistically different for both groups but it did not influence the number of sequences per second produced. Correlation between the number of sequences per second and age was observed for all sequences (except for 'ka') only for the control group children. Comparison between groups did not indicate differences between the number of sequences per second and age. Results presented strong agreement between the values of oral diadochokinesia measured manually and by MSP. This research demonstrated the importance of using different methods of analysis in the functional evaluation of oro-motor processing aspects of children with speech sound disorder and evidenced the oro-motor difficulties of children under eight years of age.

  12. Associations between soil bacterial community structure and nutrient cycling functions in long-term organic farm soils following cover crop and organic fertilizer amendment.

    PubMed

    Fernandez, Adria L; Sheaffer, Craig C; Wyse, Donald L; Staley, Christopher; Gould, Trevor J; Sadowsky, Michael J

    2016-10-01

    Agricultural management practices can produce changes in soil microbial populations whose functions are crucial to crop production and may be detectable using high-throughput sequencing of bacterial 16S rRNA. To apply sequencing-derived bacterial community structure data to on-farm decision-making will require a better understanding of the complex associations between soil microbial community structure and soil function. Here 16S rRNA sequencing was used to profile soil bacterial communities following application of cover crops and organic fertilizer treatments in certified organic field cropping systems. Amendment treatments were hairy vetch (Vicia villosa), winter rye (Secale cereale), oilseed radish (Raphanus sativus), buckwheat (Fagopyrum esculentum), beef manure, pelleted poultry manure, Sustane® 8-2-4, and a no-amendment control. Enzyme activities, net N mineralization, soil respiration, and soil physicochemical properties including nutrient levels, organic matter (OM) and pH were measured. Relationships between these functional and physicochemical parameters and soil bacterial community structure were assessed using multivariate methods including redundancy analysis, discriminant analysis, and Bayesian inference. Several cover crops and fertilizers affected soil functions including N-acetyl-β-d-glucosaminidase and β-glucosidase activity. Effects, however, were not consistent across locations and sampling timepoints. Correlations were observed among functional parameters and relative abundances of individual bacterial families and phyla. Bayesian analysis inferred no directional relationships between functional activities, bacterial families, and physicochemical parameters. Soil functional profiles were more strongly predicted by location than by treatment, and differences were largely explained by soil physicochemical parameters. Composition of soil bacterial communities was predictive of soil functional profiles. Differences in soil function were better explained using both soil physicochemical test values and bacterial community structure data than using soil tests alone. Pursuing a better understanding of bacterial community composition and how it is affected by farming practices is a promising avenue for increasing our ability to predict the impact of management practices on important soil functions. Copyright © 2016. Published by Elsevier B.V.

  13. Profiling mRNAs of Two Cuscuta Species Reveals Possible Candidate Transcripts Shared by Parasitic Plants

    PubMed Central

    Wijeratne, Saranga; Fraga, Martina; Meulia, Tea; Doohan, Doug; Li, Zhaohu; Qu, Feng

    2013-01-01

    Dodders are among the most important parasitic plants that cause serious yield losses in crop plants. In this report, we sought to unveil the genetic basis of dodder parasitism by profiling the transcriptomes of Cuscuta pentagona and C. suaveolens, two of the most common dodder species, using a next-generation RNA sequencing platform. De novo assembly of the sequence reads resulted in more than 46,000 isotigs and contigs (collectively referred to as expressed sequence tags or ESTs) for each species, with more than half of them predicted to encode proteins that share significant sequence similarities with known proteins of non-parasitic plants. Comparing our datasets with transcriptomes of 12 other fully sequenced plant species confirmed a close evolutionary relationship between dodder and tomato. Using a rigorous set of filtering parameters, we were able to identify seven pairs of ESTs that appear to be shared exclusively by parasitic plants, thus providing targets for tailored management approaches. In addition, we also discovered ESTs with sequence similarities to known plant viruses, including cryptic viruses, in the dodder sequence assemblies. Together this study represents the first comprehensive transcriptome profiling of parasitic plants in the Cuscuta genus, and is expected to contribute to our understanding of the molecular mechanisms of parasitic plant-host plant interactions. PMID:24312295

  14. Profiling mRNAs of two Cuscuta species reveals possible candidate transcripts shared by parasitic plants.

    PubMed

    Jiang, Linjian; Wijeratne, Asela J; Wijeratne, Saranga; Fraga, Martina; Meulia, Tea; Doohan, Doug; Li, Zhaohu; Qu, Feng

    2013-01-01

    Dodders are among the most important parasitic plants that cause serious yield losses in crop plants. In this report, we sought to unveil the genetic basis of dodder parasitism by profiling the transcriptomes of Cuscuta pentagona and C. suaveolens, two of the most common dodder species, using a next-generation RNA sequencing platform. De novo assembly of the sequence reads resulted in more than 46,000 isotigs and contigs (collectively referred to as expressed sequence tags or ESTs) for each species, with more than half of them predicted to encode proteins that share significant sequence similarities with known proteins of non-parasitic plants. Comparing our datasets with transcriptomes of 12 other fully sequenced plant species confirmed a close evolutionary relationship between dodder and tomato. Using a rigorous set of filtering parameters, we were able to identify seven pairs of ESTs that appear to be shared exclusively by parasitic plants, thus providing targets for tailored management approaches. In addition, we also discovered ESTs with sequence similarities to known plant viruses, including cryptic viruses, in the dodder sequence assemblies. Together this study represents the first comprehensive transcriptome profiling of parasitic plants in the Cuscuta genus, and is expected to contribute to our understanding of the molecular mechanisms of parasitic plant-host plant interactions.

  15. Comparative Transcriptome Profiling of Rice Near-Isogenic Line Carrying Xa23 under Infection of Xanthomonas oryzae pv. oryzae.

    PubMed

    Tariq, Rezwan; Wang, Chunlian; Qin, Tengfei; Xu, Feifei; Tang, Yongchao; Gao, Ying; Ji, Zhiyuan; Zhao, Kaijun

    2018-03-02

    Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is an overwhelming disease in rice-growing regions worldwide. Our previous studies revealed that the executor R gene Xa23 confers broad-spectrum disease resistance to all naturally occurring biotypes of Xoo. In this study, comparative transcriptomic profiling of two near-isogenic lines (NILs), CBB23 (harboring Xa23) and JG30 (without Xa23), before and after infection of the Xoo strain, PXO99A, was done by RNA sequencing, to identify genes associated with the resistance. After high throughput sequencing, 1645 differentially expressed genes (DEGs) were identified between CBB23 and JG30 at different time points. Gene Ontology (GO) analysis categorized the DEGs into biological process, molecular function, and cellular component. KEGG analysis categorized the DEGs into different pathways, and phenylpropanoid biosynthesis was the most prominent pathway, followed by biosynthesis of plant hormones, flavonoid biosynthesis, and glycolysis/gluconeogenesis. Further analysis led to the identification of transcription factors (TFs) and kinase-responsive genes that were differentially expressed in CBB23 relative to JG30. Besides TFs and kinase-responsive genes, DEGs related to ethylene, jasmonic acid, and secondary metabolites were also identified in both genotypes after PXO99A infection. The data of DEGs are a precious resource for further clarifying the network of Xa23-mediated resistance.

  16. Comparative Transcriptome Profiling of Rice Near-Isogenic Line Carrying Xa23 under Infection of Xanthomonas oryzae pv. oryzae

    PubMed Central

    Tariq, Rezwan; Wang, Chunlian; Qin, Tengfei; Xu, Feifei; Tang, Yongchao; Gao, Ying; Ji, Zhiyuan; Zhao, Kaijun

    2018-01-01

    Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is an overwhelming disease in rice-growing regions worldwide. Our previous studies revealed that the executor R gene Xa23 confers broad-spectrum disease resistance to all naturally occurring biotypes of Xoo. In this study, comparative transcriptomic profiling of two near-isogenic lines (NILs), CBB23 (harboring Xa23) and JG30 (without Xa23), before and after infection of the Xoo strain, PXO99A, was done by RNA sequencing, to identify genes associated with the resistance. After high throughput sequencing, 1645 differentially expressed genes (DEGs) were identified between CBB23 and JG30 at different time points. Gene Ontology (GO) analysis categorized the DEGs into biological process, molecular function, and cellular component. KEGG analysis categorized the DEGs into different pathways, and phenylpropanoid biosynthesis was the most prominent pathway, followed by biosynthesis of plant hormones, flavonoid biosynthesis, and glycolysis/gluconeogenesis. Further analysis led to the identification of transcription factors (TFs) and kinase-responsive genes that were differentially expressed in CBB23 relative to JG30. Besides TFs and kinase-responsive genes, DEGs related to ethylene, jasmonic acid, and secondary metabolites were also identified in both genotypes after PXO99A infection. The data of DEGs are a precious resource for further clarifying the network of Xa23-mediated resistance. PMID:29498672

  17. Genome-wide identification and expression profiling of the SnRK2 gene family in Malus prunifolia.

    PubMed

    Shao, Yun; Qin, Yuan; Zou, Yangjun; Ma, Fengwang

    2014-11-15

    Sucrose non-fermenting-1-related protein kinase 2 (SnRK2) constitutes a small plant-specific serine/threonine kinase family with essential roles in the abscisic acid (ABA) signal pathway and in responses to osmotic stress. Although a genome-wide analysis of this family has been conducted in some species, little is known about SnRK2 genes in apple (Malus domestica). We identified 14 putative sequences encoding 12 deduced SnRK2 proteins within the apple genome. Gene chromosomal location and synteny analysis of the apple SnRK2 genes indicated that tandem and segmental duplications have likely contributed to the expansion and evolution of these genes. All 12 full-length coding sequences were confirmed by cloning from Malus prunifolia. The gene structure and motif compositions of the apple SnRK2 genes were analyzed. Phylogenetic analysis showed that MpSnRK2s could be classified into four groups. Profiling of these genes presented differential patterns of expression in various tissues. Under stress conditions, transcript levels for some family members were up-regulated in the leaves in response to drought, salinity, or ABA treatments. This suggested their possible roles in plant response to abiotic stress. Our findings provide essential information about SnRK2 genes in apple and will contribute to further functional dissection of this gene family. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Gene expression analysis of induced pluripotent stem cells from aneuploid chromosomal syndromes

    PubMed Central

    2013-01-01

    Background Human aneuploidy is the leading cause of early pregnancy loss, mental retardation, and multiple congenital anomalies. Due to the high mortality associated with aneuploidy, the pathophysiological mechanisms of aneuploidy syndrome remain largely unknown. Previous studies focused mostly on whether dosage compensation occurs, and the next generation transcriptomics sequencing technology RNA-seq is expected to eventually uncover the mechanisms of gene expression regulation and the related pathological phenotypes in human aneuploidy. Results Using next generation transcriptomics sequencing technology RNA-seq, we profiled the transcriptomes of four human aneuploid induced pluripotent stem cell (iPSC) lines generated from monosomy X (Turner syndrome), trisomy 8 (Warkany syndrome 2), trisomy 13 (Patau syndrome), and partial trisomy 11:22 (Emanuel syndrome) as well as two umbilical cord matrix iPSC lines as euploid controls to examine how phenotypic abnormalities develop with aberrant karyotype. A total of 466 M (50-bp) reads were obtained from the six iPSC lines, and over 13,000 mRNAs were identified by gene annotation. Global analysis of gene expression profiles and functional analysis of differentially expressed (DE) genes were implemented. Over 5000 DE genes were identified between the aneuploid and euploid iPSCs, and 9 enriched KEGG pathways overlapped among the four aneuploid samples. Conclusions Our results demonstrate that the extra or missing chromosome has extensive effects on the whole transcriptome. Functional analysis of differentially expressed genes reveals that the genes most affected in aneuploid individuals are related to central nervous system development and tumorigenesis. PMID:24564826

  19. Comprehensive transcriptome-based characterization of differentially expressed genes involved in microsporogenesis of radish CMS line and its maintainer.

    PubMed

    Xie, Yang; Zhang, Wei; Wang, Yan; Xu, Liang; Zhu, Xianwen; Muleke, Everlyne M; Liu, Liwang

    2016-09-01

    Microsporogenesis is an indispensable period for investigating microspore development and cytoplasmic male sterility (CMS) occurrence. Radish CMS line plays a critical role in elite F1 hybrid seed production and heterosis utilization. However, the molecular mechanisms of microspore development and CMS occurrence have not been thoroughly uncovered in radish. In this study, a comparative analysis of radish floral buds from a CMS line (NAU-WA) and its maintainer (NAU-WB) was conducted using next generation sequencing (NGS) technology. Digital gene expression (DGE) profiling revealed that 3504 genes were significantly differentially expressed between NAU-WA and NAU-WB library, among which 1910 were upregulated and 1594 were downregulated. Gene ontology (GO) analysis showed that these differentially expressed genes (DEGs) were mainly enriched in extracellular region, catalytic activity, and response to stimulus. KEGG enrichment analysis revealed that the DEGs were predominantly associated with flavonoid biosynthesis, glycolysis, and biosynthesis of secondary metabolites. Real-time quantitative PCR analysis showed that the expression profiles of 13 randomly selected DEGs were in high agreement with results from Illumina sequencing. Several candidate genes encoding ATP synthase, auxin response factor (ARF), transcription factors (TFs), chalcone synthase (CHS), and male sterility (MS) were responsible for microsporogenesis. Furthermore, a schematic diagram for functional interaction of DEGs from NAU-WA vs. NAU-WB library in radish plants was proposed. These results could provide new information on the dissection of the molecular mechanisms underlying microspore development and CMS occurrence in radish.

  20. Multilocus Sequence Typing and Virulence-Associated Gene Profile Analysis of Staphylococcus aureus Isolates From Retail Ready-to-Eat Food in China.

    PubMed

    Yang, Xiaojuan; Yu, Shubo; Wu, Qingping; Zhang, Jumei; Wu, Shi; Rong, Dongli

    2018-01-01

    The aim of this study was to characterize the subtypes and virulence profiles of 69 Staphylococcus aureus isolates obtained from retail ready-to-eat food in China. The isolates were analyzed using multilocus sequence typing (MLST) and polymerase chain reaction (PCR) analysis of important virulence factor genes, including the staphylococcal enterotoxin (SE) genes (sea, seb, sec, sed, see, seg, seh, sei, sej), the exfoliative toxin genes (eta and etb), the toxic shock syndrome toxin-1 gene (tst), and the Panton-Valentine leucocidin-encoding gene (pvl). The isolates encompassed 26 different sequence types (STs), including four new STs (ST3482, ST3484, ST3485, ST3504), clustered in three clonal complexes and 17 singletons. The most prevalent STs were ST1, ST6, and ST15, constituting 34.8% of all isolates. Most STs (15/26, 57.7%) detected have previously been associated with human infections. All 13 toxin genes examined were detected in the S. aureus isolates, with 84.1% of isolates containing toxin genes. The three most prevalent toxin genes were seb (36.2%), sea (33.3%), and seg (33.3%). The classical SE genes (sea-see), which contribute significantly to staphylococcal food poisoning (SFP), were detected in 72.5% of the S. aureus isolates. In addition, pvl, eta, etb, and tst were found in 11.6, 10.1, 10.1, and 7.2% of the S. aureus isolates, respectively. Strains ST6 carrying sea and ST1 harboring sec-seh enterotoxin profile, which are the two most common clones associated with SFP, were also frequently detected in the food samples in this study. This study indicates that these S. aureus isolates present in Chinese ready-to-eat food represent a potential public health risk. These data are valuable for epidemiological studies, risk management, and public health strategies.
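
    The percentages in this abstract can be cross-checked against the stated totals (69 isolates, 26 STs). The following is a minimal sketch of that arithmetic; the implied counts are back-calculated here and are not reported in the abstract, and the names `reported`, `implied_count`, and `is_consistent` are illustrative only.

```python
# Back-calculate the isolate counts implied by the reported percentages.
# Assumption: every percentage is taken over all 69 isolates and was
# rounded to one decimal place; the counts themselves are not in the abstract.
N_ISOLATES = 69

reported = {
    "ST1 + ST6 + ST15": 34.8,
    "any toxin gene": 84.1,
    "classical SE genes": 72.5,
    "seb": 36.2,
    "sea": 33.3,
    "seg": 33.3,
    "pvl": 11.6,
    "eta": 10.1,
    "etb": 10.1,
    "tst": 7.2,
}


def implied_count(percent, n=N_ISOLATES):
    """Nearest whole number of isolates matching a reported percentage."""
    return round(percent * n / 100)


def is_consistent(percent, n=N_ISOLATES, tol=0.05):
    """True if some whole count reproduces the percentage to one decimal."""
    k = implied_count(percent, n)
    return abs(100 * k / n - percent) <= tol


if __name__ == "__main__":
    for label, pct in reported.items():
        print(f"{label}: {pct}% -> {implied_count(pct)} of {N_ISOLATES} isolates")
    # The ST-level figure is stated directly as a fraction in the abstract.
    print(f"15/26 STs = {100 * 15 / 26:.1f}%")
```

    Each reported percentage corresponds to a whole number of isolates under these assumptions, which is a quick sanity check rather than a reconstruction of the authors' data.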

  1. Multilocus Sequence Typing and Virulence-Associated Gene Profile Analysis of Staphylococcus aureus Isolates From Retail Ready-to-Eat Food in China

    PubMed Central

    Yang, Xiaojuan; Yu, Shubo; Wu, Qingping; Zhang, Jumei; Wu, Shi; Rong, Dongli

    2018-01-01

    The aim of this study was to characterize the subtypes and virulence profiles of 69 Staphylococcus aureus isolates obtained from retail ready-to-eat food in China. The isolates were analyzed using multilocus sequence typing (MLST) and polymerase chain reaction (PCR) analysis of important virulence factor genes, including the staphylococcal enterotoxin (SE) genes (sea, seb, sec, sed, see, seg, seh, sei, sej), the exfoliative toxin genes (eta and etb), the toxic shock syndrome toxin-1 gene (tst), and the Panton-Valentine leucocidin-encoding gene (pvl). The isolates encompassed 26 different sequence types (STs), including four new STs (ST3482, ST3484, ST3485, ST3504), clustered in three clonal complexes and 17 singletons. The most prevalent STs were ST1, ST6, and ST15, constituting 34.8% of all isolates. Most STs (15/26, 57.7%) detected have previously been associated with human infections. All 13 toxin genes examined were detected in the S. aureus isolates, with 84.1% of isolates containing toxin genes. The three most prevalent toxin genes were seb (36.2%), sea (33.3%), and seg (33.3%). The classical SE genes (sea–see), which contribute significantly to staphylococcal food poisoning (SFP), were detected in 72.5% of the S. aureus isolates. In addition, pvl, eta, etb, and tst were found in 11.6, 10.1, 10.1, and 7.2% of the S. aureus isolates, respectively. Strains ST6 carrying sea and ST1 harboring sec-seh enterotoxin profile, which are the two most common clones associated with SFP, were also frequently detected in the food samples in this study. This study indicates that these S. aureus isolates present in Chinese ready-to-eat food represent a potential public health risk. These data are valuable for epidemiological studies, risk management, and public health strategies. PMID:29662467

  2. Deep RNA sequencing reveals dynamic regulation of myocardial noncoding RNAs in failing human heart and remodeling with mechanical circulatory support.

    PubMed

    Yang, Kai-Chien; Yamada, Kathryn A; Patel, Akshar Y; Topkara, Veli K; George, Isaac; Cheema, Faisal H; Ewald, Gregory A; Mann, Douglas L; Nerbonne, Jeanne M

    2014-03-04

    Microarrays have been used extensively to profile transcriptome remodeling in failing human heart, although the genomic coverage provided is limited and fails to provide a detailed picture of the myocardial transcriptome landscape. Here, we describe sequencing-based transcriptome profiling, providing comprehensive analysis of myocardial mRNA, microRNA (miRNA), and long noncoding RNA (lncRNA) expression in failing human heart before and after mechanical support with a left ventricular (LV) assist device (LVAD). Deep sequencing of RNA isolated from paired nonischemic (NICM; n=8) and ischemic (ICM; n=8) human failing LV samples collected before and after LVAD and from nonfailing human LV (n=8) was conducted. These analyses revealed high abundance of mRNA (37%) and lncRNA (71%) of mitochondrial origin. miRNASeq revealed 160 and 147 differentially expressed miRNAs in ICM and NICM, respectively, compared with nonfailing LV. Among these, only 2 (ICM) and 5 (NICM) miRNAs are normalized with LVAD. RNASeq detected 18 480, including 113 novel, lncRNAs in human LV. Among the 679 (ICM) and 570 (NICM) lncRNAs differentially expressed with heart failure, ≈10% are improved or normalized with LVAD. In addition, the expression signature of lncRNAs, but not miRNAs or mRNAs, distinguishes ICM from NICM. Further analysis suggests that cis-gene regulation represents a major mechanism of action of human cardiac lncRNAs. The myocardial transcriptome is dynamically regulated in advanced heart failure and after LVAD support. The expression profiles of lncRNAs, but not mRNAs or miRNAs, can discriminate failing hearts of different pathologies and are markedly altered in response to LVAD support. These results suggest an important role for lncRNAs in the pathogenesis of heart failure and in reverse remodeling observed with mechanical support.

  3. PknB remains an essential and a conserved target for drug development in susceptible and MDR strains of M. Tuberculosis.

    PubMed

    Gupta, Anamika; Pal, Sudhir K; Pandey, Divya; Fakir, Najneen A; Rathod, Sunita; Sinha, Dhiraj; SivaKumar, S; Sinha, Pallavi; Periera, Mycal; Balgam, Shilpa; Sekar, Gomathi; UmaDevi, K R; Anupurba, Shampa; Nema, Vijay

    2017-08-18

    The Mycobacterium tuberculosis (M.tb) protein kinase B (PknB), which is now proved to be essential for the growth and survival of M.tb, is a transmembrane protein with the potential to be a good drug target. However, it is not known if this target remains conserved in otherwise resistant isolates of clinical origin. The present study describes the conservation analysis of sequences covering the inhibitor binding domain of PknB to assess if it remains conserved in susceptible and resistant clinical strains of mycobacteria picked from three different geographical areas of India. A total of 116 isolates from North, South and West India were used in the study with a variable profile of their susceptibilities towards streptomycin, isoniazid, rifampicin, ethambutol and ofloxacin. Isolates were also spoligotyped in order to find if the conservation pattern of the pknB gene remains consistent or differs with different spoligotypes. The impact of variation as found in the study was analyzed using molecular dynamics simulations. The sequencing results with 115/116 isolates revealed the conserved nature of pknB sequences irrespective of their susceptibility status and spoligotypes. The only variation found was in one strain, wherein the pknB sequence had a G to A mutation at position 664 translating into a change of amino acid, valine to isoleucine. After analyzing the impact of this sequence variation using molecular dynamics simulations, it was observed that the variation causes no significant change in protein structure or inhibitor binding. Hence, the study endorses that PknB is an ideal target for drug development and there is no pre-existing or induced resistance with respect to the sequences involved in inhibitor binding. Also, if the mutation that we are reporting for the first time is found again in subsequent work, it should be checked against the phenotypic profile before drawing the conclusion that it would affect the activity in any way. Bioinformatics analysis in our study indicates that it has no significant effect on the binding and hence the activity of the protein.
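
    The reported G to A change at position 664 and the resulting valine to isoleucine substitution can be checked against the standard genetic code. The sketch below assumes that 664 is a 1-based position counted from the first base of the pknB coding sequence, which the abstract does not state explicitly; the helper names `locate` and `substitution_outcomes` are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def locate(nt_pos):
    """Map a 1-based coding-sequence position to
    (1-based codon number, 0-based offset within that codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3


def substitution_outcomes(nt_pos, ref_base, alt_base):
    """For every codon carrying ref_base at the affected offset, return
    {codon: (reference amino acid, amino acid after the substitution)}."""
    _, offset = locate(nt_pos)
    outcomes = {}
    for codon, aa in CODON_TABLE.items():
        if codon[offset] != ref_base:
            continue
        mutated = codon[:offset] + alt_base + codon[offset + 1:]
        outcomes[codon] = (aa, CODON_TABLE[mutated])
    return outcomes


if __name__ == "__main__":
    codon_no, offset = locate(664)
    print(f"position 664 -> codon {codon_no}, offset {offset}")
    val_to_ile = [c for c, (ref, alt) in substitution_outcomes(664, "G", "A").items()
                  if ref == "V" and alt == "I"]
    print("reference codons compatible with Val -> Ile:", val_to_ile)
```

    Under this numbering assumption, position 664 is the first base of codon 222, and a G to A change there turns a valine codon (GTA, GTC, or GTT) into an isoleucine codon, which agrees with the substitution described in the abstract; a GTG codon would instead yield methionine.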

  4. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.

    PubMed

    Gan, Ruei-Chi; Chen, Ting-Wen; Wu, Timothy H; Huang, Po-Jung; Lee, Chi-Ching; Yeh, Yuan-Ming; Chiu, Cheng-Hsun; Huang, Hsien-Da; Tang, Petrus

    2016-12-22

    Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Here, we propose a new analysis strategy and methods for quantifying expression levels which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw .
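
    The abstract names RPKM- and TPM-style values but does not define how the "estimated" variants (eRPKM, eTPM) are computed. As a point of reference only, the sketch below shows the conventional RPKM and TPM formulas applied to read counts against a shared reference; the function names and toy numbers are hypothetical and this is not PARRoT's implementation.

```python
import math


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.
    RPKM_i = count_i * 1e9 / (total_reads * length_i)."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale so
    that the values of one sample sum to one million."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


if __name__ == "__main__":
    # Toy data: three virtual transcripts scored against one shared reference.
    lengths = [1000, 2000, 1000]   # transcript lengths in bp (made up)
    sample_a = [100, 200, 700]     # read counts, sample A (made up)
    sample_b = [300, 300, 400]     # read counts, sample B (made up)
    for name, counts in (("A", sample_a), ("B", sample_b)):
        print(name, "RPKM:", [round(v, 1) for v in rpkm(counts, lengths)])
        print(name, "TPM: ", [round(v, 1) for v in tpm(counts, lengths)])
    assert math.isclose(sum(tpm(sample_a, lengths)), 1e6)
```

    Because both samples are quantified against the same transcript lengths, the resulting values can be compared transcript by transcript, which is the property the abstract attributes to using a single virtual reference.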

  5. Genome-wide transcriptional analysis of flagellar regeneration in Chlamydomonas reinhardtii identifies orthologs of ciliary disease genes

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Marshall, Wallace F.

    2005-01-01

    The important role that cilia and flagella play in human disease creates an urgent need to identify genes involved in ciliary assembly and function. The strong and specific induction of flagellar-coding genes during flagellar regeneration in Chlamydomonas reinhardtii suggests that transcriptional profiling of such cells would reveal new flagella-related genes. We have conducted a genome-wide analysis of RNA transcript levels during flagellar regeneration in Chlamydomonas by using DNA oligonucleotide microarrays, produced by maskless photolithography, with unique probe sequences for all exons of the 19,803 predicted genes. This analysis represents a previously uncharacterized whole-genome transcriptional activity profiling study in this important model organism. Analysis of strongly induced genes reveals a large set of known flagellar components and also identifies a number of important disease-related proteins as being involved with cilia and flagella, including the zebrafish polycystic kidney genes Qilin, Reptin, and Pontin, as well as the testis-expressed tubby-like protein TULP2.

  6. Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

    PubMed

    Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

    2016-01-01

    Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences, revealed microRNA-mediated gene regulation in black pepper (Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.

  7. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    PubMed Central

    Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

    2016-01-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181

  8. Nitrous Oxide Reductase (nosZ) Gene Fragments Differ between Native and Cultivated Michigan Soils

    PubMed Central

    Stres, Blaž; Mahne, Ivan; Avguštin, Gorazd; Tiedje, James M.

    2004-01-01

    The effect of standard agricultural management on the genetic heterogeneity of nitrous oxide reductase (nosZ) fragments from denitrifying prokaryotes in native and cultivated soil was explored. Thirty-six soil cores were composited from each of the two soil management conditions. nosZ gene fragments were amplified from triplicate samples, and PCR products were cloned and screened by restriction fragment length polymorphism (RFLP). The total nosZ RFLP profiles increased in similarity with soil sample size until triplicate 3-g samples produced visually identical RFLP profiles for each treatment. Large differences in total nosZ profiles were observed between the native and cultivated soils. The fragments representing major groups of clones encountered at least twice and four randomly selected clones with unique RFLP patterns were sequenced to verify nosZ identity. The sequence diversity of nosZ clones from the cultivated field was higher, and only eight patterns were found in clone libraries from both soils among the 182 distinct nosZ RFLP patterns identified from the two soils. A group of clones that comprised 32% of all clones dominated the gene library of native soil, whereas many minor groups were observed in the gene library of cultivated soil. The 95% confidence intervals of the Chao1 nonparametric richness estimator for nosZ RFLP data did not overlap, indicating that the levels of species richness are significantly different in the two soils, the cultivated soil having higher diversity. Phylogenetic analysis of deduced amino acid sequences grouped the majority of nosZ clones into an interleaved Michigan soil cluster whose cultured members are α-Proteobacteria. Only four nosZ sequences from cultivated soil and one from the native soil were related to sequences found in γ-Proteobacteria. Sequences from the native field formed a distinct, closely related cluster (Dmean = 0.16) containing 91.6% of the native clones. 
    Clones from the cultivated field were more distantly related to each other (Dmean = 0.26), and 65% were found outside of the cluster from the native soil, further indicating a difference in the two communities. Overall, there appears to be a relationship between use and richness, diversity, and the phylogenetic position of nosZ sequences, indicating that agricultural use of soil caused a shift to a more diverse denitrifying community. PMID:14711656
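
    The Chao1 nonparametric richness estimator named in this abstract can be illustrated with a short sketch. It uses the commonly cited bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of patterns seen exactly once and exactly twice. The function name `chao1` and the clone counts are hypothetical, chosen for illustration only; they are not data from the study, and the confidence-interval calculation the authors report is not reproduced here.

```python
def chao1(pattern_counts):
    """Bias-corrected Chao1 richness estimate.

    pattern_counts: number of clones observed for each distinct pattern.
    """
    counts = [c for c in pattern_counts if c > 0]
    s_obs = len(counts)                      # patterns actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    # Bias-corrected form; it stays defined when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical clone counts per RFLP pattern (illustrative only).
example_library = [10, 4, 2, 2, 1, 1, 1]
print(chao1(example_library))  # 8.0
```

    With seven observed patterns, three singletons and two doubletons, the estimate is 7 + (3 x 2) / (2 x 3) = 8, that is, roughly one pattern is estimated to have gone unobserved.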

  9. Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

    PubMed

    Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

    2013-01-17

    Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, such as Alzheimer's disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the largest dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was used to train machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained an area under the ROC curve of 0.96, an accuracy of 91%, a true positive rate of ca. 78%, and a true negative rate of 95%. A few other machine learning methods also achieved good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency.
    Our new dataset proved representative enough to use simple statistical methods for testing amyloidogenicity based only on six-letter sequences. Statistical machine learning methods such as Alternating Decision Tree and Multilayer Perceptron can replace the energy-based classifier, with the advantage of greatly reduced computational time and a simpler analysis. Additionally, a decision tree provides a set of very easily interpretable rules.
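
    The performance figures quoted in this abstract (accuracy, true positive rate, true negative rate) all derive from a binary confusion matrix. The sketch below shows how they are computed; the function name `binary_metrics` and the labels are hypothetical and do not come from the study or from ZipperDB.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, true positive rate and true negative rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),   # sensitivity: amyloidogenic peptides recovered
        "tnr": tn / (tn + fp),   # specificity: non-amyloidogenic peptides rejected
    }


# Hypothetical labels: 1 = amyloidogenic, 0 = non-amyloidogenic.
truth      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
prediction = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(truth, prediction))
```

    Reporting the true positive and true negative rates separately matters when classes are imbalanced, as in the new dataset described above (204 amyloidogenic segments out of 1779), because accuracy alone is dominated by the majority class.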

  10. Identification, characterization and description of Arcobacter faecis sp. nov., isolated from a human waste septic tank.

    PubMed

    Whiteduck-Léveillée, Kerri; Whiteduck-Léveillée, Jenni; Cloutier, Michel; Tambong, James T; Xu, Renlin; Topp, Edward; Arts, Michael T; Chao, Jerry; Adam, Zaky; Lévesque, C André; Lapen, David R; Villemur, Richard; Khan, Izhar U H

    2016-03-01

    A study on the taxonomic classification of Arcobacter species was performed on cultures isolated from various fecal sources, in which Arcobacter strain AF1078(T), from a human waste septic tank near Ottawa, Ontario, Canada, was characterized using a polyphasic approach. Genetic investigations showed that the 16S rRNA, atpA, cpn60, gyrA, gyrB and rpoB gene sequences of strain AF1078(T) are unique in comparison with other arcobacters. Phylogenetic analysis based on the 16S rRNA gene sequence revealed that the strain is most closely related to Arcobacter lanthieri and Arcobacter cibarius. Analyses of atpA, cpn60, gyrA, gyrB and rpoB gene sequences suggested that strain AF1078(T) formed a phylogenetic lineage independent of other species in the genus. Whole-genome sequence, DNA-DNA hybridization, fatty acid profile and phenotypic analysis further supported the conclusion that strain AF1078(T) represents a novel Arcobacter species, for which the name Arcobacter faecis sp. nov. is proposed, with type strain AF1078(T) (=LMG 28519(T); CCUG 66484(T)). Crown Copyright © 2015. Published by Elsevier GmbH. All rights reserved.

  11. Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

    PubMed

    Neuwald, Andrew F

    2009-08-01

    The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.

  12. Evaluation of PCR and high-resolution melt curve analysis for differentiation of Salmonella isolates.

    PubMed

    Saeidabadi, Mohammad Sadegh; Nili, Hassan; Dadras, Habibollah; Sharifiyazdi, Hassan; Connolly, Joanne; Valcanis, Mary; Raidal, Shane; Ghorashi, Seyed Ali

    2017-06-01

    Consumption of poultry products contaminated with Salmonella is one of the major causes of foodborne diseases worldwide, and therefore detection and differentiation of Salmonella spp. in poultry is important. In this study, oligonucleotide primers were designed from the hemD gene and a PCR followed by high-resolution melt (HRM) curve analysis was developed for rapid differentiation of Salmonella isolates. Amplicons of 228 bp were generated from 16 different Salmonella reference strains and from 65 clinical field isolates mainly from poultry farms. HRM curve analysis of the amplicons differentiated Salmonella isolates and analysis of the nucleotide sequence of the amplicons from selected isolates revealed that each melting curve profile was related to a unique DNA sequence. The relationship between reference strains and tested specimens was also evaluated using a mathematical model without visual interpretation of HRM curves. In addition, the potential of the PCR-HRM curve analysis was evaluated for genotyping of additional Salmonella isolates from different avian species. The findings indicate that PCR followed by HRM curve analysis provides a rapid and robust technique for genotyping of Salmonella isolates to determine the serovar/serotype.

  13. DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing.

    PubMed

    Castle, John C; Biery, Matthew; Bouzek, Heather; Xie, Tao; Chen, Ronghua; Misura, Kira; Jackson, Stuart; Armour, Christopher D; Johnson, Jason M; Rohl, Carol A; Raymond, Christopher K

    2010-04-16

    DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.
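
    The abstract does not spell out how read counts are converted to copy number, so the following is only a generic read-depth sketch and not the authors' assay. It assumes reads have already been counted in matching genomic bins for a test sample and for a reference pool of known copy number, and it scales the ratio of library-size-normalized counts by the reference copy number. The function name, the bin counts and the diploid reference value of 2 are all illustrative assumptions.

```python
def copy_number_from_counts(sample_counts, reference_counts, reference_copies=2.0):
    """Estimate per-bin copy number from read counts in matching bins.

    Counts are normalized by library size, then the sample/reference ratio
    is scaled by the copy number assumed for the reference.
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("sample and reference must cover the same bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    estimates = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            estimates.append(None)  # no reference coverage: bin is uninformative
            continue
        ratio = (s / sample_total) / (r / reference_total)
        estimates.append(reference_copies * ratio)
    return estimates


# Hypothetical read counts in four bins (illustrative only).
sample = [100, 100, 200, 0]
reference = [100, 100, 100, 100]
print(copy_number_from_counts(sample, reference))  # [2.0, 2.0, 4.0, 0.0]
```

    Normalizing by total read count is the simplest possible choice and is skewed when large parts of the genome are amplified or deleted; a real pipeline would more likely normalize against regions believed to be copy-neutral. Because the estimate is a ratio of counts, its precision improves with sequencing depth, which is consistent with the abstract's remark that resolution and cost are tunable through the number of reads.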

  14. DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing

    PubMed Central

    2010-01-01

    Background DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. Results We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. Conclusion The described assay outputs absolute copy number, outputs an error estimate (p-value), and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads. PMID:20398377

  15. Multiple-locus variable-number tandem-repeats analysis of Listeria monocytogenes using multicolour capillary electrophoresis and comparison with pulsed-field gel electrophoresis typing.

    PubMed

    Lindstedt, Bjørn-Arne; Tham, Wilhelm; Danielsson-Tham, Marie-Louise; Vardund, Traute; Helmersson, Seved; Kapperud, Georg

    2008-02-01

    The multiple-locus variable-number tandem-repeats analysis (MLVA) method for genotyping has proven to be a fast and reliable typing tool in several bacterial species. In our laboratory, MLVA is the routine typing method for Salmonella enterica subsp. enterica serovar Typhimurium and Escherichia coli O157. The gram-positive bacterium Listeria monocytogenes, while not isolated as frequently as S. Typhimurium and E. coli, causes severe illness with an overall mortality rate of 30%. Thus, it is important that any outbreak of this pathogen is detected early and that the source can be traced quickly. In view of this, we have used the information provided by two fully sequenced L. monocytogenes strains to develop an MLVA assay coupled with high-resolution capillary electrophoresis and compared it to pulsed-field gel electrophoresis (PFGE) in two sets of isolates, one Norwegian (79 isolates) and one Swedish (61 isolates). The MLVA assay could resolve all of the L. monocytogenes serotypes tested, and was slightly more discriminatory than PFGE for the Norwegian isolates (28 MLVA profiles and 24 PFGE profiles), while the opposite held for the Swedish isolates (42 MLVA profiles and 43 PFGE profiles).

  16. Micropathogen Community Analysis in Hyalomma rufipes via High-Throughput Sequencing of Small RNAs

    PubMed Central

    Luo, Jin; Liu, Min-Xuan; Ren, Qiao-Yun; Chen, Ze; Tian, Zhan-Cheng; Hao, Jia-Wei; Wu, Feng; Liu, Xiao-Cui; Luo, Jian-Xun; Yin, Hong; Wang, Hui; Liu, Guang-Yuan

    2017-01-01

    Ticks are important vectors in the transmission of a broad range of micropathogens to vertebrates, including humans. Because of the role of ticks in disease transmission, identifying and characterizing the micropathogen profiles of tick populations have become increasingly important. The objective of this study was to survey the micropathogens of Hyalomma rufipes ticks. Illumina HiSeq2000 technology was utilized to perform deep sequencing of small RNAs (sRNAs) extracted from field-collected H. rufipes ticks in Gansu Province, China. The resultant sRNA library data revealed that the surveyed tick populations produced reads that were homologous to St. Croix River Virus (SCRV) sequences. We also observed many reads that were homologous to microbial and/or pathogenic isolates, including bacteria, protozoa, and fungi. As part of this analysis, a phylogenetic tree was constructed to display the relationships among the homologous sequences that were identified. The study offered a unique opportunity to gain insight into the micropathogens of H. rufipes ticks. The effective control of arthropod vectors in the future will require knowledge of the micropathogen composition of vectors harboring infectious agents. Understanding the ecological factors that regulate vector propagation in association with the prevalence and persistence of micropathogen lineages is also imperative. These interactions may affect the evolution of micropathogen lineages, especially if the micropathogens rely on the vector or host for dispersal. The sRNA deep-sequencing approach used in this analysis provides an intuitive method to survey micropathogen prevalence in ticks and other vector species. PMID:28861401

  17. 3D: diversity, dynamics, differential testing - a proposed pipeline for analysis of next-generation sequencing T cell repertoire data.

    PubMed

    Zhang, Li; Cham, Jason; Paciorek, Alan; Trager, James; Sheikh, Nadeem; Fong, Lawrence

    2017-02-27

    Cancer immunotherapy has demonstrated significant clinical activity in different cancers. T cells represent a crucial component of the adaptive immune system and are thought to mediate anti-tumoral immunity. Antigen-specific recognition by T cells is via the T cell receptor (TCR) which is unique for each T cell. Next generation sequencing (NGS) of the TCRs can be used as a platform to profile the T cell repertoire. Though there are a number of software tools available for processing repertoire data by mapping antigen receptor segments to sequencing reads and assembling the clonotypes, most of them are not designed to track and examine the dynamic nature of the TCR repertoire across multiple time points or between different biologic compartments (e.g., blood and tissue samples) in a clinical context. We integrated different diversity measures to assess the T cell repertoire diversity and examined the robustness of the diversity indices. Among those tested, Clonality was identified for its robustness as a key metric for study design and the first choice to measure TCR repertoire diversity. To evaluate the dynamic nature of T cell clonotypes across time, we utilized several binary similarity measures (such as Baroni-Urbani and Buser overlap index), relative clonality and Morisita's overlap index, as well as the intraclass correlation coefficient, and performed fold change analysis, which was further extended to investigate the transition of clonotypes among different biological compartments. Furthermore, the application of differential testing enabled the detection of clonotypes which were significantly changed across time. By applying the proposed "3D" analysis pipeline to the real example of prostate cancer subjects who received sipuleucel-T, an FDA-approved immunotherapy, we were able to detect changes in TCR sequence frequency and diversity thus demonstrating that sipuleucel-T treatment affected TCR repertoire in blood and in prostate tissue. 
    We also found that the increase in common TCR sequences between tissue and blood after sipuleucel-T treatment supported the hypothesis that treatment-induced T cells migrated into the prostate tissue. In addition, a second example of prostate cancer subjects treated with Ipilimumab and granulocyte macrophage colony stimulating factor (GM-CSF) was presented in the supplementary documents to further illustrate assessing the treatment-associated change in a clinical context by the proposed workflow. Our paper provides guidance to study the diversity and dynamics of NGS-based TCR repertoire profiling in a clinical context to ensure consistency and reproducibility of post-analysis. This analysis pipeline will provide an initial workflow for TCR sequencing data with serial time points and for comparing T cells in multiple compartments for a clinical study.
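
    Two of the measures named in this abstract, Clonality and Morisita's overlap index, can be sketched briefly. The abstract does not give formulas, so the definitions below are assumptions based on common usage: clonality as one minus normalized Shannon entropy, and Morisita's index in its original form with Simpson-type terms x(x - 1) / (X(X - 1)). Function names and clonotype counts are hypothetical.

```python
import math


def clonality(clone_counts):
    """1 - (Shannon entropy / log(number of clonotypes)); 0 = even, 1 = monoclonal."""
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("no clonotypes observed")
    if len(counts) == 1:
        return 1.0  # a single clonotype is treated as fully clonal
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


def morisita_overlap(x, y):
    """Morisita's overlap index for two dicts mapping clonotype -> read count."""
    x_total, y_total = sum(x.values()), sum(y.values())
    if x_total < 2 or y_total < 2:
        raise ValueError("each sample needs at least two reads")
    d_x = sum(c * (c - 1) for c in x.values()) / (x_total * (x_total - 1))
    d_y = sum(c * (c - 1) for c in y.values()) / (y_total * (y_total - 1))
    shared = sum(x[k] * y[k] for k in x.keys() & y.keys())
    denominator = (d_x + d_y) * x_total * y_total
    if denominator == 0:
        raise ValueError("index is undefined when every clonotype is a singleton")
    return 2 * shared / denominator


# Hypothetical clonotype counts for blood and tissue (illustrative only).
blood = {"CASSLGQ": 97, "CASSPDR": 1, "CASRTGE": 1, "CAWSVGN": 1}
tissue = {"CASSLGQ": 40, "CASSPDR": 5, "CSARDRT": 5}
print(clonality(blood.values()))
print(morisita_overlap(blood, tissue))
```

    Clonality depends only on the shape of the frequency distribution within one sample, whereas the overlap index compares two samples and is zero when they share no clonotypes; this is why the pipeline described above uses the former for diversity and the latter for dynamics across time points or compartments.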

  18. Identification of a Novel HADHB Gene Mutation in an Iranian Patient with Mitochondrial Trifunctional Protein Deficiency.

    PubMed

    Shahrokhi, Mahdiyeh; Shafiei, Mohammad; Galehdari, Hamid; Shariati, Gholamreza

    2017-01-01

    Mitochondrial trifunctional protein (MTP) is a hetero-octamer composed of eight subunits: four α-subunits containing LCEH (long-chain 2,3-enoyl-CoA hydratase) and LCHAD (long-chain 3-hydroxyacyl CoA dehydrogenase) activity, and four β-subunits that possess LCKT (long-chain 3-ketoacyl-CoA thiolase) activity. The complex catalyzes three out of four steps in the β-oxidation spiral of long-chain fatty acids. Its deficiency is an autosomal recessive disorder that causes a clinical spectrum of diseases. A blood spot was collected from the patient's original newborn screening card with parental informed consent. A newborn screening test and quantitative plasma acylcarnitine profile analysis by MS/MS were performed. After isolation of DNA and amplification of all exons of HADHA and HADHB, direct sequence analysis of all exons and the flanking introns of both genes was performed. Here, we report a novel mutation in a patient with MTP deficiency diagnosed with a newborn screening test and quantitative plasma acylcarnitine profile analysis by MS/MS and then confirmed by enzyme analysis in cultured fibroblasts and direct sequencing of the HADHA and HADHB genes. Molecular analysis of causative genes showed a missense mutation (p.Q385P) c.1154A > C in exon 14 of the HADHB gene. Because this mutation was not found in 50 normal control cases, it was concluded that the c.1154A > C mutation was causative. Phenotype analysis of this mutation predicted that it is pathogenic and reduces the stability of the MTP protein complex.

  19. Digital gene expression profiling of flax (Linum usitatissimum L.) stem peel identifies genes enriched in fiber-bearing phloem tissue.

    PubMed

    Guo, Yuan; Qiu, Caisheng; Long, Songhua; Chen, Ping; Hao, Dongmei; Preisner, Marta; Wang, Hui; Wang, Yufu

    2017-08-30

    To better understand the molecular mechanisms and gene expression characteristics associated with the development of bast fiber cells within flax stem phloem, the gene expression profiles of flax stem peels and leaves were screened using Illumina's Digital Gene Expression (DGE) analysis. Four DGE libraries (2 for stem peel and 2 for leaf), ranging from 6.7 to 9.2 million clean reads, were obtained, which produced 7.0 million and 6.8 million mapped reads for flax stem peel and leaf, respectively. By differential gene expression analysis, a total of 975 genes, of which 708 (73%) have protein-coding annotation, were identified as phloem-enriched genes putatively involved in the processes of polysaccharide and cell wall metabolism. Differentially expressed genes (DEGs) were validated using quantitative RT-PCR; the expression patterns of all nine genes determined by qRT-PCR agreed well with those obtained by sequencing analysis. Cluster and Gene Ontology (GO) analysis revealed that a large number of genes related to metabolic process, catalytic activity and binding category were expressed predominantly in the stem peels. The Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of the phloem-enriched genes suggested approximately 111 biological pathways. The large number of genes and pathways produced from DGE sequencing will expand our understanding of the complex molecular and cellular events in flax bast fiber development and provide a foundation for future studies on fiber development in other bast fiber crops. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Global Analysis of Gene Expression Profiles in Developing Physic Nut (Jatropha curcas L.) Seeds

    PubMed Central

    Jiang, Huawu; Wu, Pingzhi; Zhang, Sheng; Song, Chi; Chen, Yaping; Li, Meiru; Jia, Yongxia; Fang, Xiaohua; Chen, Fan; Wu, Guojiang

    2012-01-01

    Background Physic nut (Jatropha curcas L.) is an oilseed plant species with high potential utility as a biofuel. Furthermore, following recent sequencing of its genome and the availability of expressed sequence tag (EST) libraries, it is a valuable model plant for studying carbon assimilation in endosperms of oilseed plants. There have been several transcriptomic analyses of developing physic nut seeds using ESTs, but they have provided limited information on the accumulation of stored resources in the seeds. Methodology/Principal Findings We applied next-generation Illumina sequencing technology to analyze global gene expression profiles of developing physic nut seeds 14, 19, 25, 29, 35, 41, and 45 days after pollination (DAP). The acquired profiles reveal the key genes, and their expression timeframes, involved in major metabolic processes including: carbon flow, starch metabolism, and synthesis of storage lipids and proteins in the developing seeds. The main period of storage reserves synthesis in the seeds appears to be 29–41 DAP, and the fatty acid composition of the developing seeds is consistent with relative expression levels of different isoforms of acyl-ACP thioesterase and fatty acid desaturase genes. Several transcription factor genes whose expression coincides with storage reserve deposition correspond to those known to regulate the process in Arabidopsis. Conclusions/Significance The results will facilitate searches for genes that influence de novo lipid synthesis, accumulation and their regulatory networks in developing physic nut seeds, and other oil seeds. Thus, they will be helpful in attempts to modify these plants for efficient biofuel production. PMID:22574177
